Hidden layers in a neural network are known as feature detectors. Ward Systems Group invented three different Backpropagation network architectures with multiple hidden layers for our consulting work.
Different activation functions applied to hidden layer slabs detect different features in a pattern processed through a network. For example, a network design may use a Gaussian function on one hidden slab to detect features in the mid-range of the data and use a Gaussian complement in another hidden slab to detect features from the upper and lower extremes of the data. Thus, the output layer will get different "views of the data." Combining the two feature sets in the output layer may lead to a better prediction.
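The contrast between the two functions can be sketched in a few lines of Python. This is only an illustration, assuming the common definitions exp(-x^2) for the Gaussian and 1 - exp(-x^2) for the Gaussian complement; the function names are hypothetical and the exact formulas used by the software may differ.

```python
import math

def gaussian(x):
    # Assumed form: peaks at 1.0 when x is 0 and falls toward 0 at the extremes.
    return math.exp(-x * x)

def gaussian_complement(x):
    # Assumed form: 0.0 when x is 0 and rises toward 1 at the extremes.
    return 1.0 - math.exp(-x * x)

# Compare how strongly each function responds across the input range.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(gaussian(x), 3), round(gaussian_complement(x), 3))
```

Under these assumptions, the Gaussian slab responds most strongly to mid-range values, while the complement slab responds most strongly to values far from the middle.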
Three Different Ward Networks
2 Hidden Slabs with Different Activation Functions
The first Ward Network is a regular three-layer Backpropagation network with two slabs in the hidden layer. Use a different activation function for each slab in the hidden layer to detect different features in the data.
3 Hidden Slabs with Different Activation Functions
This Ward Network is a Backpropagation network that adds a third slab to the hidden layer. When each slab in the hidden layer has a different activation function, it offers three ways of viewing the data.
2 Hidden Slabs, Different Activation Functions + Jump Connection
This Ward Network is a regular three-layer Backpropagation network with two slabs in the hidden layer and a jump connection between the input layer and output layer. The output layer receives two different views of the data's features as detected in the hidden slabs plus the original inputs.
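As a rough sketch of how the third architecture combines its views, the following Python code runs one forward pass through two hidden slabs and a jump connection. Everything here is an assumption made for illustration: the helper names, the slab sizes, the Gaussian and Gaussian complement formulas, and the logistic output activation are not taken from the software itself.

```python
import math
import random

def dense(inputs, weights, biases):
    # Weighted sum for each neuron; weights[j][i] links input i to neuron j.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def random_layer(n_out, n_in, rng, weight_range=0.3):
    # Hypothetical helper: random weights and biases within +/- weight_range.
    weights = [[rng.uniform(-weight_range, weight_range) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [rng.uniform(-weight_range, weight_range) for _ in range(n_out)]
    return weights, biases

def ward_forward(x, slab_a, slab_b, output):
    # Hidden slab A: Gaussian activation (assumed form exp(-s^2)).
    view_a = [math.exp(-s * s) for s in dense(x, *slab_a)]
    # Hidden slab B: Gaussian complement (assumed form 1 - exp(-s^2)).
    view_b = [1.0 - math.exp(-s * s) for s in dense(x, *slab_b)]
    # Jump connection: the output layer also sees the original inputs.
    combined = view_a + view_b + list(x)
    # Logistic output activation (an assumption for this sketch).
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(combined, *output)]

rng = random.Random(0)
n_inputs, n_slab, n_outputs = 3, 4, 1
slab_a = random_layer(n_slab, n_inputs, rng)
slab_b = random_layer(n_slab, n_inputs, rng)
output = random_layer(n_outputs, 2 * n_slab + n_inputs, rng)
prediction = ward_forward([0.2, -0.5, 0.9], slab_a, slab_b, output)
print(prediction)
```

Dropping `list(x)` from `combined`, and sizing the output layer accordingly, gives the first architecture; adding a third slab with its own activation function gives the second.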
Click on each Slab to select or inspect the number of neurons. Change the default settings by typing in a new value in the text box.
Backpropagation Hidden Neurons
The number of neurons in the hidden layer is usually set automatically the first time you design the network, but you may use the text box to change the default. If you're using Calibration, which limits overlearning and prevents memorization, the exact number of hidden neurons is not critical as long as there are enough. With Calibration, it may be better to err on the side of using more rather than fewer. Without Calibration, fewer are better for generalization.
The default number of hidden neurons for a 3 layer network is computed with the following formula:
Number of hidden neurons = 1/2 x (Inputs + Outputs) + the square root of the number of patterns in the Training (.TRN) file, or in the .PAT file if there is no .TRN file.
For more layers, divide the number computed above by the number of hidden layers. If there are multiple slabs in the hidden layer, the hidden neurons will be divided evenly among the slabs.
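The default calculation can be worked through in Python as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the rounding to a whole number and the handling of a remainder when splitting among slabs are guesses, since the text does not say how fractions are treated.

```python
import math

def default_hidden_neurons(n_inputs, n_outputs, n_patterns, n_hidden_layers=1):
    # 1/2 (Inputs + Outputs) + square root of the number of training patterns.
    total = 0.5 * (n_inputs + n_outputs) + math.sqrt(n_patterns)
    # For more layers, divide by the number of hidden layers.
    # Rounding to the nearest whole neuron is an assumption.
    return max(1, round(total / n_hidden_layers))

def split_among_slabs(n_hidden, n_slabs):
    # Divide the hidden neurons as evenly as possible among the slabs.
    # Giving leftover neurons to the first slabs is an assumption.
    base, extra = divmod(n_hidden, n_slabs)
    return [base + (1 if i < extra else 0) for i in range(n_slabs)]

# Example: 10 inputs, 2 outputs, 400 training patterns.
hidden = default_hidden_neurons(10, 2, 400)
print(hidden)
print(split_among_slabs(hidden, 2))
print(split_among_slabs(hidden, 3))
```

For 10 inputs, 2 outputs, and 400 training patterns, the formula gives 1/2 (12) + 20 = 26 hidden neurons, or 13 per slab in a two-slab Ward Network.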
In Backpropagation networks, the number of hidden neurons determines how well a problem can be learned. If you use too many, the network will tend to memorize the problem and thus not generalize well later (although Calibration mitigates this effect to some extent). If you use too few, the network will generalize well but may not have enough “power” to learn the patterns well. Getting the right number of hidden neurons is a matter of trial and error, since there is no exact rule for it. However, our defaults are usually pretty reliable.
Use the mouse to select a scaling or activation function from the list box. The choice of scaling or activation function will vary depending upon whether the slab is in the input layer (scaling function) or another layer (activation function).
Connection Arrows (Links)
You can click on the Connection Arrows to set or inspect parameters such as learning rate, momentum, and initial weights.
Each link in the network has its own learning rate and momentum that can be set individually. You may also set all links in the network with the same learning rate and momentum by clicking on the "Set all links like current link" check box. On simple problems we recommend that you use a large learning rate and momentum such as .9 and .6 respectively. On more complicated problems or predictive networks where your outputs are continuous values rather than categories, use a smaller learning rate and momentum such as .1 and .1, the defaults that are set in the Beginner's System. If the data is very noisy, try a learning rate of .05 and a momentum of .5.
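To show how the two settings interact, here is a minimal Python sketch of the textbook Backpropagation weight update with momentum. It is not the software's actual code: the function name and the toy error function are assumptions chosen only to illustrate the roles of learning rate and momentum.

```python
def update_weight(weight, gradient, prev_delta, learning_rate=0.1, momentum=0.1):
    # Standard momentum rule: step against the error gradient, plus a
    # fraction of the previous weight change.
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta

# Toy problem (an assumption for illustration): minimize E(w) = (w - 1)^2,
# whose gradient is 2 * (w - 1). The minimum is at w = 1.
w, prev_delta = 0.0, 0.0
for _ in range(50):
    gradient = 2.0 * (w - 1.0)
    w, prev_delta = update_weight(w, gradient, prev_delta)
print(w)
```

A larger learning rate takes bigger steps on each pattern, and a larger momentum carries more of the previous step forward, which is why smaller values of both tend to be safer on noisy or continuous-valued problems.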
There is an option in the Backpropagation Training Criteria that allows you to automatically increment learning rate/momentum as training progresses. This is only for experts who know what they are doing. Most problems will never need this. Refer to Changes in Learning Rate/Momentum for details.
The weight that is set in the edit box, e.g., .3, represents a range of values from + .3 to - .3. The network's initial weight for that link is a random value within that range. We recommend the default, but you can try other values if you want to experiment. If you have a large number of weights (i.e., large slabs) you may want to try lower values.
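The initial weight range can be illustrated with a short Python sketch. The function name is hypothetical, and drawing the values from a uniform distribution is an assumption; the text only states that each initial weight is a random value within the range.

```python
import random

def init_link_weights(n_from, n_to, weight_range=0.3, seed=None):
    # One weight per connection between a neuron in the sending slab and a
    # neuron in the receiving slab, each drawn from -weight_range to
    # +weight_range (uniform distribution assumed).
    rng = random.Random(seed)
    return [[rng.uniform(-weight_range, weight_range) for _ in range(n_from)]
            for _ in range(n_to)]

# Example: a link from a 5-neuron slab to a 3-neuron slab with a range of .3.
weights = init_link_weights(5, 3, weight_range=0.3, seed=42)
for row in weights:
    print([round(w, 3) for w in row])
```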