Network Architecture - Backpropagation


Backpropagation networks are known for their ability to generalize well on a wide variety of problems, which is why they are used for the vast majority of working neural network applications.  Depending upon the number of patterns, training may be slow, but it is usually worth the time because these networks are so robust and general (and training is usually much shorter if you use Calibration).  If your outputs are not categories, you can increase the precision of a Backpropagation network by creating a separate network for each output.  NeuroShell 2 offers several different variations of Backpropagation networks:


a.  Each layer connected to the immediately previous layer (with either 3, 4, or 5 layers).

Three layers (input, hidden, and output) are generally sufficient for the vast majority of problems.  Through experience and literature reviews, we have found that the three layer Backpropagation network with standard connections is suitable for almost all problems if you use Calibration.  NeuroShell 2 gives you a choice of 1, 2, or 3 hidden layers (3, 4, or 5 layers in total) so you can find the best architecture for your problem.  Networks with more than 5 layers are not offered because we have found no benefit in using them.


b.  Each layer connected to every previous layer (with either 3, 4, or 5 layers).

This network architecture may be useful when you are working with very complex patterns, i.e., when it would be very difficult for a human to define the different patterns that are inherent in the data.  It sometimes gives better results than standard connections.  The comments on the number of layers in (a) above also apply to this type of network.
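The difference between (a) and (b) can be sketched in a few lines of Python.  This is only an illustration, not NeuroShell 2's implementation: the function names, the logistic activation, and the random weight range are all assumptions made for the example.  The point it shows is that, with connections to every previous layer, the output layer receives the input values alongside the hidden layer values.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """Weighted sum plus bias for each neuron, squashed by a logistic function."""
    return [logistic(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and biases for one layer."""
    weights = [[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.3, 0.3) for _ in range(n_out)]
    return weights, biases

def forward_jump(x, hidden_layer, output_layer):
    """Three layer net whose output layer sees the hidden layer AND the input layer."""
    hidden = dense(x, *hidden_layer)
    return dense(hidden + list(x), *output_layer)

rng = random.Random(0)
n_in, n_hidden, n_out = 4, 3, 2
hidden_layer = random_layer(n_in, n_hidden, rng)
# The output layer needs one weight per hidden neuron plus one per input.
output_layer = random_layer(n_hidden + n_in, n_out, rng)
y = forward_jump([0.1, 0.5, -0.2, 0.9], hidden_layer, output_layer)
print(y)
```

With standard connections as in (a), the output layer would be built with `n_hidden` incoming weights per neuron instead of `n_hidden + n_in`.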


c.  Recurrent networks with dampened feedback.

Recurrent networks are known for their ability to learn sequences, so they are excellent for time series data, which many users find invaluable in making financial predictions.  They have the slight disadvantage of taking longer to train.  You have a choice of feeding the input, hidden, or output layer back into the network for inclusion with the next pattern.  We recommend using the architecture where the hidden layer is fed back into the input layer, which means that features detected in all previous patterns are fed into the network with each new pattern.  Feeding the input layer from one pattern into the input layer of the next pattern is similar to, but more powerful than, giving the network previous values of each of the inputs, such as including yesterday's stock price and the day before yesterday's stock price with today's stock price.  Feeding the output layer into the input layer shows the network what the outputs of the previous patterns have been.


The architecture that feeds the hidden layer back into the input layer is commonly called an Elman recurrent network; the variant that feeds the output layer back is commonly called a Jordan recurrent network.


You must use rotational pattern selection for both your training set and test set when working with recurrent networks.  During training, the patterns must be presented to the network in sequence, without gaps in the data.  You must also test the network with the patterns in sequence.
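The following Python sketch shows why sequence order matters when the hidden layer is fed back.  It is a minimal illustration under stated assumptions, not NeuroShell 2's algorithm: the function name `run_sequence`, the `damping` parameter, the blending rule used for the dampened feedback, and the weights are all hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_sequence(patterns, w_in, w_ctx, w_out, damping=0.5):
    """Present patterns in order; feed a dampened copy of the hidden layer back in."""
    context = [0.0] * len(w_in)          # one context value per hidden neuron
    outputs = []
    for x in patterns:                   # sequence order matters: no shuffling
        hidden = []
        for j in range(len(w_in)):
            net = sum(w * v for w, v in zip(w_in[j], x))
            net += sum(w * c for w, c in zip(w_ctx[j], context))
            hidden.append(logistic(net))
        outputs.append([logistic(sum(w * h for w, h in zip(row, hidden)))
                        for row in w_out])
        # Dampened feedback (assumed form): blend old context with new hidden values.
        context = [damping * c + (1.0 - damping) * h
                   for c, h in zip(context, hidden)]
    return outputs

# Illustrative weights only: 2 inputs, 2 hidden neurons, 1 output.
w_in = [[0.5, -0.4], [0.3, 0.8]]
w_ctx = [[0.6, -0.2], [0.1, 0.7]]
w_out = [[1.0, -1.0]]
outs = run_sequence([[1.0, 0.5], [1.0, 0.5]], w_in, w_ctx, w_out)
print(outs)
```

The same pattern is presented twice, yet the two outputs differ because the second presentation also sees what the hidden layer detected during the first.  This is why shuffled patterns or gaps in the data would give the network a misleading history.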


d.  Ward Networks

Hidden layers in a neural network are known as feature detectors.  Ward Systems Group invented three different Backpropagation network architectures with multiple hidden layers for our consulting work.


Different activation functions applied to hidden layer slabs detect different features in a pattern processed through a network.  For example, a network design may use a Gaussian function on one hidden slab to detect features in the mid-range of the data and use a Gaussian complement in another hidden slab to detect features from the upper and lower extremes of the data.  Combining the two feature sets in the output layer may lead to a better prediction.  Thus, the output layer will get different "views of the data."
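To make the "views of the data" idea concrete, here is a small Python sketch of the two activation functions mentioned above.  The exact formulas are an assumption, since this section does not state them: the Gaussian is taken to be exp(-x^2) and the Gaussian complement to be 1 - exp(-x^2).

```python
import math

def gaussian(x):
    """Assumed form: largest response for values near the middle of the range."""
    return math.exp(-x * x)

def gaussian_complement(x):
    """Assumed form: largest response for values at the upper and lower extremes."""
    return 1.0 - math.exp(-x * x)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(gaussian(x), 3), round(gaussian_complement(x), 3))
```

A slab using the first function responds most strongly near zero, while a slab using the second responds most strongly far from zero, so the output layer receives two complementary sets of features.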


Number of Hidden Neurons

The default number of hidden neurons for a 3 layer network is computed with the following formula:

            # of hidden neurons = 1/2 * (# of Inputs + # of Outputs) + Sqrt(# of Patterns)


For networks with more than one hidden slab, divide the number above by the number of hidden slabs to get the number of neurons in each slab.
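As a worked example, the default can be computed as follows.  The function name and parameter names are hypothetical, and rounding to the nearest whole neuron is an assumption, since the text above does not say how fractions are handled.

```python
import math

def default_hidden_neurons(n_inputs, n_outputs, n_patterns, n_hidden_slabs=1):
    """Default hidden neurons per slab, following the formula above."""
    total = 0.5 * (n_inputs + n_outputs) + math.sqrt(n_patterns)
    # Assumption: round to the nearest integer and never go below one neuron.
    return max(1, round(total / n_hidden_slabs))

# 10 inputs, 2 outputs, 400 training patterns: 1/2 * 12 + Sqrt(400) = 6 + 20
print(default_hidden_neurons(10, 2, 400))                     # 26
print(default_hidden_neurons(10, 2, 400, n_hidden_slabs=2))   # 13 per slab
```

Note that the default grows with the number of patterns, so larger training sets lead to larger suggested hidden layers.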


We have found this formula to be better than the default formula in NeuroShell 1, which was:

            # of hidden neurons = 2 * Sqrt(# of Inputs + # of Outputs), rounded down to the nearest integer

Here the inputs are the defining characteristics and the outputs are the classifying characteristics.
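For comparison, the older NeuroShell 1 default can be sketched in Python as shown below.  The function name is hypothetical, and the sketch assumes the rounding down is applied to the whole product rather than to the square root alone.

```python
import math

def neuroshell1_hidden_neurons(n_inputs, n_outputs):
    """Older default: twice the square root of (inputs + outputs), rounded down."""
    return math.floor(2 * math.sqrt(n_inputs + n_outputs))

# 10 inputs and 2 outputs: 2 * Sqrt(12) is about 6.93, which rounds down to 6.
print(neuroshell1_hidden_neurons(10, 2))   # 6
```

Unlike the NeuroShell 2 default, this value does not depend on the number of training patterns.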