Backpropagation Architecture - Recurrent Network


 

This type of Backpropagation network has been used successfully to predict financial markets, because recurrent networks can learn sequences and are therefore excellent for time series data.  Their slight disadvantage is that they take longer to train.

 

A backpropagation network with standard connections responds to a given input pattern with exactly the same output pattern every time the input pattern is presented. A recurrent network may respond to the same input pattern differently at different times, depending upon the patterns that have been presented as inputs just previously. Thus, the sequence of the patterns is as important as the input pattern itself.

 

Recurrent networks are trained in the same way as standard backpropagation networks except that patterns must always be presented in the same order; random selection is not allowed.  The one difference in structure is that there is one extra slab in the input layer that is connected to the hidden layer just like the other input slab.  This extra slab holds the contents of one of the layers as it existed when the previous pattern was trained.  In this way the network retains knowledge of previous inputs.  This extra slab is sometimes called the network's "long term" memory.
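The sketch below illustrates the idea of the data flow, using the hidden layer fed back into the memory slab (one of the three types listed below).  It is only a minimal illustration: the function and variable names are ours, not NeuroShell 2 identifiers, and the feedback factors described later on this page are omitted for simplicity.

    import numpy as np

    def forward(pattern, memory, W_in, W_mem, W_out):
        # The hidden layer sees both the current pattern and the extra input
        # slab (the long term memory) holding the previous pattern's state.
        hidden = 1.0 / (1.0 + np.exp(-(pattern @ W_in + memory @ W_mem)))
        output = hidden @ W_out
        return output, hidden      # hidden becomes the memory for the next pattern

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 4, 6, 1
    W_in  = rng.uniform(-0.3, 0.3, (n_in, n_hidden))
    W_mem = rng.uniform(-0.3, 0.3, (n_hidden, n_hidden))
    W_out = rng.uniform(-0.3, 0.3, (n_hidden, n_out))

    memory = np.zeros(n_hidden)                   # unprimed long term memory
    for pattern in rng.normal(size=(10, n_in)):   # patterns presented in order, never shuffled
        output, memory = forward(pattern, memory, W_in, W_mem, W_out)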

 

Note that when you process patterns through a recurrent network, the first few patterns processed may produce less accurate results than subsequent patterns, because the net's long term memory slab needs to be primed with data.  We recommend that you add 3 to 10 patterns to the beginning of your data for this purpose.  These patterns must come from a time just prior to the beginning of your data in order to show a trend.
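The fragment below sketches one way to do this priming when you prepare your data yourself; the file names and the NumPy loading calls are only illustrative assumptions, not part of NeuroShell 2.

    import numpy as np

    # Earlier data, oldest row first, taken from just before the period of interest.
    history = np.loadtxt("history.csv", delimiter=",")
    data    = np.loadtxt("data.csv", delimiter=",")

    n_prime = 5                                     # any value from 3 to 10
    primed  = np.vstack([history[-n_prime:], data]) # priming rows first, then the real data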

 

If there is no temporal (time dependent) structure in the data, a recurrent network will not work as well as a standard backpropagation network, because the long term memory slab will appear to the net as random noise.  You should also avoid feeding a recurrent network many inputs other than raw data, unless those inputs themselves have a temporal structure.  If a recurrent network does not work well, there may not be any time dependencies in the data.

 

Three Types of Recurrent Networks

NeuroShell 2 allows you to select three different types of recurrent networks:

 

Input Layer Fed Back Into Input Layer

The long term memory remembers the new input data and uses it when the next pattern is processed.

 

Hidden Layer Fed Back Into Input Layer

The long term memory remembers the hidden layer, which contains features detected in the raw data of previous patterns.  This is the most powerful recurrent network.

 

Output Layer Fed Back Into Input Layer

The long term memory remembers previously predicted outputs.
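The only difference between the three types is which layer's values refresh the long term memory slab after each pattern.  A minimal sketch of that choice (the function name and string keys are ours, not NeuroShell 2 identifiers):

    def memory_source(network_type, input_values, hidden_values, output_values):
        # Values that refresh the long term memory slab for each recurrent type.
        return {
            "input_feedback":  input_values,   # Input Layer Fed Back Into Input Layer
            "hidden_feedback": hidden_values,  # Hidden Layer Fed Back Into Input Layer
            "output_feedback": output_values,  # Output Layer Fed Back Into Input Layer
        }[network_type]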

 

Slabs

Click on each Slab to select or inspect the number of neurons.  Change the default settings by typing a new value in the text box.

 

Backpropagation Hidden Neurons

The number of neurons in the hidden layer is usually set automatically the first time you design the network, but you may use the text box to change the default.  If you're using Calibration, which limits overlearning and prevents memorization, the number of hidden neurons you use is not as critical as long as there are enough.  It may be better to err on the side of using more rather than fewer when using Calibration.  When not using Calibration, fewer are better for generalization.

 

The default number of hidden neurons for a 3 layer network is computed with the following formula:

# of hidden neurons = 1/2 (Inputs + Outputs) + square root of the number of patterns in the Training (.TRN) file, or in the .PAT file if there is no .TRN file.

 

For more layers, divide the number computed above by the number of hidden layers.  If there are multiple slabs in the hidden layer, the hidden neurons will be divided evenly among the slabs.
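As a check on the arithmetic, here is the same default expressed as a small function.  It is only a sketch of the formula above; the rounding at the end is our assumption rather than NeuroShell 2's exact behavior.

    import math

    def default_hidden_neurons(n_inputs, n_outputs, n_patterns, n_hidden_layers=1):
        # 1/2 (Inputs + Outputs) + sqrt(number of training patterns),
        # divided by the number of hidden layers when there is more than one.
        total = 0.5 * (n_inputs + n_outputs) + math.sqrt(n_patterns)
        return round(total / n_hidden_layers)

    # Example: 10 inputs, 1 output, 400 training patterns, one hidden layer
    print(default_hidden_neurons(10, 1, 400))   # 0.5 * 11 + 20 = 25.5, rounded to 26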

 

In Backpropagation networks, the number of hidden neurons determines how well a problem can be learned.  If you use too many, the network will tend to memorize the problem and thus not generalize well later (although Calibration mitigates this effect to some extent).  If you use too few, the network will generalize well but may not have enough “power” to learn the patterns well.  Getting the right number of hidden neurons is a matter of trial and error, since there is no science to it.  However, our defaults are usually pretty reliable.

 

Use the mouse to select a scaling or activation function from the list box.  The choice of scaling or activation function will vary depending upon whether the slab is in the input layer (scaling function) or another layer (activation function).
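For illustration, a scaling function maps raw input values into a fixed range, while an activation function squashes a neuron's weighted sum.  The [-1, 1] linear scaling and logistic activation below are just examples of the two roles, not a list of NeuroShell 2's options.

    import numpy as np

    def scale_input(x, lo, hi):
        # Scaling function for an input slab: map raw values into [-1, 1].
        return 2.0 * (x - lo) / (hi - lo) - 1.0

    def logistic(x):
        # Activation function for a hidden or output slab.
        return 1.0 / (1.0 + np.exp(-x))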

 

Connection Arrows (Links)

You can click on the Connection Arrows to set or inspect parameters such as learning rate, momentum, and initial weights.

 

Each link in the network has its own learning rate and momentum that can be set individually.  You may also give all links in the network the same learning rate and momentum by clicking on the "Set all links like current link" check box.  On simple problems we recommend a large learning rate and momentum such as .9 and .6 respectively.  On more complicated problems, or predictive networks where your outputs are continuous values rather than categories, use a smaller learning rate and momentum such as .1 and .1, the defaults that are set in the Beginner's System.  If the data is very noisy, try a learning rate of .05 and a momentum of .5.
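For readers who want to see where these two numbers enter training, here is the standard backpropagation-with-momentum weight update for a single weight.  It is a textbook sketch, not NeuroShell 2's internal code.

    def update_weight(weight, gradient, prev_delta, learning_rate=0.1, momentum=0.1):
        # The learning rate scales the current error gradient, and the momentum
        # term carries over a fraction of the previous weight change.
        delta = learning_rate * gradient + momentum * prev_delta
        return weight - delta, delta    # new weight, plus delta to reuse next step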

 

There is an option in the Backpropagation Training Criteria that allows you to automatically increment learning rate/momentum as training progresses.  This is only for experts who know what they are doing.  Most problems will never need this.  Refer to Changes in Learning Rate/Momentum for details.

 

The weight that is set in the edit box, e.g., .3, represents a range of values from -.3 to +.3.  The network's initial weight for that link is a random value within that range.  We recommend the default, but you can try other values if you want to experiment.  If you have a large number of weights (i.e., large slabs) you may want to try lower values.
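In other words, an edit box value of .3 behaves roughly like the draw below; the uniform distribution and the 10 x 6 slab size are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng()
    weight_range = 0.3                  # the value typed into the edit box
    # Every initial weight on the link is a random value in [-0.3, +0.3].
    initial_weights = rng.uniform(-weight_range, weight_range, size=(10, 6))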

 

Recurrent networks have an extra kind of link, called a feedback link.  Feedback Link 1 contains a factor that tells what proportion of the neuron values the long term memory slab keeps from itself.  Feedback Link 2 contains a factor that tells what proportion of the neuron values in the current pattern from either the input, hidden, or output layer (depending upon the network type) is fed into the long term memory slab.  The two proportions must add up to 1.  If you want to put more emphasis on historical patterns, put a higher proportion on Feedback Link 1.  If you want to put more emphasis on recent patterns, put a higher proportion on Feedback Link 2.
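Put as arithmetic, the long term memory slab is updated as a weighted average of its old contents and the fed-back layer.  The sketch below follows that description; the function and variable names are ours.

    import numpy as np

    def update_memory(memory, fed_back_layer, feedback_1, feedback_2):
        # feedback_1: proportion of the memory slab's own previous contents kept.
        # feedback_2: proportion taken from the input, hidden, or output layer
        #             (depending on the network type).  The two must sum to 1.
        assert abs(feedback_1 + feedback_2 - 1.0) < 1e-9, "proportions must add up to 1"
        return feedback_1 * memory + feedback_2 * fed_back_layer

    # More emphasis on historical patterns (higher Feedback Link 1):
    memory = update_memory(np.zeros(6), np.ones(6), feedback_1=0.8, feedback_2=0.2)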