Backpropagation - Weight Updates


Vanilla - NeuroShell 2's backpropagation algorithm is not the plain vanilla algorithm that appears in books; Ward Systems Group has modified it for speed and accuracy.  Here, "vanilla" means only that a learning rate is applied to the weight updates and a momentum term is not.
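
As an illustration only, a learning-rate-only weight update can be sketched as follows. This is the generic textbook gradient step, not Ward Systems Group's modified algorithm; the function name vanilla_update and the toy error function are assumptions made for the example.

```python
def vanilla_update(weight, gradient, learning_rate):
    """Return the weight after one learning-rate-only step (no momentum term)."""
    # The change is the error gradient scaled by the learning rate.
    delta = -learning_rate * gradient
    return weight + delta


# Toy problem (hypothetical): minimize E(w) = (w - 3)^2, gradient 2 * (w - 3).
w = 0.0
for _ in range(50):
    w = vanilla_update(w, 2.0 * (w - 3.0), learning_rate=0.1)

print(w)  # approaches the minimum at w = 3
```

With a learning rate that is too large for the problem, the same loop would overshoot and oscillate, which is the situation a momentum term is meant to smooth out.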


Momentum - The weight updates include not only the change dictated by the learning rate, but also a portion of the previous weight change.  As with momentum in physics, a high momentum term keeps the network moving in the direction it has been going.  In other words, a high momentum term tends to damp weight fluctuations.  Use high momentum for extremely noisy data, or when you want a high learning rate.
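
The damping effect described above can be sketched with the standard textbook momentum rule. This is a minimal illustration under assumed names (momentum_update, last_delta) and an artificial alternating gradient; it is not NeuroShell 2's internal code.

```python
def momentum_update(weight, gradient, last_delta, learning_rate, momentum):
    """Return (new_weight, delta) using a learning rate plus a momentum term."""
    # New change = learning-rate step + a portion of the previous change.
    delta = -learning_rate * gradient + momentum * last_delta
    return weight + delta, delta


learning_rate, momentum = 0.1, 0.9
w, last_delta = 0.0, 0.0
for step in range(200):
    # Artificial "noisy" gradient that flips sign on every pattern.
    noisy_gradient = 1.0 if step % 2 == 0 else -1.0
    w, last_delta = momentum_update(
        w, noisy_gradient, last_delta, learning_rate, momentum
    )

# Without momentum each change would have magnitude learning_rate (0.1).
# With momentum the changes settle near learning_rate / (1 + momentum).
print(abs(last_delta))
```

In this toy setup the sign-flipping gradient produces weight changes of roughly half the size they would have without momentum, which is one way to see why a high momentum term suits noisy data.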


TurboProp - This is a training method that operates much faster in the "batch" mode than our other Backpropagation methods, and has the additional advantage that it is not sensitive to learning rate and momentum.  Training proceeds through an entire epoch before the weights are updated: the weight changes for all patterns are added together, and the weights are modified once at the end of the epoch.  The TurboProp method uses an independent update step size for each weight, rather than the usual single learning rate and momentum applied to all weights.  Furthermore, the step sizes are adjusted adaptively as learning progresses.  TurboProp is simpler to use than the other methods because the user does not have to set learning rate and momentum.
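
TurboProp's exact update rule is not given here, so the sketch below substitutes a well-known public technique with the same general shape: a sign-based, Rprop-style batch update that keeps one adaptive step size per weight. It should be read as an assumption-laden illustration of "accumulate over the epoch, then update each weight with its own step", not as Ward Systems Group's algorithm. All names and constants (batch_adaptive_epoch, grow, shrink, the toy data) are hypothetical.

```python
def batch_adaptive_epoch(weights, steps, prev_grads, patterns, grad_fn,
                         grow=1.2, shrink=0.5, step_min=1e-6, step_max=1.0):
    """Run one epoch; return (new_weights, new_steps, summed_gradients)."""
    # 1. Accumulate the gradient over every pattern before changing any weight.
    total = [0.0] * len(weights)
    for pattern in patterns:
        for i, g in enumerate(grad_fn(weights, pattern)):
            total[i] += g

    # 2. Adapt each weight's own step size, then move that weight once.
    new_weights, new_steps = list(weights), list(steps)
    for i, g in enumerate(total):
        if g * prev_grads[i] > 0:      # same direction as last epoch: speed up
            new_steps[i] = min(steps[i] * grow, step_max)
        elif g * prev_grads[i] < 0:    # direction flipped: slow down
            new_steps[i] = max(steps[i] * shrink, step_min)
        if g > 0:
            new_weights[i] -= new_steps[i]
        elif g < 0:
            new_weights[i] += new_steps[i]
    return new_weights, new_steps, total


def line_gradient(weights, pattern):
    """Squared-error gradient for the toy model y = weights[0] * x + weights[1]."""
    x, y = pattern
    error = weights[0] * x + weights[1] - y
    return [2.0 * error * x, 2.0 * error]


# Toy training set drawn from y = 2x + 1 (illustrative data only).
patterns = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
weights, steps, prev_grads = [0.0, 0.0], [0.1, 0.1], [0.0, 0.0]
for _ in range(300):
    weights, steps, prev_grads = batch_adaptive_epoch(
        weights, steps, prev_grads, patterns, line_gradient
    )

print(weights)  # should approach slope 2 and intercept 1
```

Note that no learning rate or momentum appears in the call: the only tunable quantities are the per-weight step sizes, and those adjust themselves from epoch to epoch.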


Note:  If you are using TurboProp for weight updates, there is no point in setting the Calibration interval to less than the epoch size (the number of patterns in the training set).  TurboProp is a batch update technique, so the weights are updated only once per epoch.


Generally speaking, if learning rate and momentum are set right, momentum weight updates may work better than TurboProp on speed, but not necessarily on accuracy.  However, use TurboProp if you have trouble finding the right values for learning rate and momentum.