Alternative to Multiple Regression


NeuroShell 2 can handle numeric problems which would otherwise be processed with regression analysis.  The independent variables used in regression become the inputs and the dependent variable becomes the output.  The observations are the sample patterns.
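
As a rough illustration of this mapping, the sketch below splits regression-style observations into input and output values. The data values and the helper name to_patterns are made up for this example and are not part of NeuroShell 2.

```python
# Hypothetical observations: each row holds two independent variables
# (x1, x2) followed by one dependent variable (y). Values are made up.
observations = [
    (1.0, 2.0, 5.0),
    (2.0, 0.5, 4.5),
    (3.0, 1.5, 8.0),
]


def to_patterns(rows, n_inputs):
    """Split each observation into an (inputs, outputs) sample pattern."""
    return [(list(row[:n_inputs]), list(row[n_inputs:])) for row in rows]


# Each observation becomes one sample pattern.
patterns = to_patterns(observations, n_inputs=2)
for inputs, outputs in patterns:
    print("inputs:", inputs, "outputs:", outputs)
```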



There are several key differences between NeuroShell 2 and multiple regression, however.


1.  Line Equation Not Required

You do not have to give NeuroShell 2, a priori, the equation of the line or surface you are trying to fit through the data.  You do not, therefore, specify that the equation is of any particular degree or that the independent variables interact in any particular fashion.  NeuroShell 2 builds the appropriate model for you automatically and considers every interaction between variables.


You cannot get the coefficients of a regular equation for your model with NeuroShell 2 as you can with regression.  The calculations are much more complex than a regular equation or polynomial.  GMDH is the exception: it builds its model out of many regressions, so its result can be expressed as an equation, and it can build better models than a single regression.  Refer to GMDH Architecture for details.


2.  Multiple Outputs

NeuroShell 2 can have more than one dependent variable (output).  If you are using Backpropagation and more than one dependent variable, you may not get as accurate or precise results as if you had built separate nets for each output, but that may be outweighed by the convenience of having one net at runtime which makes all "predictions" simultaneously.  When there are multiple outputs, NeuroShell 2 with Backpropagation uses a least squares minimization technique to decide how to apportion its weight adjustment amongst the several outputs.  That is why the NeuroShell 2 error factor during learning is based on the sum over all outputs of the squares of the differences between the network predicted output and the actual output as supplied by you in the sample patterns.
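
The error measure described above can be sketched for a single sample pattern as follows. This is only an illustration of "sum over all outputs of the squared differences"; the function name and the numbers are assumptions, and how NeuroShell 2 combines this quantity across patterns is not shown here.

```python
def pattern_error(predicted, actual):
    """Sum over all outputs of (predicted - actual) squared, for one pattern."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


# One hypothetical pattern with two outputs (values are made up).
predicted = [0.9, 0.2]   # what the network produced
actual = [1.0, 0.0]      # what the sample pattern supplied

error = pattern_error(predicted, actual)
print(round(error, 6))
```

Because both outputs feed one total, a weight change that helps one output at the expense of the other is judged by its net effect, which is why a shared net may be less accurate on each output than separate nets.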


If you use GRNN, each output is computed independently of the others, so you will not need different nets for each output.  When using GRNN with Calibration, however, the average squared error over all of the outputs is used to find the appropriate smoothing factor, so in that case the predicted output values are not entirely independent.
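
The following is a minimal sketch of a textbook GRNN prediction with a Gaussian kernel, plus a simple search that picks one smoothing factor by the mean squared error over all outputs on held-out patterns. It is not NeuroShell 2's implementation: the function names, the candidate smoothing factors, the data, and the search procedure are all assumptions made for illustration.

```python
import math


def grnn_predict(x, train_inputs, train_outputs, sigma):
    """Kernel-weighted average of the training outputs (textbook GRNN)."""
    weights = []
    for xi in train_inputs:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    n_out = len(train_outputs[0])
    if total == 0.0:
        # Every kernel underflowed to zero; fall back to the plain average.
        n = len(train_outputs)
        return [sum(y[k] for y in train_outputs) / n for k in range(n_out)]
    return [
        sum(w * y[k] for w, y in zip(weights, train_outputs)) / total
        for k in range(n_out)
    ]


def mean_squared_error(sigma, train_in, train_out, test_in, test_out):
    """Average squared error over every output of every test pattern."""
    total, count = 0.0, 0
    for x, y in zip(test_in, test_out):
        pred = grnn_predict(x, train_in, train_out, sigma)
        for p, a in zip(pred, y):
            total += (p - a) ** 2
            count += 1
    return total / count


# Made-up data: one input, two outputs per pattern.
train_in = [[0.0], [1.0], [2.0], [3.0]]
train_out = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
test_in = [[0.5], [1.5], [2.5]]
test_out = [[0.5, 1.0], [1.5, 3.0], [2.5, 5.0]]

candidates = [0.1, 0.5, 2.0]  # hypothetical smoothing factors to try
best = min(
    candidates,
    key=lambda s: mean_squared_error(s, train_in, train_out, test_in, test_out),
)
print("chosen smoothing factor:", best)
```

In this sketch each output is a separate weighted average, but all outputs share the single smoothing factor chosen from their combined error, which is the sense in which the predictions are coupled.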


3.  Tighter Data Fits

We have always been able to get "tighter data fits" with NeuroShell 2 than with regression.  In other words, the "line" (or surface in a multidimensional space) that NeuroShell 2 "draws through" the sample patterns as it builds its neural model can be made to come much closer to the data, unless of course the data is exactly on the equation that regression would "draw" (which is rarely the case).  The figure on the next page illustrates this concept.


This closeness of fit is controlled by two things:


a.  How many hidden neurons you supply.


b.  How low an error factor your problem can descend to.




The more hidden neurons you supply and the lower the error factor to which you learn, the closer your model will come to the training patterns.  You have to be careful, however, that you do not provide too many hidden neurons or learn too much, because generalization (the ability of the model to do well with new patterns for which it has not been trained) may suffer.  Use Calibration to avoid this.


This ability to give "tighter data fits" than regression has translated into better predictions for many of our users' problems, but not all.  It depends upon the problem, how well you adjust the hidden neurons, and how much you learn.  Some problems thrive with NeuroShell 2 and users get 10 to 25 percent better results.  Unfortunately, adjustment of hidden neurons is an art, not a science, and often trial and error is the only way to find the best number.  Calibration is a powerful tool which finds the amount of training that optimizes your network's ability to generalize.
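
The trade-off between a tight fit and generalization can be sketched with a generic held-out-set check: training error keeps falling, while error on patterns not used for training eventually rises. The sketch below only illustrates that idea. It is an assumption, not a description of how Calibration is implemented, and the error values are made up.

```python
def best_stopping_point(test_errors):
    """Return (index, error) of the lowest error on held-out patterns."""
    best = min(range(len(test_errors)), key=lambda i: test_errors[i])
    return best, test_errors[best]


# Made-up error curves, one value per training interval.
train_errors = [0.90, 0.50, 0.30, 0.20, 0.12, 0.08, 0.05]  # keeps falling
test_errors = [0.95, 0.60, 0.42, 0.38, 0.41, 0.47, 0.55]   # falls, then rises

interval, error = best_stopping_point(test_errors)
print("stop at interval", interval, "with held-out error", error)
```

Training beyond that interval continues to tighten the fit to the training patterns while the held-out error gets worse.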


4.  Noisy Data

NeuroShell 2 is able to function quite nicely with a huge number of training patterns, even those with "noisy" or slightly incorrect data in them.  In fact, the more training patterns that are provided, the better (usually).  Our sophisticated users of regression tell us that NeuroShell 2 is much better than regression in this area.  If your data is very noisy, use a very low learning rate (0.05) and high momentum (0.5).
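
To show what these two settings do, here is a sketch of the standard gradient descent weight update with momentum. Whether NeuroShell 2 uses exactly this form is an assumption; the function name and the gradient values are made up for illustration.

```python
def momentum_step(weight, gradient, prev_delta, learning_rate=0.05, momentum=0.5):
    """One weight update: a small step against the gradient, plus a
    fraction of the previous change (the momentum term)."""
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta


weight, delta = 1.0, 0.0
for gradient in [2.0, -1.0, 2.5]:  # made-up noisy gradients
    weight, delta = momentum_step(weight, gradient, delta)
    print(round(weight, 4), round(delta, 4))
```

The low learning rate keeps any single noisy pattern from moving a weight very far, and the momentum term carries part of the previous change forward so that the overall direction is smoothed across patterns.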