How Do I Find the Most Important Variables (Inputs)?


Certainly in non-linear systems, and to a great extent in linear systems, it is very difficult to pinpoint the exact contribution any given input makes to the total functioning of the system.  Some scientists believe in holding all variables but one in place, then varying that one to see its effect on the output.  The problem with this approach is that the values at which the other variables are held are critical.  Different effects will be observed in the free variable depending upon the values the others are held at, and there are, for all practical purposes, an infinite number of settings at which the others can be held.
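To see why the held values matter, here is a minimal Python sketch using a hypothetical two-input system with an interaction term; the apparent effect of the free variable changes completely depending on where the other is held:

```python
# Toy illustration: one-at-a-time sensitivity of x1 in a non-linear
# system depends on where the other input (x2) is held.
def f(x1, x2):
    # Hypothetical non-linear system with an interaction term.
    return x1 * x2 + x2 ** 2

def sensitivity_of_x1(x2_held, delta=1.0):
    # Vary x1 by +delta while holding x2 fixed; report change in output.
    base = f(0.0, x2_held)
    return f(delta, x2_held) - base

# The apparent "effect" of x1 changes with the held value of x2.
print(sensitivity_of_x1(x2_held=0.0))   # x1 appears to have no effect
print(sensitivity_of_x1(x2_held=5.0))   # x1 appears strongly influential
```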


Some people have tried running a genetic algorithm to pick combinations of inputs, then running a neural network on each combination until one is found that produces the best results.  There are several difficulties with this approach.  The first is that most network paradigms, like backprop, behave very differently when different starting random weights are applied to the links from the same inputs.  This is inevitable as you vary combinations, and in any case the "starting point" in weight space will be different for each combination.  The starting point can be as important as which inputs are used.  The second problem is that some variables contribute only slightly, and the approach has no way to show you an input "weighting": it either includes a variable or it does not.  The final difficulty is that training that many neural networks takes too long.  If you train the networks only partially, you will probably get inconclusive and potentially misleading results.
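The combination-search idea can be sketched as follows.  Everything here is hypothetical: the `noisy_score` function stands in for training a network on one input combination, and its seed-dependent noise mimics the starting-weight problem described above:

```python
import random

def noisy_score(subset, seed):
    # Stand-in for "train a net on this input combination and score it".
    # The seed-dependent noise mimics the starting random weights:
    # the same combination can score differently on different runs.
    rng = random.Random(seed)
    signal = len(subset & {0, 2})              # pretend inputs 0 and 2 matter
    return signal + rng.uniform(-0.5, 0.5)

def ga_select_inputs(n_inputs=5, pop_size=8, generations=15, rng=None):
    # Minimal genetic algorithm over input subsets (bit-set encoding).
    rng = rng or random.Random(7)
    pop = [frozenset(i for i in range(n_inputs) if rng.random() < 0.5)
           for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=lambda s: noisy_score(s, seed=hash(s) ^ gen),
                 reverse=True)
        survivors = pop[: pop_size // 2]       # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: keep shared inputs, coin-flip the rest.
            child = set(a & b) | {i for i in (a | b) if rng.random() < 0.5}
            if rng.random() < 0.2:             # mutation: flip one input
                child ^= {rng.randrange(n_inputs)}
            children.append(frozenset(child))
        pop = survivors + children
    return pop[0]                              # best subset found
```

Note that the result is all-or-nothing: a subset of inputs, with no graded weighting of the variables inside it.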


Another way to find the value of inputs is to analyze the weights of a trained backprop network.  Our own Contributions module does this for you.  In Release 3 we have improved this technique substantially over previous releases.  It works well, but not as well as the genetic adaptive approach (see below).  Its major disadvantage is that it does not give much lower values to worthless variables; they may be only modestly lower than worthwhile ones.  You may find that it works better on three-layer nets with few hidden neurons.
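One common weight-analysis scheme of this kind (not necessarily the exact formula the Contributions module uses) is a Garson-style calculation, sketched here for a three-layer net:

```python
def input_contributions(w_ih, w_ho):
    """Garson-style contribution estimate for a three-layer net:
    each input's share of the absolute weight entering each hidden
    neuron, weighted by that neuron's absolute output weight.
    w_ih[i][h] = input-to-hidden weights, w_ho[h] = hidden-to-output.
    (A common weight-analysis scheme; shown only as an illustration.)"""
    n_in = len(w_ih)
    n_hid = len(w_ih[0])
    contrib = [0.0] * n_in
    for h in range(n_hid):
        col_sum = sum(abs(w_ih[i][h]) for i in range(n_in))
        for i in range(n_in):
            contrib[i] += (abs(w_ih[i][h]) / col_sum) * abs(w_ho[h])
    total = sum(contrib)
    return [c / total for c in contrib]       # shares summing to 1

# Hypothetical weights where input 0 carries most of the signal:
print(input_contributions([[2.0, 1.5], [0.1, 0.2]], [1.0, 1.0]))
```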


GMDH provides an alternative way of finding the best inputs.  As an internal part of the algorithm, it constantly examines variables at each step to see whether they add to the value of the model.  They are included in the model if they do, and left out if they do not.  The training screen shows you which of the included variables are most important and which are less important.  (Variables which appear to add little to predicting the output do not appear at all.)  However, there is no way of "weighting" each included input any more precisely than that.
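The include-it-if-it-helps idea can be illustrated with a simple forward-selection sketch.  This is only an illustration of the principle, not the actual GMDH polynomial-building algorithm, and the error function here is hypothetical:

```python
def forward_select(candidates, error_with):
    """Add a candidate input only when it lowers the model's error.
    `error_with` is a hypothetical callback: the error of a model
    trained on the given list of inputs."""
    included = []
    best_err = error_with(included)
    improved = True
    while improved:
        improved = False
        for c in candidates:
            if c in included:
                continue
            err = error_with(included + [c])
            if err < best_err:                # include only if it helps
                included.append(c)
                best_err = err
                improved = True
    return included, best_err

# Hypothetical error function: x1 and x3 genuinely help, x2 does not.
toy_error = lambda inputs: 10 - 4 * ('x1' in inputs) - 2 * ('x3' in inputs)
print(forward_select(['x1', 'x2', 'x3'], toy_error))  # (['x1', 'x3'], 4)
```

As in GMDH, the unhelpful variable (`x2` here) simply never appears in the result; there is no finer-grained weighting.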


The final method, the one we believe to be the best, is integral to the functioning of the genetic adaptive GRNN and PNN networks.  As these networks are trained, the genetic algorithm is constantly trying to "smooth" variables (find the Gaussian shapes in the hidden layers - these are a kind of radial basis net) to see the effect of that smoothing on the model.  If a variable is smoothed more and the model improves, then that variable evidently has more effect.  If it is smoothed more and the model gets worse, then that variable evidently has lower importance.  When the final best model is done, the smoothings give you a fairly accurate view of the importance of each variable.  The smoothing factor multipliers rate the importance of the variables from 0 to 3.  At 0 they have minimum smoothing and are removed altogether.  At 3 they have maximum smoothing, and are therefore the most important ones.  Make your own decisions about what the levels between 0 and 3 signify.  You may decide, for example, to remove all variables with a smoothing factor below 1 and train again.
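Acting on such a cutoff rule is straightforward; a small sketch, with hypothetical variable names and multipliers:

```python
def prune_by_smoothing(multipliers, threshold=1.0):
    """Keep only the inputs whose smoothing factor multiplier
    (the 0-to-3 scale described above) meets the threshold.
    Variable names and values here are hypothetical."""
    return [name for name, m in multipliers.items() if m >= threshold]

factors = {"open": 2.7, "volume": 0.4, "rsi": 1.3, "noise": 0.0}
print(prune_by_smoothing(factors))  # keeps "open" and "rsi"
```

The survivors would then be the input set for the retraining run.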


One caution is that these techniques become less and less accurate the more inputs you have in the network.  When you have 10 or fewer they are usually pretty good.  When you have over 100 or so they will be pretty poor.  Therefore, the best way to decide among a couple of hundred possible inputs is probably to train with no more than 20 at a time, recording the best of each lot.  Then take the "winners" of each set of 20 and combine them in the final network.  A possible disadvantage of this approach is that if two inputs in different lots have a relationship that gives them a powerful effect together which they do not have alone, that relationship may not be discovered.
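The lot-by-lot procedure might be sketched like this, where `importance` stands in for whichever per-input score (for example, the smoothing factor multiplier) you record from each training run:

```python
def tournament_select(inputs, importance, lot_size=20, keep_per_lot=5):
    """Sketch of the batching strategy above: score inputs in lots of
    at most `lot_size` (each lot standing in for one training run),
    keep the best few from each lot, and return the combined winners
    for a final training run.  `importance` is a hypothetical
    per-input score recorded from that lot's training."""
    winners = []
    for start in range(0, len(inputs), lot_size):
        lot = inputs[start:start + lot_size]
        lot.sort(key=importance, reverse=True)
        winners.extend(lot[:keep_per_lot])    # "winners" of this lot
    return winners
```

Note the caveat from the paragraph above: an input that only matters in combination with an input from a different lot can lose its tournament and never reach the final network.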