Which Net Should I Use?


 

Although the three layer backprop network in the Beginner's System and the three layer backprop network in the Advanced System are powerful networks, it would be a major mistake to use only this type of network.  Different problems respond differently to different network types and architectures, and there is no way to tell in advance exactly which network will work best.  In some cases the more advanced nets can make a significant difference in a net's predictive ability.  However, there are guidelines that can lead you on your way.

 

The most powerful nets in NeuroShell 2 are probably the genetic adaptive GRNN and PNN nets.  Use the GRNN for a continuous-valued output (e.g., a predicted value) and the PNN when the outputs are categories (e.g., a classification problem).  Make sure you select "genetic adaptive" in the Training and Stop Training Criteria, and set aside a representative test set of perhaps 20% of your data.  Be certain that none of the data in your test set is also in your training set, because these nets can overtrain easily.  They also take quite a long time to train if you have a lot of training data; we suggest limiting training data to 4000 patterns (rows).
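
The sketch below (plain Python with numpy, not NeuroShell 2 code) illustrates the idea behind a GRNN: the net stores the training patterns and predicts a continuous output as a Euclidean-distance-weighted average of the training outputs.  The fixed smoothing factor sigma stands in for the individual smoothing factors that genetic adaptive training actually tunes against the test set; the data and names are made up for illustration.

# Conceptual GRNN sketch, not NeuroShell 2 code.
import numpy as np

def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Distance-weighted average of training outputs (Nadaraya-Watson style)."""
    d2 = np.sum((train_x - x) ** 2, axis=1)        # squared Euclidean distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))           # Gaussian kernel weights
    return np.sum(w * train_y) / np.sum(w)         # weighted average = prediction

# Toy data: y depends on the first input plus noise.  A separate test set
# (never reused from training) is what lets you judge the smoothing
# without overtraining.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 2))
y = X[:, 0] + rng.normal(0, 0.05, size=100)
train_X, test_X = X[:80], X[80:]
train_y, test_y = y[:80], y[80:]

preds = np.array([grnn_predict(x, train_X, train_y) for x in test_X])
print("test MSE:", np.mean((preds - test_y) ** 2))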

 

Genetic adaptive nets are also the best way we know of to find the contribution of your input variables, even if you later decide to use another network type.  The sensitivity factors these networks find are better than the ones found by backprop nets.  If you have a huge number of possible input variables, we suggest that you try training sessions with no more than 30 to 50 at a time so the genetic algorithm will have a good chance to find the importance of the variables.
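
If you need to organize a large candidate list into sessions of 30 to 50 variables, something like the following sketch can help keep the batches straight (plain Python; the variable names are hypothetical placeholders):

# Hypothetical helper for splitting candidate inputs into separate
# genetic adaptive training sessions of at most 50 variables each.
candidates = [f"var_{i}" for i in range(130)]   # placeholder variable names

batch_size = 50
batches = [candidates[i:i + batch_size] for i in range(0, len(candidates), batch_size)]

for n, batch in enumerate(batches, start=1):
    print(f"session {n}: {len(batch)} inputs")   # run one training session per batch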

 

Genetic adaptive nets also require no setting of learning rate, etc.  Just use the default distance metric (Euclidean).

 

Ward Nets are our second most powerful nets.  Backprop nets are more global than the local GRNN and PNN nets, meaning they may not pick up small details as well, but consequently they can generalize better on noisy data.  The most powerful backprop nets are the Ward nets, and the one with three hidden layer slabs is the default.  For Ward nets, use the default learning rate, momentum, etc., at least at first.
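
For readers who like to see the data flow, here is a rough sketch (plain Python with numpy, not NeuroShell 2 code) of a Ward-style forward pass: one input slab feeding three parallel hidden slabs with different activation functions, all connected to the output.  The particular activations and layer sizes are assumptions chosen for illustration, not a statement of the product's exact defaults.

# Illustrative Ward-style forward pass; activations and sizes are assumptions.
import numpy as np

def gaussian(z):
    return np.exp(-z ** 2)

def ward_forward(x, w1, w2, w3, w_out):
    h1 = gaussian(x @ w1)           # hidden slab 1
    h2 = np.tanh(x @ w2)            # hidden slab 2
    h3 = 1.0 - gaussian(x @ w3)     # hidden slab 3 (Gaussian complement, as an example)
    h = np.concatenate([h1, h2, h3])
    return h @ w_out                # linear output for a continuous prediction

rng = np.random.default_rng(1)
x = rng.normal(size=4)                                       # 4 inputs
w1, w2, w3 = (rng.normal(size=(4, 3)) for _ in range(3))     # 3 neurons per slab
w_out = rng.normal(size=9)                                   # 3 slabs x 3 neurons -> 1 output
print(ward_forward(x, w1, w2, w3, w_out))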

 

Turboprop is an alternative learning method to backprop.  It is very fast, requires no setting of learning rate or momentum, and can be used on Ward nets.  For a few problems it works better and is worth trying.

 

Polynomial nets (GMDH) are very good for functions that can be described well with polynomials, like many engineering problems. They are also another very good way to find which inputs are most appropriate.  They should be used whenever it is important to have an understandable formula to represent the model, but don't use the formula that is displayed on the Learning module screen for exact calculations; use the source code generator instead.
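
As an illustration of why an explicit formula is attractive, the sketch below fits an ordinary least-squares polynomial and prints the resulting formula.  This is not GMDH itself, just plain Python with numpy on made-up data; it only shows the kind of closed-form model a polynomial net produces.

# Plain polynomial least-squares fit (not GMDH) on toy data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=200)
y = 1.5 * x ** 2 - 0.5 * x + 0.25 + rng.normal(0, 0.1, size=200)

coeffs = np.polyfit(x, y, deg=2)         # fit y ~ a*x^2 + b*x + c
print("fitted formula: y = %.3f*x^2 + %.3f*x + %.3f" % tuple(coeffs))
print("prediction at x=1.0:", np.polyval(coeffs, 1.0))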

 

The other backprop nets with jump connections and the recurrent nets are not as powerful as the Ward nets, but they may be worth a try.  Don't be misled by a single test, however: the initial starting point of a backprop net (as determined by the initial random weight distribution) may have a bigger impact on how well the net does than the architecture!  If you have the time, it is safest to try each net type with several random number seeds and average the results.
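
A rough sketch of the seed-averaging procedure is shown below.  The model here is a toy one-hidden-layer net written in plain Python with numpy, standing in for whatever net you are actually evaluating; the point is only the loop over seeds and the averaged test error.

# Averaging test error over several random weight initializations.
import numpy as np

def train_and_score(seed, X, y, X_test, y_test, hidden=8, steps=500, lr=0.05):
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    w2 = rng.normal(scale=0.5, size=hidden)
    for _ in range(steps):                       # plain gradient descent on MSE
        h = np.tanh(X @ w1)
        pred = h @ w2
        err = pred - y
        grad_w2 = h.T @ err / len(y)
        grad_h = np.outer(err, w2) * (1 - h ** 2)
        grad_w1 = X.T @ grad_h / len(y)
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    test_pred = np.tanh(X_test @ w1) @ w2
    return np.mean((test_pred - y_test) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = X[:, 0] - 0.5 * X[:, 1]
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

scores = [train_and_score(seed, X_tr, y_tr, X_te, y_te) for seed in range(5)]
print("per-seed MSE:", np.round(scores, 4))
print("average MSE :", round(float(np.mean(scores)), 4))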

 

If you do decide to use a three layer net, then we suggest using the Gaussian activation function in the hidden layer.  This usually makes the most powerful three layer net.

 

For all backprop nets, including Ward nets and three layer nets, you may want to try the linear activation function in the output slab if your output is a continuous value rather than categories.  If the net starts giving wild or huge errors when you do this, reduce the number of hidden neurons and possibly the learning rate.  A linear activation function in the output layer often gives better accuracy across the entire output range, and it is very powerful when combined with a Gaussian activation in a hidden slab (which Ward nets have).
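
The short sketch below (plain Python with numpy; the activation choices are illustrative, not the product's exact functions) contrasts a bounded logistic output with an unbounded linear output, and shows the Gaussian hidden activation mentioned above.

# Contrast of output-slab activations.  A logistic output is confined to (0, 1),
# so continuous targets must be scaled into that range; a linear output can
# produce any value directly.
import numpy as np

def gaussian(z):
    return np.exp(-z ** 2)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 3.0])           # net input reaching the output slab
print("logistic output:", logistic(z))   # always between 0 and 1
print("linear output:  ", z)             # unbounded, matches continuous targets
print("gaussian hidden:", gaussian(z))   # the hidden-slab activation mentioned above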