GMDH Type
Use the mouse to click on the button for either Smart or Advanced GMDH. Different parameter screens are displayed depending upon which type is chosen.
Note: We realize that the explanation we have given for GMDH is brief, but a complete and detailed description of the algorithm is beyond the scope of this help file. Readers interested in more technical detail should refer to Farlow’s book (listed in References).
Also, whenever the notation X^2 appears in the following documentation, it refers to X squared. X^3 refers to X cubed, etc.
It is strongly recommended that you choose Smart GMDH unless you want to become an expert in the use of GMDH. Even if you are an expert, it is recommended that you try Smart GMDH as a starting point. The controls available in Smart GMDH are sufficient to enable GMDH to find solutions to many real-world problems.
1. Smart GMDH: This mode minimizes the number of controls to be set by the user and allows the user to describe the desired properties of the constructed model in simple terms.
2. Advanced GMDH: This mode gives expert users maximum freedom in setting training parameters, releasing the full power of GMDH. Because there is no universal method which will give the best results for every problem, using Advanced GMDH may achieve better results than using Smart GMDH. Advanced GMDH is very mathematical in nature and should only be used by those with training in math.
Refer to GMDH Advanced Training Criteria for additional information. Refer to GMDH Overview for details on GMDH terminology.
Managing Smart GMDH
Model Nonlinearity
This control defines the allowed nonlinearity of the candidates for survival at each layer, which ultimately affects the nonlinearity of the entire generated model. (Refer to GMDH Overview for details on candidates for survival and Survivors.)
Off: only linear models are tested at each layer and so the final model is also linear. This option allows you to quickly test if the problem can be solved (i.e., give satisfactory results) with linear approximation. This is equivalent to linear regression analysis.
Low: tests models with nonlinearity introduced as covariants/trivariants of the variables, but not powers of the input variables of each layer. This option results in models with relatively low nonlinearity. Note that covariants of Survivors may nevertheless result in powers of the initial input variables in the final model.
Medium: tests models with nonlinearity introduced as both covariants/trivariants and powers of the input variables of each layer. This option results in models with medium nonlinearity.
High (default): tests models with nonlinearity introduced as both covariants and higher powers of the input variables of each layer. This option results in models with relatively high nonlinearity.
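As a rough illustration of the four settings, the sketch below lists the kinds of terms a candidate might contain at each level. The function name candidate_terms, the term labels, and the exact choice of powers (squares for Medium, squares and cubes for High) are assumptions made for illustration only; the term sets actually used by the program are not specified here.

```python
from itertools import combinations

LEVELS = ["Off", "Low", "Medium", "High"]

def candidate_terms(inputs, nonlinearity):
    """Return illustrative term labels for one layer (hypothetical sketch)."""
    rank = LEVELS.index(nonlinearity)      # raises ValueError for unknown levels
    terms = ["1"] + list(inputs)           # constant and linear terms
    if rank >= 1:
        # Products of two or three distinct inputs (covariants, trivariants)
        for size in (2, 3):
            terms += ["*".join(c) for c in combinations(inputs, size)]
    if rank >= 2:
        # Assumed: "powers" means at least the square of each input
        terms += [f"{x}^2" for x in inputs]
    if rank >= 3:
        # Assumed: "higher powers" adds the cube of each input
        terms += [f"{x}^3" for x in inputs]
    return terms

for level in LEVELS:
    print(level, candidate_terms(["X1", "X2"], level))
```

Because the inputs of later layers are themselves Survivors (formulas in the original inputs), a product of two Survivors that both contain X1 contains X1^2, which is why the Low setting can still yield powers of the initial inputs.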
Model Diversity
This control defines the maximum number of Survivors which are allowed to pass from the output of each layer to the input of the next one. (Refer to GMDH Overview for details on Survivors.)
Low: allows a relatively small number of Survivors. Choosing this option results in faster training, but usually gives worse results than a higher model diversity.
Medium (default): allows more Survivors. This option is recommended for most applications. It is a compromise between computation time and the necessity of preserving "genetic material" to create an appropriate model.
High: allows many Survivors. This option sometimes makes it possible to solve a hard problem at the expense of computation time.
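The effect of this control can be pictured as keeping only the best few candidates of a layer. The sketch below is a minimal illustration: the function name select_survivors, the candidate labels, and the criterion values are hypothetical, and the actual number of Survivors behind Low, Medium, and High is not stated here.

```python
def select_survivors(candidates, max_survivors):
    """Keep the best-scoring candidates of one layer (lower score is better)."""
    ranked = sorted(candidates, key=lambda c: c[1])
    return ranked[:max_survivors]

# Hypothetical (formula, criterion value) pairs for one layer
layer = [("X1*X2", 0.12), ("X1", 0.30), ("X2", 0.25), ("X1^2", 0.18)]

print(select_survivors(layer, 2))   # lower diversity: fewer Survivors
print(select_survivors(layer, 3))   # higher diversity: more Survivors
```

A larger limit passes more formulas to the next layer, which gives the next layer more combinations to test and therefore takes longer.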
Model Complexity
This control determines the allowed length of the formula of a candidate for survival by adjusting the relative penalty for overall model complexity at the output of each layer. Changing this option allows you to decide whether to allow more complex models with a tighter data fit (and possibly better predictions), or to accept a looser fit for the sake of model simplicity or to avoid overfitting. When testing different combinations of GMDH parameters, Model Complexity should be tried first.
Low: allows only candidates with relatively short formulas to survive by introducing a high penalty for overall model complexity. This mode is usually recommended if you want to obtain a simple formula, or if you are trying to avoid overfitting, especially if the number of samples in the training set is small.
Medium (default): allows candidates with medium-sized formulas to survive by introducing a normal penalty for overall model complexity.
High: allows candidates with longer formulas to survive and introduces a lower penalty value for overall model complexity. Sometimes this mode results in a better model, but usually it overfits the data in the training set.
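To see how a complexity penalty can change which candidate survives, consider the minimal sketch below. The additive form (fit error plus penalty times number of terms), the function names, and all numbers are assumptions for illustration; the criteria the program actually uses are described in GMDH Advanced Training Criteria.

```python
def penalized_score(fit_error, n_terms, penalty):
    """Hypothetical selection score: fit error plus a per-term complexity penalty."""
    return fit_error + penalty * n_terms

def best_candidate(candidates, penalty):
    """Return the name of the candidate with the lowest penalized score."""
    return min(candidates, key=lambda c: penalized_score(c[1], c[2], penalty))[0]

# Hypothetical candidates: (name, fit error on the training data, number of terms)
candidates = [("short", 0.20, 3), ("long", 0.15, 10)]

print(best_candidate(candidates, penalty=0.02))    # high penalty (Low complexity)
print(best_candidate(candidates, penalty=0.005))   # low penalty (High complexity)
```

With the higher penalty the short formula wins despite its looser fit; with the lower penalty the longer, tighter-fitting formula wins.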
Model Optimization
The model obtained by GMDH can usually be optimized by dropping unnecessary terms at different stages of the algorithm. Since there is no way to determine ahead of time which terms should be removed, GMDH includes several model optimization options which implement different strategies, trading thoroughness for speed.
Off: this mode is very fast, but it creates extremely complex solutions and usually is not recommended except for a very rough determination of significant variables.
Fast: this mode of optimization is fast, but it may leave some terms which would be removed by a more thorough optimization.
Smart (default): this mode in most cases offers the best tradeoff between calculation speed and model quality.
Missing Values
See Missing Values for details.
Useful Hints
The recommended order of parameter variation (and the order of applying their corresponding values) for Smart GMDH is listed below:
Most Significant Parameters
Model Complexity: Medium, High, Low
Model Diversity: Medium, High
Setting this control to Low is not recommended, except when you want to perform a quick test of a very large problem, or if your computer is very slow. The higher the value of Diversity, the better the result, but the longer the computation time.
Model Optimization: Smart
You may also try Fast if your problem is quite large and/or your computer is quite slow. Fast sometimes gives the same results as Smart and sometimes worse, but never better. Selecting the Off option makes sense only for a "quick and rough" estimation, because the results will definitely be worse than with Fast.
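The recommended order for the most significant parameters can be written out as a short list of trials. The sketch below is one possible reading of the hints above, in which Model Complexity is cycled first for each Model Diversity value and Model Optimization stays at Smart; the function name trial_settings and the nesting order are assumptions.

```python
from itertools import product

# Value orders taken from the hints above; Model Optimization is fixed at Smart.
COMPLEXITY = ["Medium", "High", "Low"]
DIVERSITY = ["Medium", "High"]
OPTIMIZATION = ["Smart"]

def trial_settings():
    """Yield Smart GMDH settings with Model Complexity varying fastest."""
    # product() varies its last argument fastest, so Complexity changes first.
    for optimization, diversity, complexity in product(OPTIMIZATION, DIVERSITY, COMPLEXITY):
        yield {
            "Model Complexity": complexity,
            "Model Diversity": diversity,
            "Model Optimization": optimization,
        }

for number, settings in enumerate(trial_settings(), start=1):
    print(number, settings)
```

This produces six trials, starting with the default combination (Medium, Medium, Smart).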
Less Significant Parameters
Data scaling: <<-1, 1>> with min/max chosen by mean plus or minus 1 standard deviation (or 3 standard deviations for complex data such as financial data), <<-1, 1>> with min/max chosen in the usual way. (Scaling using a standard deviation is done in the Define Inputs and Outputs module.)
Model Nonlinearity: High, Off, Medium, Low
High is strongly recommended for most cases. Selecting the Off option may improve performance when your function is close to a linear combination of many inputs. If this is not the case, selecting Off will definitely decrease performance. Medium and Low usually give worse results than High. Try them only as a last resort when you have tested all other combinations for Smart GMDH and you don't want to use Advanced GMDH.
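The data scaling hint under Less Significant Parameters can be illustrated with a small sketch. It assumes a target range of -1 to 1, a linear mapping with no clipping of values that fall outside the chosen min/max, and the population standard deviation; none of these details is confirmed here, and the function names are hypothetical. In the program itself this scaling is set up in the Define Inputs and Outputs module.

```python
from statistics import mean, pstdev

def scale_by_std(values, n_std=1.0):
    """Map mean - n_std*sd to -1 and mean + n_std*sd to +1 (linear, no clipping)."""
    center = mean(values)
    spread = n_std * pstdev(values)
    if spread == 0:
        raise ValueError("values have zero spread")
    return [(v - center) / spread for v in values]

def scale_by_range(values):
    """Map the observed minimum to -1 and the observed maximum to +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values have zero range")
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

data = [0, 1, 2, 10]                      # hypothetical input column with one outlier
print(scale_by_std(data, n_std=1.0))      # the outlier lands outside -1..1
print(scale_by_range(data))               # the outlier defines the upper end
```

With standard-deviation limits, a single extreme value no longer compresses the rest of the data into a narrow band, which may be why wider limits (3 standard deviations) are suggested for data such as financial series.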
