NeuroShell 2 includes a powerful architecture called Group Method of Data Handling (GMDH), or polynomial nets. The GMDH network is unlike regular feedforward networks and was not originally represented as a network at all. It is implemented with polynomial terms in the links and a genetic component that decides how many layers are built. The result of training at the output layer can be represented as a polynomial function of all or some of the inputs.
GMDH was invented by Prof. A.G. Ivakhnenko of the Institute of Cybernetics, Ukrainian Academy of Sciences, Kiev, Ukraine (former USSR). The most detailed English description of both the essence and the applications of GMDH can be found in the book edited by Farlow (refer to References).
Note: Whenever the notation X^2 appears in the following documentation, it refers to X squared. X^3 refers to X cubed, etc.
Unfortunately, even this description does not explain how to apply GMDH successfully. The reason is that GMDH involves many different parameters, and there is no "standard" set of these parameters. The following will explain both GMDH and its implementation in NeuroShell 2.
The main idea behind GMDH is to build a function (called a polynomial model) whose predicted output values are as close as possible to the actual output values. For many end users it is more convenient to have a model that makes predictions using familiar, widely understood polynomial formulas. In NeuroShell 2, GMDH is formulated as a neural network architecture and is called a polynomial network; the model that results, however, is a standard polynomial function.
The most common approach to solving such models is to use regression analysis. The first step is to decide the type of polynomial that regression should find.
For example, a good idea is to choose powers of the input variables, along with their covariants (products of two variables) and trivariants (products of three variables), as terms of the polynomial:

{x1, x2, x3, ..., x1^2, x2^2, x3^2, ..., x1x2, x1x3, ..., x(n-1)xn, x1x2x3, ...}
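Such a term set can be enumerated mechanically. The sketch below is only an illustration of the idea (the function name `polynomial_terms` and the choice of maximum power are assumptions, not part of NeuroShell 2); it lists each variable, its powers, and the two-variable covariant products:

```python
from itertools import combinations

def polynomial_terms(names, max_power=2):
    """Enumerate simple polynomial terms of the input variables:
    each variable, its powers up to max_power, and pairwise products."""
    terms = list(names)                                        # x1, x2, ...
    terms += [f"{n}^{p}" for n in names for p in range(2, max_power + 1)]
    terms += [f"{a}{b}" for a, b in combinations(names, 2)]    # covariants
    return terms

print(polynomial_terms(["x1", "x2", "x3"]))
# → ['x1', 'x2', 'x3', 'x1^2', 'x2^2', 'x3^2', 'x1x2', 'x1x3', 'x2x3']
```

Trivariant (three-variable) products could be added the same way with `combinations(names, 3)`; the set grows quickly as more variables and higher powers are included, which is exactly the combinatorial problem GMDH is designed to manage.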
The next step is to construct a linear combination of all of the polynomial terms with variable coefficients. The algorithm determines values of these coefficients by minimizing the squared sum (over all samples) of differences between sample outputs and model predictions.
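The least-squares step above can be sketched with a small numerical example. This is a generic illustration, not NeuroShell 2's internal code; the target function and data here are invented for demonstration. The design matrix holds one column per polynomial term, and the coefficients minimizing the squared error are found in one step:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 50)
x2 = rng.uniform(-1, 1, 50)
y = 3.0 + 2.0 * x1 - 1.5 * x1 * x2          # hypothetical noiseless target

# Design matrix: one column per polynomial term {1, x1, x2, x1*x2}
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Least squares finds the coefficients minimizing the sum of squared
# differences between the sample outputs y and the model predictions A @ coef
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 3))   # close to [3.0, 2.0, 0.0, -1.5]
```

Because the example data contain no noise, the fit recovers the original coefficients; with real data the same procedure returns the best-fitting coefficients in the least-squares sense.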
Because descriptions of linear and nonlinear regression can easily be found in any math textbook, this description only includes the basics of the method as it is used in GMDH.
The main problem when utilizing regression is how to choose the set of polynomial terms correctly. How complex should the terms be? For example, if you include polynomials of one variable in the model, what should be the largest degree of the polynomial? Should it be 10, should the model evaluate terms such as x^13, or should consideration be limited to terms such as x^5 and lower? GMDH improves on plain regression by answering these questions without trying all possible combinations.
Selection Criterion
The decision about the quality of each model must be made using some numeric criterion.
The simplest criterion (a form of which is used in linear regression analysis) is the sum (over all samples) of the squared differences between the actual output and the model’s prediction divided by the sum of the squared actuals. This is called the Normalized Mean Squared Error or Norm. MSE (sometimes called Training Squared Error or TSE in the literature).
However, if you try to use only the Norm. MSE on real data (or even on model data with some noise added), you will see that the Norm. MSE value gets smaller and smaller as you add extra terms to the model. The more complex the model, the more exact it appears. This is always true if you use only Norm. MSE, which judges the quality of the model by evaluating the same information you have already used to build the model. The result is an "overcomplex" model, or model overfit, which means the model does not generalize well because it pays too much attention to noise in the training data. This is similar to overtraining in other neural nets.
To avoid this danger, you need to use a more powerful criterion which is based upon information other than that which was used to build the evaluated model. There are several ways to define such criteria, which we call Selection Criteria.
For example, you may compute the squared sum of differences between the known output and the model prediction over some other set of experimental data (a test set). In GMDH this criterion is called Regularity. It is the Calibration feature as applied to GMDH. Another way to avoid overfitting is to introduce a penalty for model complexity. This is the Predicted Squared Error (PSE) criterion introduced by A. R. Barron.
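One common form of Barron's PSE adds to the training squared error a penalty proportional to the number of model coefficients. The sketch below shows that form as an illustration; the exact constants and variance estimate NeuroShell 2 uses are not specified here, and `sigma2` (a prior estimate of the true error variance) is an assumed parameter:

```python
def predicted_squared_error(tse, n_coefficients, n_samples, sigma2):
    """One form of Barron's PSE: training squared error plus a complexity
    penalty. The penalty grows with the number of coefficients, so an
    overcomplex model is penalized even as its training error shrinks."""
    return tse + 2.0 * sigma2 * n_coefficients / n_samples

# A 4-coefficient model fit on 100 samples with TSE 0.1
print(predicted_squared_error(0.1, 4, 100, sigma2=0.05))
```

Unlike the Regularity criterion, PSE needs no separate test set: the penalty term substitutes for the information that held-out data would otherwise provide.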
Theoretical considerations show that you should stop increasing model complexity when the Selection Criterion reaches a minimum value. This minimum value is a measure of model reliability.
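The stopping rule above amounts to picking the complexity at which the criterion is smallest. A minimal sketch (the function name and list layout are assumptions for illustration):

```python
def best_complexity(criterion_values):
    """Return the index of the model complexity at which the selection
    criterion is minimal; criterion_values[k] is the criterion value for
    the model with k+1 terms."""
    return min(range(len(criterion_values)), key=criterion_values.__getitem__)

# The criterion falls, reaches a minimum, then rises as the model overfits
print(best_complexity([0.9, 0.5, 0.3, 0.35, 0.6]))  # → 2
```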
For more detailed information concerning different types of GMDH selection criteria and their properties, please refer to GMDH Advanced Training Criteria.
The method of searching for the best model based upon testing all possible models is usually called the "combinatorial GMDH algorithm". It is sometimes used for simple problems or when the user has some a priori information about what the polynomial terms should look like. However, it is clear that complete testing for a problem with many inputs and a large set of polynomial terms is practically impossible, because it would take too much time. This is not the GMDH implemented in NeuroShell 2.
MultiLayer GMDH Algorithm (The one used in NeuroShell 2)
In order to reduce computation time, one should reduce the number of polynomial terms (and the number of input variables) used to build the models to be evaluated. To do so, one must change from a one-stage procedure of model selection to a multilayer procedure. How is this done?
To start with, take the first two input variables and combine them into a simple set of polynomial terms. For example, if the first two input variables are x1 and x2, the set of polynomial terms would be {1, x1, x2, x1*x2} (1 will represent the constant term).
Next check all possible models made from these terms, and choose one which is the best. (Any one of the evaluated models is called a candidate for survival.)
Next take another pair of input variables and repeat the operation, resulting in another candidate for survival, with its own value of the evaluation criterion. Doing the same for each possible pair of n input variables, n * (n - 1)/2 candidates for survival are generated, each with its own value of the evaluation criterion.
Then compare these values and choose the several candidates for survival which give the best approximation of the output variable. Usually you select a predefined number, F, of the best candidates for survival, which are stored in the first layer of the net and preserved for the next layer. The F models that are selected are called “survivors”.
The layer of survivors is used for inputs in building the next layer in the network. The original network inputs in the first layer may also be chosen as inputs to this new layer. The next layer is built with polynomials of this broadened set of inputs. Note that since some inputs are already polynomials, the next layer may contain very complex polynomials.
The layer-building GMDH procedure continues as long as the evaluation criterion continues to diminish. The implementation of the GMDH algorithm in NeuroShell 2 checks whether this is so and continues or stops training accordingly. There may also be other conditions which affect when training is stopped. For more information, refer to GMDH Smart Training Criteria and GMDH Advanced Training Criteria.
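The multilayer procedure described above can be sketched in a few lines. This is a deliberately simplified illustration, not NeuroShell 2's implementation: each candidate uses the simple term set {1, a, b, a*b}, the selection criterion here is the training Norm. MSE rather than a test-set Regularity criterion, and the stopping conditions are omitted. The function names and data are invented for the example:

```python
import numpy as np
from itertools import combinations

def fit_pair(a, b, y):
    """Fit y ≈ c0 + c1*a + c2*b + c3*a*b by least squares; return the
    model's predictions and its normalized MSE (the criterion value)."""
    A = np.column_stack([np.ones_like(a), a, b, a * b])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    return pred, np.sum((y - pred) ** 2) / np.sum(y ** 2)

def gmdh_layer(inputs, y, survivors=3):
    """One GMDH layer: build a candidate model for every pair of inputs
    and keep the `survivors` best by the criterion value."""
    candidates = [fit_pair(inputs[i], inputs[j], y)
                  for i, j in combinations(range(len(inputs)), 2)]
    candidates.sort(key=lambda c: c[1])          # best criterion first
    return [pred for pred, _ in candidates[:survivors]]

rng = np.random.default_rng(1)
xs = [rng.uniform(-1, 1, 80) for _ in range(4)]
y = 1.0 + xs[0] * xs[1] - 0.5 * xs[2]            # hypothetical target

layer1 = gmdh_layer(xs, y)                       # survivors of the first layer
layer2 = gmdh_layer(layer1 + xs, y)              # survivors plus original inputs
```

Because each survivor's predictions become an input to the next layer, the second layer's models are already polynomials of polynomials, so complexity grows quickly even though each individual fit stays small.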
The work of this algorithm is similar to the work of a gardener during selection of a new hybrid plant. The gardener sows some seeds, waits for the plants to grow, and selects several plants which most strongly carry on the desired property. Then he collects seeds from the selected plants and sows these seeds again, bringing up a second generation. He then chooses several plants from this generation, collects seeds, sows them again, etc., until he obtains a plant with the desired property.
We realize that the explanation we have given for GMDH is brief, but a complete and detailed description of the algorithm is beyond the scope of this help file. Readers interested in more technical detail should refer to Farlow’s book (listed in References).
