Use this module to process a data file through a trained neural network to produce the network's classifications or predictions for each pattern in the file. A file of outputs (the .OUT file) is produced. If you include actual values in the file, the module gives you check boxes to include actual values and/or the differences between the actual answers and the network's answers in the .OUT file. If there is more than one output, the actual values and differences will be displayed for each output. The order of display is actual values, followed by predicted values, followed by differences.
When applying GRNN networks, you need to supply a smoothing factor required by the algorithm that affects the value of the output.
The following example may help to explain what occurs when various smoothing factors are used in applying a trained GRNN network. In this example, one input is used to predict one output. The following figures graph 100 input and output patterns, of which only input patterns 25, 50, and 75 produce an output value of 1.
In Figure 1, the lowest smoothing factor of .05 is used and the output values are very close to either 0 or 1.
In Figure 2, a smoothing factor of .08 is used and the output values are mostly in the range .5 to 1, especially close to the outputs which are supposed to be 1.
In Figure 3, a smoothing factor of .1 is used and the output values are very close to 1. The differences between input and output values are "smoothed" out or lopped off.
For GRNN networks, the smoothing factor must be greater than 0 and can usually range from .01 to 1 with good results. You need to experiment to determine which smoothing factor is most appropriate for your data. Fortunately, no retraining is required to change smoothing factors, because the value is specified when the network is applied.
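The following Python sketch may help show why no retraining is needed to change smoothing factors. It assumes the textbook GRNN form, in which a prediction is a Gaussian-weighted average of the stored training outputs, so the smoothing factor enters only at the moment the network is applied. The function name grnn_predict, the data, and the smoothing values are illustrative assumptions. NeuroShell 2's own input scaling and distance measure are not described here, so the numbers are not expected to match the figures above.

```python
import math

def grnn_predict(x, train_x, train_y, smoothing):
    """Gaussian-weighted average of training outputs (textbook GRNN form).

    Assumption: one input, Euclidean distance, unscaled inputs.
    """
    if smoothing <= 0:
        raise ValueError("smoothing factor must be greater than 0")
    weights = [
        math.exp(-((x - xi) ** 2) / (2.0 * smoothing ** 2)) for xi in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        # Every training pattern is too far away for this smoothing factor.
        raise ArithmeticError("all kernel weights underflowed to zero")
    return sum(w * y for w, y in zip(weights, train_y)) / total

# 100 stored patterns; only patterns 25, 50, and 75 have an output of 1.
train_x = [n / 100 for n in range(1, 101)]
train_y = [1.0 if n in (25, 50, 75) else 0.0 for n in range(1, 101)]

# The same stored patterns are reused with each smoothing factor.
for smoothing in (0.001, 0.01, 0.1):
    print(smoothing, grnn_predict(0.25, train_x, train_y, smoothing))
```

In this sketch a small smoothing factor keeps the prediction at pattern 25 close to its stored output, while a larger one blends in more of the neighboring patterns.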
You can either type in a value for the smoothing factor in the edit box or use the default setting.
Default Setting: The default setting is the smoothing factor that was specified in the Architecture and Parameters module when you designed the network. The default value appears in the edit box when you use the Run Menu to apply the network. If you use Calibration, then the best smoothing factor for your test set was computed during training.
PNN and GRNN are very local algorithms. If you apply the network and the message “Fatal error: Smoothing Factor out of range for this data” is displayed, increase the smoothing factor in the edit box to expand its range of influence (the ability of the network to generalize). This works only when using the iterative version of Calibration, not the genetic adaptive version. This message may also appear when you are applying the network to patterns that are different from the data used to train the network. You should add new patterns to the training set to include this area of the problem domain.
Checking the Compute R-Squared, etc. box causes the following statistical calculations to be made for each output when the network is applied. (You must not check this box unless there are actual outputs already in the file so that actual vs. predicted comparisons can be made.)
R Squared - The coefficient of multiple determination is a statistical indicator usually applied to multiple regression analysis. It compares the accuracy of the model to the accuracy of a trivial benchmark model wherein the prediction is just the mean of all of the samples. A perfect fit would result in an R squared value of 1, a very good fit near 1, and a very poor fit less than 0. If your neural model predictions are worse than you could predict by just using the mean of your sample case outputs, the R squared value will be less than 0.
Do not confuse R squared, the coefficient of multiple determination, with r squared, the coefficient of determination. The latter is usually the one that is found in spreadsheets. See any statistics book for more details. Also note that sometimes the coefficient of multiple determination is called the multiple coefficient of determination, but in any case it refers to a multiple regression fit as opposed to a simple regression fit. Also, do not confuse it with r, the correlation coefficient.
You may also refer to the file RSQUARE.XLS provided in the NeuroShell 2\Examples directory. This spreadsheet explains why R squared is a better determination of model fit than r squared.
Note: R squared is not the ultimate measure of whether or not your net is producing good results, especially for classification nets. You might decide the net is OK by the number of correct classifications. For example, if you have a classification network with two outputs that generate output values of .6 and .4, the R squared value will not be very high.
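As an illustration of the R squared description above, the following Python sketch compares the model's squared error with the squared error of a benchmark that always predicts the mean. It assumes the common form 1 - SSE/SSyy, which matches the description but is not confirmed here as the exact formula used by the module. The function name r_squared and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """R squared as 1 - SSE / SSyy (assumed form, see lead-in)."""
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of the trivial benchmark that always predicts the mean.
    ss_yy = sum((a - mean_actual) ** 2 for a in actual)
    if ss_yy == 0:
        raise ValueError("actual values are all identical; R squared is undefined")
    return 1.0 - sse / ss_yy

actual = [1.0, 2.0, 3.0, 4.0, 5.0]

print(r_squared(actual, actual))                     # perfect fit
print(r_squared(actual, [3.0] * 5))                  # same as predicting the mean
print(r_squared(actual, [5.0, 4.0, 3.0, 2.0, 1.0]))  # worse than the mean
```

The three calls give 1, 0, and a negative value respectively, which corresponds to the perfect, benchmark, and worse-than-benchmark cases described above.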
r squared - This is the square of the correlation coefficient, described later in this section.
Mean Squared Error - This is the mean over all patterns in the file of the square of the actual value minus the predicted value, i.e., the mean of (actual - predicted)².
Mean Absolute Error - This is the mean over all patterns of the absolute value of the actual minus predicted, i.e., the mean of |actual - predicted|.
Min Absolute Error - This is the minimum of |actual - predicted| of all patterns.
Max Absolute Error - This is the maximum of |actual - predicted| of all patterns.
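The four error statistics above can be sketched in Python as follows. The function name error_statistics, the dictionary keys, and the sample values are illustrative assumptions, not names used by NeuroShell 2.

```python
def error_statistics(actual, predicted):
    """Compute the error statistics for one output column."""
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    return {
        "mean_squared_error": sum(e * e for e in errors) / n,
        "mean_absolute_error": sum(abs_errors) / n,
        "min_absolute_error": min(abs_errors),
        "max_absolute_error": max(abs_errors),
    }

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, 2.0, 6.0]
stats = error_statistics(actual, predicted)
for name, value in stats.items():
    print(name, value)
```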
Correlation Coefficient r - (Pearson’s Linear Correlation Coefficient) This is a statistical measure of the strength of the relationship between the actual and predicted outputs. The r coefficient can range from -1 to +1. The closer r is to 1, the stronger the positive linear relationship, and the closer r is to -1, the stronger the negative linear relationship. When r is near 0, there is no linear relationship. You can get the same results by using the Correlation Scatter Plot and graphing actual vs predicted outputs. (We don’t believe the linear correlation coefficient is a good measure of the performance of neural network models, but it was included because many customers want to use it. R squared is a much better measure of the closeness of actual and predicted values.)
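The following Python sketch computes Pearson's r with the standard formula and shows why a high r does not by itself mean the predictions are close to the actual values. The function name pearson_r and the sample data are hypothetical. The predictions here are deliberately far off but perfectly linear in the actual values.

```python
import math

def pearson_r(actual, predicted):
    """Pearson's linear correlation coefficient between two columns."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

actual = [1.0, 2.0, 3.0, 4.0, 5.0]
# Predictions that rise with the actual values but are wrong in scale and offset.
predicted = [10.0 * a + 50.0 for a in actual]

r = pearson_r(actual, predicted)
mean_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print("r =", r)
print("mean absolute error =", mean_abs_error)
```

Here r indicates a perfect linear relationship even though every prediction misses its actual value by a wide margin.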
Percent within 5%, 10%, 20% and 30% and over 30% - These boxes list the percent of network answers that are within the specified percentage of the actual answers used to train the network. If the actual answer is 0, the percent cannot be computed and that pattern is not included in a percentage group. For that reason, and because of rounding, the computed percentages may not add up to 100.
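The following Python sketch shows one way such percentage groups could be computed. It rests on assumptions that the text above does not confirm: that the groups do not overlap (for example, the 10% group holds answers that are more than 5% but no more than 10% away), that the error is measured relative to the absolute value of the actual answer, and that each percentage is taken over all patterns in the file. The function name percent_within and the group labels are hypothetical.

```python
def percent_within(actual, predicted):
    """Percent of answers in each error group (assumed non-overlapping)."""
    labels = ["within 5%", "5% to 10%", "10% to 20%", "20% to 30%", "over 30%"]
    limits = [5.0, 10.0, 20.0, 30.0]
    counts = [0] * len(labels)
    for a, p in zip(actual, predicted):
        if a == 0:
            # Percent error is undefined; the pattern joins no group.
            continue
        pct_error = abs(a - p) / abs(a) * 100.0
        for i, limit in enumerate(limits):
            if pct_error <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    total = len(actual)
    return {label: 100.0 * c / total for label, c in zip(labels, counts)}

actual = [100.0, 100.0, 100.0, 100.0, 0.0]
predicted = [102.0, 108.0, 125.0, 150.0, 1.0]
groups = percent_within(actual, predicted)
for label, pct in groups.items():
    print(label, pct)
print("total", sum(groups.values()))
```

Because the last pattern has an actual answer of 0, it falls in no group, and the percentages in this sketch total less than 100.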
The statistics computed when GRNN networks are applied to a file may be copied to the Windows clipboard for use in other applications. To copy the statistics, select the Copy Results to Clipboard option from the File Menu. For example, you may want to compare the results of different neural networks. You can copy the results to the clipboard and paste them into a spreadsheet for easy comparison.
Checking the "include actuals in .OUT file" box will cause the actual values to be displayed in the first column followed by the network's predictions or classifications in the .OUT file. (Note that actual values for the outputs must be in the file.) If there is more than one output, the actual values for each output will be displayed, followed by a blank column, followed by the network's predictions or classifications for each output.
Checking the "include in .OUT file actuals minus network outputs" box will cause the differences (the actual values minus the network outputs) to be displayed. (Note that actual values for the outputs must be in the file.) If there is more than one output, the difference will be displayed for each output. The order of display is actual values, followed by predicted values, followed by differences.
The Patterns classified edit box displays the number of patterns in the file that the network processed.
Note: Do not check the boxes which add columns to the .OUT file if you used the Race Handicapping Prenetwork Module. If you do, the Race Handicapping Postnetwork Module will not be able to reconstruct the file.
If your data file includes an * in a cell beneath a column labeled A (Actual output) in the Define Inputs/Outputs module, the * will be replaced with a 0 and a prediction will be made in that row when you apply a network. A prediction will not be made in a row if your data file includes an * in a cell beneath a column labeled I (Input). Previous releases of NeuroShell 2 up to Release 2.0 would not apply a trained network to a data row if it contained an * in either an A or I column.
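The rule for * cells described above can be sketched in Python as follows. The function name prepare_row, the representation of a row as a list of strings, and the use of "I" and "A" as column role codes are illustrative assumptions. The sketch does not read NeuroShell 2 files.

```python
def prepare_row(cells, roles):
    """Return (inputs, actuals) for a row, or None if the row is skipped.

    cells: cell values as strings; roles: "I" or "A" for each column.
    """
    inputs, actuals = [], []
    for cell, role in zip(cells, roles):
        missing = cell.strip() == "*"
        if role == "I":
            if missing:
                # A missing input means no prediction is made for this row.
                return None
            inputs.append(float(cell))
        elif role == "A":
            # A missing actual output is replaced with 0; the row is still used.
            actuals.append(0.0 if missing else float(cell))
    return inputs, actuals

roles = ["I", "I", "A"]
print(prepare_row(["0.5", "1.2", "*"], roles))  # prediction is still made
print(prepare_row(["*", "1.2", "3.0"], roles))  # row is skipped
```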
Use the Run Menu to start processing the data file through the network. Also use this menu to interrupt processing.
Use the File Menu to select an alternate pattern file, view the pattern file, view the output file, or copy the results (statistics computed when the network is applied) to the Windows clipboard.
File Note: This module defaults to processing the .PAT file, but you can apply the network to any file that is in the NeuroShell 2 file format (the same as Lotus 1-2-3 .WK1 or Excel 4 .XLS file format) simply by using the File Menu to select a file. The inputs must be in the same columns in the same order as the .PAT file with which the network was trained. This module places the network's classifications or predictions into an .OUT file.