Use this module to process a data file through a trained neural network to produce the network's classifications or predictions for each pattern in the file. A file of outputs (the .OUT file) is produced. If you include actual values in the file, the module gives you check boxes to include actual values and/or the differences between the actual answers and the network's answers in the .OUT file. The order of display is actual values, followed by predicted values, followed by differences.
Checking the Compute R-Squared, etc. box causes the following statistical calculations to be made for each output when the network is applied. (You must not check this box unless there are actual outputs already in the file so that actual vs. predicted comparisons can be made.)
R Squared - The coefficient of multiple determination is a statistical indicator usually applied to multiple regression analysis. It compares the accuracy of the model to the accuracy of a trivial benchmark model wherein the prediction is just the mean of all of the samples. A perfect fit would result in an R squared value of 1, a very good fit near 1, and a very poor fit near 0 or below. If your neural model predictions are worse than you could predict by just using the mean of your sample case outputs, the R squared value will be less than 0.
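The comparison against the mean-only benchmark can be sketched in a few lines of Python. This is a minimal illustration, not NeuroShell 2's own code: it assumes the common definition R squared = 1 - SSE / SSyy, where SSE is the sum of squared (actual - predicted) differences and SSyy is the sum of squared differences between each actual value and the mean of the actuals. The function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """R squared relative to a benchmark that always predicts the mean.

    Assumed definition: 1 - SSE / SSyy (not taken from NeuroShell 2 itself).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of the trivial model that always predicts the mean.
    ss_yy = sum((a - mean_actual) ** 2 for a in actual)
    if ss_yy == 0:
        raise ValueError("actual values are constant; R squared is undefined")
    return 1.0 - sse / ss_yy


actual = [1.0, 2.0, 3.0, 4.0]
perfect_fit = r_squared(actual, actual)                  # model matches every actual
mean_benchmark = r_squared(actual, [2.5] * 4)            # model predicts the mean
worse_than_mean = r_squared(actual, [4.0, 3.0, 2.0, 1.0])  # model is worse than the mean
```

With these sample values the perfect fit gives 1, predicting the mean gives 0, and the reversed predictions give a negative value, matching the ranges described above.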
Do not confuse R squared, the coefficient of multiple determination, with r squared, the coefficient of determination. The latter is usually the one that is found in spreadsheets. See any statistics book for more details. Also note that sometimes the coefficient of multiple determination is called the multiple coefficient of determination, but in any case it refers to a multiple regression fit as opposed to a simple regression fit. Also, do not confuse it with r, the correlation coefficient.
You may also refer to the file RSQUARE.XLS provided in the NeuroShell 2\Examples directory. This spreadsheet explains why R squared is a better determination of model fit than r squared.
Note: R squared is not the ultimate measure of whether or not your net is producing good results, especially for classification nets. You might decide the net is OK by the number of correct classifications. For example, if you have a classification network with two outputs that generate output values of .6 and .4, the R squared value will not be very high.
r squared - This is the square of the correlation coefficient, described later in this section.
Mean Squared Error - This is the mean over all patterns in the file of the square of the actual value minus the predicted value, i.e., the mean of (actual - predicted)².
Mean Absolute Error - This is the mean over all patterns of the absolute value of the actual minus predicted, i.e., the mean of |actual - predicted|.
Min Absolute Error - This is the minimum of |actual - predicted| of all patterns.
Max Absolute Error - This is the maximum of |actual - predicted| of all patterns.
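The four error statistics above follow directly from the per-pattern differences. The sketch below computes them for a single output column. The function name `error_statistics`, the dictionary keys, and the sample values are illustrative assumptions and do not come from NeuroShell 2.

```python
def error_statistics(actual, predicted):
    """Mean squared, mean absolute, min absolute and max absolute error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    return {
        "mean_squared_error": sum(e * e for e in errors) / n,
        "mean_absolute_error": sum(abs_errors) / n,
        "min_absolute_error": min(abs_errors),
        "max_absolute_error": max(abs_errors),
    }


# Differences are -0.5, 0.0, 1.0 and -2.0 for these sample patterns.
stats = error_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 6.0])
```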
Correlation Coefficient r - (Pearson’s Linear Correlation Coefficient) This is a statistical measure of the strength of the relationship between the actual vs predicted outputs. The r coefficient can range from -1 to +1. The closer r is to 1, the stronger the positive linear relationship, and the closer r is to -1, the stronger the negative linear relationship. When r is near 0, there is no linear relationship. You can get the same results by using the Correlation Scatter Plot and graphing actual vs predicted outputs. (We don’t believe the linear correlation coefficient is a good measure of the performance of neural network models, but it was included because many customers want to use it. R squared is a much better measure of the closeness of actual and predicted values.)
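A small example may help show why r can be misleading as a measure of closeness. The sketch below computes Pearson's r and, for contrast, R squared under the assumed definition 1 - SSE / SSyy (an assumption, not NeuroShell 2's documented formula). The predictions are deliberately offset from the actuals by a constant 10, so they track the actual values perfectly in the linear sense but are far from them. Function names and data are hypothetical.

```python
import math


def pearson_r(actual, predicted):
    """Pearson's linear correlation coefficient between two sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Assumed definition: 1 - SSE / SSyy (mean-only benchmark)."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_yy = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - sse / ss_yy


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [11.0, 12.0, 13.0, 14.0]  # every prediction is off by exactly 10

r = pearson_r(actual, predicted)            # perfect linear relationship
small_r_squared = r ** 2                    # r squared, the square of r
big_r_squared = r_squared(actual, predicted)  # strongly negative: poor closeness
```

Here r and r squared both report a perfect relationship, while R squared is far below 0 because every prediction misses its actual value by 10.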
Percent within 5%, 10%, 20% and 30% and over 30% - These boxes list the percent of network answers that are within the specified percentage of the actual answers used to train the network. If the actual answer is 0, the percent cannot be computed and that pattern is not included in a percentage group. For that reason and rounding, the total computed percentages may not add up to 100.
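The grouping can be sketched as follows. Several details here are assumptions, because the text above does not spell them out: the percent error is taken as |actual - predicted| / |actual| x 100, the groups are treated as non-overlapping bands (up to 5%, 5% to 10%, 10% to 20%, 20% to 30%, over 30%), upper limits are inclusive, and each percentage is computed against the total number of patterns. The function name and band labels are hypothetical.

```python
def percent_within_bands(actual, predicted):
    """Percentage of patterns falling in each percent-error band.

    Assumptions (not confirmed by the documentation): bands are
    non-overlapping, upper limits are inclusive, and the denominator
    is the total number of patterns, including those with actual == 0.
    """
    bands = [(5.0, "within 5%"), (10.0, "5% to 10%"),
             (20.0, "10% to 20%"), (30.0, "20% to 30%")]
    counts = {label: 0 for _, label in bands}
    counts["over 30%"] = 0
    for a, p in zip(actual, predicted):
        if a == 0:
            continue  # percent error undefined; pattern joins no group
        pct_error = abs(a - p) / abs(a) * 100.0
        for limit, label in bands:
            if pct_error <= limit:
                counts[label] += 1
                break
        else:
            counts["over 30%"] += 1
    total = len(actual)
    return {label: 100.0 * count / total for label, count in counts.items()}


# The fourth pattern has an actual value of 0 and is left out of every group.
groups = percent_within_bands([100.0, 100.0, 100.0, 0.0, 50.0],
                              [102.0, 85.0, 160.0, 1.0, 50.0])
total_percent = sum(groups.values())
```

Because one of the five sample patterns has an actual value of 0, the group percentages total 80 rather than 100, which illustrates the caveat above.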
The statistics computed when GMDH networks are applied to a file may be copied to the Windows clipboard for use in other applications. To copy the statistics, select the Copy Results to Clipboard option from the File Menu. For example, you may want to compare the results of different neural networks. You can copy the results to the clipboard and paste them into a spreadsheet for easy comparison.
Checking the "Include actuals in .OUT file" box will cause the actual values to be displayed in the first column followed by the network's predictions or classifications in the .OUT file. (Note that actual values for the outputs must be in the file.) If there is more than one output, the actual values for each output will be displayed, followed by a blank column, followed by the network's predictions or classifications for each output.
Checking the "Include in .OUT file actuals minus network outputs" box will cause the differences (actual values minus network outputs) to be displayed. (Note that actual values for the outputs must be in the file.) If there is more than one output, the difference will be displayed for each output. The order of display is actual values, followed by predicted values, followed by differences.
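The column order described for the two check boxes can be pictured with a short sketch that assembles one row of values. This is only an illustration of the ordering, not the actual .OUT file writer: the function `build_out_row` and its parameters are hypothetical, the blank column is represented by an empty string, and the absence of a blank column before the differences is an assumption, since the text does not say.

```python
def build_out_row(predicted, actual=None,
                  include_actuals=False, include_differences=False):
    """Assemble one output row: actuals, then predictions, then differences."""
    if (include_actuals or include_differences) and actual is None:
        raise ValueError("actual values must be present in the file")
    row = []
    if include_actuals:
        row.extend(actual)
        if len(predicted) > 1:
            row.append("")  # blank column when there is more than one output
    row.extend(predicted)
    if include_differences:
        # Differences are actual minus network output, one per output.
        row.extend(a - p for a, p in zip(actual, predicted))
    return row


two_outputs = build_out_row([0.75, 0.25], actual=[1.0, 0.0],
                            include_actuals=True, include_differences=True)
one_output = build_out_row([2.5], actual=[3.0], include_actuals=True)
predictions_only = build_out_row([0.75, 0.25])
```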
The Patterns classified edit box displays the number of patterns in the file that the network processed.
Note: Do not check boxes which add extra columns to the .OUT file if you used the Race Handicapping Prenetwork Module. If you do, the Race Handicapping Postnetwork Module will not be able to reconstruct the file.
If your data file includes an * in a cell beneath a column labeled A (Actual output), the * will be replaced with a 0 and a prediction will be made in that row when you apply a network. A prediction will not be made in a row if your data file includes an * in a cell beneath a column labeled I (Input). (The column labels were specified in the Define Inputs/Outputs module.) Previous releases of NeuroShell 2 up to Release 2.0 would not apply a trained network to a data row if it contained an * in either an A or I column.
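The rule for * cells can be expressed as a small preprocessing step. The sketch below is a hypothetical illustration of the described behavior, not NeuroShell 2's implementation: `prepare_rows` and its arguments are invented names, column types are given as a list of "I" and "A" labels, and a row that receives no prediction is represented by `None`.

```python
MISSING = "*"


def prepare_rows(rows, column_types):
    """Apply the * rule: skip rows with * in an input column,
    replace * with 0 in an actual-output column."""
    prepared = []
    for row in rows:
        cells = list(zip(row, column_types))
        if any(value == MISSING and kind == "I" for value, kind in cells):
            prepared.append(None)  # no prediction is made for this row
            continue
        prepared.append([0 if (value == MISSING and kind == "A") else value
                         for value, kind in cells])
    return prepared


column_types = ["I", "I", "A"]
rows = [
    [0.2, 0.7, 1.0],      # complete row: used as is
    [0.4, 0.1, MISSING],  # * under A: replaced with 0, prediction still made
    [MISSING, 0.3, 0.0],  # * under I: row receives no prediction
]
prepared = prepare_rows(rows, column_types)
```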
Use the Run Menu to start processing the data file through the network. Also use this menu to interrupt processing.
Use the File Menu to select an alternate pattern file, view the pattern file, view the output file, or copy the results (statistics computed when the network is applied) to the Windows clipboard.
File Note: This module defaults to processing the .PAT file, but you can apply the network to any file that is in the NeuroShell 2 file format (the same as Lotus 1-2-3 .WK1 or Excel 4 .XLS file format) simply by using the File Menu to select a file. The inputs must be in the same columns in the same order as the .PAT file with which the network was trained. This module places the network's classifications or predictions into an .OUT file.