Tutorial Three - Advanced System Race Handicapping


 

Neural networks are often used successfully to predict the outcome of a horse race or dog race or for other ranking problems, such as ranking scientific test results or personnel.  Input data for such problems may include data about a race that is the same for all horses (called common statistics, e.g., track conditions) as well as statistics on each horse.  This tutorial example was designed to show the user how to set up data, explode, train the network, apply the network, make predictions, and implode with the Race Handicapping Option in the Advanced System. The problem file named Race is in the set of NeuroShell 2 examples available on the distribution CD, so you can follow along on the computer as you read this tutorial.

 

The Problem

In this example we put ourselves in the place of a man who regularly bets on horse races.  Let us call him Sam Smith.  For quite a long time, he has been carefully recording the results of horse races, together with track conditions and some data about each horse.  Now he wants to create a neural network which will help him to make money placing bets at future races.

 

The Outputs

To decide which horse to back, Sam needs to know the finish place for each horse.  So he will have N output columns, one for each of the N horses taking part in the race.  Each output column will contain a number from 1 to N, denoting the finish place of the corresponding horse.

 

The Inputs

Now, Sam needs to decide which variables need to be considered when making the horse race prediction.  He understands that some data is the same for all horses, but it differs from race to race.  Let us call such data common data.  He decides that, for the first attempt, he will include two common input variables, the track condition (denoted by a number from 1 to 3) and air temperature.

Then he needs to include some data for each of the horses.  He thinks that the most important data for each horse are jockey weight and horse's speed in the last race.  So he decides to make two more input variables for each horse.  The number of horses in each race is 3, so 2 common variables + 2 inputs per horse x 3 horses = a total of 8 input variables for this problem.

 

Setting up data

To use the Race Handicapping Prenetwork module, Sam needs to prepare a pattern file that meets all the necessary requirements.  Since Sam has been using NeuroShell 2 to solve other problems for quite a long time, he has no problem using the Datagrid module to enter his data.  This is the beginning of his file, RACE.PAT (you may also load it from the EXAMPLES directory):

[Figure: beginning of the RACE.PAT file]

Let us make sure that this file meets all the necessary requirements:

 

A. Each row in the file contains all inputs for a race.

Yes, each row contains all the data Sam decided to use as input data for each race.

B. The first column(s) in the row should contain all of the Common Data for each race, followed by input statistics on each horse.  Each horse must have the same number of input statistics and the statistics must be in the same order for each horse.

The first two columns contain the Common data, Track Condition and Temperature.  The next six columns contain input statistics on all horses. Each of the three horses has two input statistics, 1) jockey weight, and 2) horse speed in the last race, arranged in the same order for each horse.

C.  When using the file to train the network, you need to include a column for the finish place for each horse as an actual output on the same row as the inputs.  The value for finish place is a number from 1 to N where N is the total number of horses in the race.  The file should contain N output columns. The finish place for each horse must appear in the same order that the input statistics for each horse were entered.

The last three columns contain values for the three actual outputs, i.e., the finish place for each of the three horses.  The value for each place is a number from 1 to 3.  The finish places appear in the same order as the input statistics: the first of these columns contains the finish place of horse #1, the second the finish place of horse #2, and the last the finish place of horse #3.

D. The file may contain blank data cells with the exception of cells in the first row. The first row must contain cells of common data plus data values for the maximum number of items that are to be compared.  In other words, the first row must include a maximum number of cells equal to:  the total number of columns with common data plus the number of horses times the number of statistics per horse.

There are 2 inputs of common data, 3 horses in the race, and 2 inputs per horse, so the first row of the file contains 8 data inputs and 3 outputs.  The number of inputs is computed as follows:  2 + (3 x 2) = 8.
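The column arithmetic in requirements A through D can be sketched in a few lines of Python.  This is only an illustration of the layout rule; the function name and the column labels are hypothetical and are not names used by NeuroShell 2.

```python
def race_row_layout(n_common, n_horses, stats_per_horse):
    """Return (input labels, output labels) for one training row of a race file.

    The labels are hypothetical placeholders; only the counts and the
    ordering follow the file requirements described above.
    """
    # Common data comes first ...
    inputs = [f"common_{c}" for c in range(1, n_common + 1)]
    # ... followed by the same statistics, in the same order, for every horse.
    for h in range(1, n_horses + 1):
        inputs += [f"horse{h}_stat{s}" for s in range(1, stats_per_horse + 1)]
    # One finish-place column per horse, in the same horse order as the inputs.
    outputs = [f"place_horse{h}" for h in range(1, n_horses + 1)]
    return inputs, outputs


inputs, outputs = race_row_layout(n_common=2, n_horses=3, stats_per_horse=2)
print(len(inputs), len(outputs))  # 8 3
print(inputs + outputs)
```

For Sam's file this gives 8 input columns followed by 3 output columns, which matches the count worked out above.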

 

So, now that all the requirements are met, Sam is ready to use the Race Handicapping Prenetwork module.

 

Race Handicapping Prenetwork Module

 

From the Advanced System screen, Sam double-clicks the Custom Icon, and then the Race Handicapping Package icon.  These actions invoke the Race Handicapping Prenetwork Module.

 

In the left frame of this module screen, he clicks the Number of Common Statistics radio button and he types in 2 as the number of common statistics.  In the right frame, he enters 3 in the field Number of Horses per Race, and he enters 2 in the field Number of Input Statistics per Horse.

 

After that, he selects the Begin Processing (Explode) item from the Process menu.  When the operation finishes, Sam wants to look at the exploded file, so he selects Edit Pattern File from the File menu.  The exploded pattern file compares two horses at a time:

[Figure: the exploded pattern file]

 

The first two rows contain label information which will be used after the file is processed by the network to implode (restore) it back again.

 

The exploded file displays rows of data that compare 2 horses at a time.  The finish place is either a 1 or 0, where 1 denotes a horse that beats the other horse in the comparison.

 

Warning:  Using this module can create a file that is much larger than your original data file since a row (or pattern) that contains statistics on N horses explodes to N*(N-1) rows (or patterns).  We generate permutations of pairs, not just combinations of pairs, so the network won't become sensitive to the order of the two horses.  For example, a row (or pattern) that contains statistics on 8 horses explodes to 56 rows (or patterns), a row that contains statistics on 9 horses explodes to 72 rows, etc.
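The idea behind the explosion can be sketched as follows.  This is a minimal Python illustration, not the module's actual implementation: the function name, the sample values, and the exact row layout (common data, first horse, second horse, then two 1/0 outputs) are assumptions, and the label rows that the module writes at the top of the file are omitted.  Ties in finish place are not handled.

```python
from itertools import permutations


def explode_race(common, horses, places):
    """Expand one race into one row per ordered pair of horses.

    common: list of common statistics for the race
    horses: list of per-horse statistic lists, all the same length
    places: finish place of each horse (1 = winner), same order as horses
    """
    rows = []
    # permutations() yields ordered pairs, so (A, B) and (B, A) both appear.
    for i, j in permutations(range(len(horses)), 2):
        first_beats_second = 1 if places[i] < places[j] else 0
        rows.append(common + horses[i] + horses[j]
                    + [first_beats_second, 1 - first_beats_second])
    return rows


# Made-up values: track condition, temperature, then (jockey weight, last speed).
rows = explode_race(common=[2, 70.0],
                    horses=[[120, 35.1], [118, 36.0], [125, 34.2]],
                    places=[2, 1, 3])
print(len(rows))  # 6 rows, i.e. 3 * (3 - 1)
for row in rows:
    print(row)
```

Each exploded row here has 6 inputs and 2 outputs, and a race of N horses yields N*(N-1) rows, which is why the file grows so quickly.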

 

Sam exits the Datagrid, Race Handicapping module, and the Custom module, and he is now back at the Advanced Neural Networks main screen.  Once the Race Handicapping module explodes the data file, creation of the model should continue as for any other network.

 

Note:  We suggest making a copy of your original .PAT and/or .PRO file before exploding the file(s).  You may want to add more races to your file and this is not possible after the file has been exploded.  The first time your .PAT file is exploded, a copy is made with a .OLD extension; but if you process additional files through the network your original file is lost.  We suggest that you make a copy with a .BAK extension so you will not lose your original file.
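If you prefer to script the backup, a minimal Python sketch such as the following would do.  The function name is hypothetical, and the demonstration uses a stand-in file in a temporary directory rather than a real pattern file.

```python
import shutil
import tempfile
from pathlib import Path


def backup_pattern_file(path, suffix=".BAK"):
    """Copy a .PAT or .PRO file to the same name with a .BAK extension."""
    source = Path(path)
    target = source.with_suffix(suffix)
    # Refuse to overwrite an existing backup (for example, when RACE.PAT and
    # RACE.PRO would both map to RACE.BAK).
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite it")
    shutil.copy2(source, target)
    return target


# Demonstration with a stand-in file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "RACE.PAT"
    demo.write_bytes(b"stand-in contents")
    copy = backup_pattern_file(demo)
    print(copy.name)  # RACE.BAK
```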

 

Define Inputs and Outputs

Next, Sam needs to specify in NeuroShell 2 which of the columns are inputs and which are the actual outputs.  He double-clicks the Define Inputs/Outputs module.  The module displays all of the column names in the exploded file.

 

Sam marks the first six columns (common data and statistics for two horses) as Inputs and the following two columns (Place 1 and Place 2) as Actual Outputs, and he leaves all the remaining columns (named C9, C10, and C11) blank or Unused.  These last columns contain no actual data; they appear in the exploded file only because its first lines hold the column names of the file as it was before explosion.  (The original file is still available, since the Race Handicapping Prenetwork module renamed it RACE.OLD.)

 

Sam computes minimums/maximums, as usual, and he exits the Define Inputs and Outputs module.  (Please refer to Tutorial Example One for a more detailed description of this module.)

 

Then Sam proceeds to the Test Set Extract, Design, and Learning modules, as usual.  For this example, we shall assume that in the Design module he selects the Standard 3-Layer Backpropagation network architecture with all default settings.

 

Apply to File

 

Sam wants to see if his network is producing good results.  He double-clicks the "Apply to File" icon and enters the Apply module.  Before he selects Start Processing from the Run menu, he turns off the check boxes for "Include actuals in .OUT file" and "Include in .OUT file actuals minus network outputs".

 

Note:  When applying a trained network to a file created with the Race Handicapping module, do not select the check boxes that create extra columns in the file, e.g., Include actuals in .OUT file or Include in .OUT file actuals minus network outputs. The extra columns will interfere with restoring your file to its original condition in the Race Handicapping Postnetwork module.

 

The Apply module defaults to processing the .PAT file, which is the first set of data that Sam entered.  The trained network processes each pattern in the exploded pattern file and computes the values of the two outputs.  These values determine which of the two horses is the probable winner.  Statistics which measure the accuracy of the trained network are displayed on the screen.  Please refer to the description of the Apply Backpropagation Network module for detailed explanations of the statistics.

 

The trained network produces R squared values of .9146 for the first output and .9147 for the second output in the exploded .PAT file.  Sam records the R squared statistic to compare this network with other ones he might create later.
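For readers who want to reproduce a statistic of this kind outside the program, the sketch below computes the usual coefficient of determination in Python.  It assumes the conventional definition (1 minus the ratio of the squared prediction error to the squared deviation from the mean); the description of the Apply Backpropagation Network module remains the reference for exactly how the program computes R squared.  The sample values are made up.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("R squared is undefined when all actual values are equal")
    return 1 - ss_residual / ss_total


# Made-up 1/0 actual outputs and network outputs for four exploded rows.
actual = [1, 0, 1, 0]
predicted = [0.9, 0.1, 0.8, 0.2]
print(round(r_squared(actual, predicted), 4))
```

A value near 1 means the network outputs track the actual 1/0 values closely; a value near 0 means they do little better than always predicting the mean.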

 

Now Sam realizes that it would be more convenient to view the network results along with input data and with the desired outputs.  (This is also necessary if he wants to view the results for all horses in a race in a single file row.)  So he leaves the Apply module and double-clicks the Attach Output File icon.  The default settings for this module join the RACE.PAT file with the RACE.OUT file.  Sam selects the Attach Files item from the Attach menu.  After attachment is complete, he exits the module.

 

Race Handicapping Postnetwork Module

 

Now it is time to use the Race Handicapping Postnetwork module to "implode" the file.  Sam double-clicks the Custom icon in the PostNetwork column of the Advanced system, and then he double-clicks the Race Handicapping icon.

 

The look of the Race Handicapping Postnetwork module is the same as that of the Race Handicapping Prenetwork module.  However, this module defaults to all of the correct settings taken from the Prenetwork module, so the only thing Sam needs to do is to select the Begin Implosion item from the Implode menu.

 

The result is a file that includes all of the information for a single race on a single row in the spreadsheet.  The output neuron results for each horse are combined to produce a ranking for each horse.
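The tutorial does not say how the module combines the pairwise outputs.  One plausible way to do it, shown here purely as an assumption, is to add up every output value a horse receives across all of its comparisons and rank the horses by that total.  The function name, the ordering of the pairs, and the sample output values in this Python sketch are all hypothetical.

```python
from itertools import permutations


def implode_race(n_horses, pair_outputs):
    """Turn pairwise network outputs into a rank per horse (1 = predicted winner).

    pair_outputs: one (output_1, output_2) tuple per ordered pair of horses,
    assumed to be listed in the order itertools.permutations generates pairs.
    """
    scores = [0.0] * n_horses
    pairs = permutations(range(n_horses), 2)
    for (i, j), (out_first, out_second) in zip(pairs, pair_outputs):
        scores[i] += out_first    # evidence that horse i beats horse j
        scores[j] += out_second   # evidence that horse j beats horse i
    # Highest total score gets rank 1 (assumed combination rule).
    order = sorted(range(n_horses), key=lambda h: scores[h], reverse=True)
    ranks = [0] * n_horses
    for rank, horse in enumerate(order, start=1):
        ranks[horse] = rank
    return ranks


# Made-up outputs for the pairs (1,2), (1,3), (2,1), (2,3), (3,1), (3,2).
outputs = [(0.2, 0.8), (0.9, 0.1), (0.7, 0.3),
           (0.95, 0.05), (0.15, 0.85), (0.1, 0.9)]
print(implode_race(3, outputs))  # [2, 1, 3]
```

With these values horse #2 collects the largest total and is ranked first, followed by horse #1 and then horse #3.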

 

Sam is now ready to look at the new .OUT file, which once again contains all the information in the same form as the initial pattern file before exploding.  He does so by selecting the View Pattern File item from the File menu.  In the Viewer module that pops up, he is able to see only the first 10 rows of the pattern file.  However, if he is not satisfied with this, he can press the Transfer to the Datagrid button, thus invoking the Datagrid module.

 

The Datagrid is not a commercial grade spreadsheet and is in fact somewhat slow at loading large files.  If you have a very fast computer this may be all right; otherwise use your usual spreadsheet.

 

Please refer to Tutorial Example One for a description of how to change NeuroShell 2 so that it always calls your spreadsheet instead of the Datagrid, and of how to view the data graphically.  At the end of Tutorial Example One you can also find some tips on how to make your predictions better.

 

Making Predictions

 

Making predictions with a trained network can be performed in five simple steps.

 

1.  Check that your data complies with the File Requirements

 

A. If you're using the file to make predictions, you do not need to include a finish place for each horse in the output columns. This file should have a .PRO extension and should be used with the Advanced System's Production option.  (The Production mode is turned on in the Options menu of the main Advanced System screen.)

 

Note: If you are applying the network to a file in which you know the results of the race, you may place the actual race results in the actual output columns.  The network's predictions will be placed in columns to the right of the actual output columns.

 

B.  Other than A above, the rest of the file should match the specifications for creating a .PAT file for training purposes.

 

2.  Use the Race Handicapping Prenetwork module

When this module is selected for processing a .PRO file, default information will be displayed for the Common Data and Horse Data.  The default information was entered when the network was trained and, as long as it is still correct for the new data, should not be changed.

Use the Explode Menu to begin processing the file or to interrupt processing.

 

Note:  We suggest making a copy of your original .PAT and/or .PRO file before exploding the file(s).  You may want to add more races to your file and this is not possible after the file has been exploded.  The first time your .PAT file is exploded, a copy is made with a .OLD extension; but if you process additional files through the network your original file is lost.  We suggest that you make a copy with a .BAK extension so you will not lose your original file.

 

3.  Use the Apply module

When you use the Apply module, the trained network is used to produce results for two-horse comparisons.  When you are in the Production mode, the Apply module defaults to producing results for a .PRO file.

 

Note:  When applying a trained network to a file created with the Race Handicapping module, do not select the check boxes that create extra columns in the file, e.g., Include actuals in .OUT file or Include in .OUT file actuals minus network outputs. The extra columns will interfere with restoring your file to its original condition in the Race Handicapping Postnetwork module.

 

4.  Use the Attach Output File module

After applying the trained network, you may want to view the file which includes the network's prediction for each horse in the race.  You need to use the Attach module to attach network predictions to the input file.

 

5.  Use the Race Handicapping Postnetwork module

Use the Race Handicapping Postnetwork module to "implode" the file.  The result will be a file that includes all of the information for a single race on a single row in the spreadsheet.  The neuron output values in the .OUT file will have been mathematically combined to produce rankings for the horses.

 

Note: If you get an error message when imploding the file, it is probably because you have neglected to do one of the following:

 

1.  Apply the trained network to the exploded file you are trying to implode.

 

2.  Turn off the check boxes in the Apply module that add extra columns to the .OUT (output) file, such as the actual outputs and the differences between actual outputs and network outputs.

 

3.  Run the Attach module to put the inputs back into the .OUT file.

 

 

Note:  The algorithms and techniques used in the tutorial example may have changed since the Help File was written.  Refer to Program Changes in the index for any changes, which may include other ways of preprocessing data or training. The data used in this problem was created for example purposes and was not based on real horse race data.  The name Sam Smith is fictional and is not based on any person living or dead.