Test Set Extract Detail


Use this module to extract a Test Set and/or a Production Set of data from the training patterns.  The Test Set may be used with Calibration, which prevents networks from being overtrained so that they generalize well on new data.  Calibration is used with Backpropagation, PNN, GRNN, and GMDH networks.  The Production Set may be used to test the network's results with data the network has never "seen" before.  The original file, which is usually the .PAT file, remains unchanged.

 

Extraction Methods

The module offers five different methods for selecting data from a Training Set.  Choose one by clicking on the appropriate button.  All of these options will leave the .PAT file as it was.

 

1.        N percent (Test Set), M percent (Production Set), randomly chosen -  This option will extract N percent of the .PAT file to make a Test Set (.TST) and/or M percent of the .PAT file to make a Production Set (.PRO).  The remainder of the pattern file will become the Training Set (.TRN).  The default setting extracts 20 percent of the .PAT file to create a .TST set.  You may change any of the percentages by typing a new percentage in the edit box.

 

Note: If you use the same random number seed, the same patterns will be included in the Test and/or Production Set(s).  Change the random number seed to have different patterns extracted for the Test and/or Production Set(s).  Different patterns will also be extracted each time you rerun the extract procedure without restarting the Extract Module.

 

2.        Every Nth pattern (Test Set), Every Mth pattern (Production Set) -  This option will extract every Nth pattern from the .PAT file to create a Test Set (.TST) and/or every Mth pattern from the .PAT file to create a Production Set (.PRO).  The remainder of the pattern file will become the training file (.TRN).

 

3.        All patterns after N thru M (Test Set), all after M (Production Set) - Use this option if the patterns you want to include in your Test Set appear at the end of the file.  This option will extract from the .PAT file all patterns after the Nth pattern through pattern M to create a Test Set (.TST).  All patterns after pattern M will create a Production Set (.PRO).  The remainder of the pattern file will become the training file (.TRN).

 

4.        Last M patterns (Production Set), N percent (Test Set), randomly chosen - Use this option to extract a Production Set from the end of the file and randomly extract a Test Set from the remainder.  This option will extract from the .PAT file the final M patterns to create a Production Set (.PRO) and will randomly extract N percent from the remaining patterns for a Test Set (.TST).  The remainder of the pattern file will become the training file (.TRN).

 

5.        By row marker - Use this option if the data file has a column that contains strings or numbers which can be used as keys or search strings to extract patterns from the file.  Select one key for the Training Set (.TRN), one key for the Test Set (.TST), and optionally one key for a Production Set (.PRO).
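
The first four methods above can be sketched in Python as follows.  This is only an illustration under stated assumptions, not the module's actual implementation: the function names are hypothetical, the patterns are held in a plain list rather than a .PAT file, Python's random module stands in for the module's own random number generator (so a given seed will not pick the same patterns the program picks), and where a pattern qualifies for both sets the sketch arbitrarily gives it to the Test Set.

```python
import random

def _partition(patterns, test_idx, prod_idx):
    """Return (training, test, production) lists; the input list is not modified."""
    trn = [p for i, p in enumerate(patterns) if i not in test_idx and i not in prod_idx]
    tst = [p for i, p in enumerate(patterns) if i in test_idx]
    pro = [p for i, p in enumerate(patterns) if i in prod_idx]
    return trn, tst, pro

def split_random_percent(patterns, n_pct=20, m_pct=0, seed=0):
    """Method 1: N percent to Test, M percent to Production, randomly chosen."""
    rng = random.Random(seed)              # same seed -> same selection
    order = list(range(len(patterns)))
    rng.shuffle(order)
    n_test = round(len(patterns) * n_pct / 100)
    n_prod = round(len(patterns) * m_pct / 100)
    test_idx = set(order[:n_test])
    prod_idx = set(order[n_test:n_test + n_prod])
    return _partition(patterns, test_idx, prod_idx)

def split_every_nth(patterns, n=0, m=0):
    """Method 2: every Nth pattern to Test, every Mth to Production (0 = none)."""
    count = len(patterns)
    test_idx = {i for i in range(count) if n and (i + 1) % n == 0}
    # Assumption: a pattern hit by both rules goes to the Test Set.
    prod_idx = {i for i in range(count) if m and (i + 1) % m == 0} - test_idx
    return _partition(patterns, test_idx, prod_idx)

def split_after(patterns, n, m):
    """Method 3: patterns N+1 through M to Test, everything after M to Production."""
    count = len(patterns)
    test_idx = set(range(n, min(m, count)))   # 1-based patterns N+1..M
    prod_idx = set(range(m, count))           # 1-based patterns M+1..end
    return _partition(patterns, test_idx, prod_idx)

def split_last_m_random(patterns, m, n_pct=20, seed=0):
    """Method 4: last M patterns to Production, N percent of the rest to Test."""
    count = len(patterns)
    cut = max(count - m, 0)                   # index where the Production Set starts
    rng = random.Random(seed)
    n_test = round(cut * n_pct / 100)
    test_idx = set(rng.sample(range(cut), n_test))
    prod_idx = set(range(cut, count))
    return _partition(patterns, test_idx, prod_idx)
```

For example, with ten patterns numbered 1 to 10, split_after(list(range(1, 11)), 6, 8) returns patterns 1-6 as the Training Set, 7-8 as the Test Set, and 9-10 as the Production Set.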

 

Information Needed for the Selected Extraction Method

Depending upon the chosen extraction method, different edit or selection boxes are displayed:

 

Where N = - Type the value of N in the text box.  See the method descriptions above for the definition of N.

 

Where M = - Type the value of M in the text box.  See the method descriptions above for the definition of M.

 

Random Number Seed - To change the random number seed that is used to extract percentages of a file, type a new number in the text box.  The default setting is 0; if you keep the default, the same patterns will be selected for the Test Set each time you run the extraction procedure.  If you change the random number seed (which may range from 0 to 32,767), the patterns extracted for the Test Set will be different.

 

Training Set String - Use this edit box to type in the alphanumeric search string or number that the module will search for as a key to extract a Training Set of patterns when the row marker method is used.  All patterns with this value will go into the Training Set.  Spaces may be included in the string.  The default setting uses a "T" to designate the rows in a data file that will be extracted to create a Training Set.

 

Test Set String - Use this edit box to type in the alphanumeric search string or number that the module will search for to extract a Test Set of patterns when the row marker method is used.  All patterns with this value will go into the Test Set.  Spaces may be included in the string.  The default setting uses a "P" to designate the rows in a data file that will be extracted to create a Test Set.

 

Production Set String - Use this edit box to type in the alphanumeric string or number that the module will search for to extract a Production Set of patterns when the row marker method is used.  All patterns with this value will go into the Production Set.  You may want to create a Production Set in order to test the network's results with data the network has never "seen" before.  Spaces may be included in the string.  The default setting uses a "V" to designate the rows in a data file that will be extracted to create a Production Set.

 

Column to Search - A scroll box is displayed which lists the columns in the data file.  Click with the mouse to select the column which contains the search strings for the Training, Test, and Production Sets when the row marker method is used.
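
The row marker settings above can be illustrated with a short Python sketch.  It is not the module's implementation: the function name and the representation of rows as dictionaries are hypothetical, the marker is assumed to match a key exactly, and rows whose marker matches none of the keys are assumed to be left out of every set.  The default keys "T", "P", and "V" are the defaults described above.

```python
def split_by_row_marker(rows, column, train_key="T", test_key="P", prod_key="V"):
    """Route each row to the Training, Test, or Production Set by its marker."""
    sets = {train_key: [], test_key: [], prod_key: []}
    for row in rows:
        marker = row[column]
        if marker in sets:
            sets[marker].append(row)
        # Assumption: rows with an unrecognized marker are skipped.
    return sets[train_key], sets[test_key], sets[prod_key]


# Hypothetical data file with a "set" column holding the row markers.
rows = [
    {"input": 0.1, "set": "T"},
    {"input": 0.2, "set": "P"},
    {"input": 0.3, "set": "V"},
    {"input": 0.4, "set": "T"},
]
training, test, production = split_by_row_marker(rows, column="set")
```

Here the first and fourth rows go to the Training Set, the second to the Test Set, and the third to the Production Set; the rows list itself is not changed.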

 

Use the Extract Menu to separate the files. You may also use this menu to interrupt the extraction process.

 

Use the File Menu if you wish to change to a different file.

 

Note: If you change data in the pattern file, you will probably want to repeat the extract procedure.