NYSE Prediction


Example:  GHSAMPLE.XLS, NYSE and Rules worksheets

Solution Type:  Continuous Chromosomes


This example is located on the tab labeled NYSE in the GHSAMPLE.XLS workbook, which was installed in the C:\GENEHUNTER\EXAMPLES\EXCEL subdirectory if you chose the default directory during GeneHunter installation. (If you installed the entire AI Trilogy set of programs, the GeneHunter folder is a subfolder of the C:\AI Trilogy folder.)


One of the exciting aspects of genetic algorithms is that the more imagination you use in creating fitness functions and in deciding what the variables (chromosomes) represent, the more powerful the applications you will be able to develop.  For example, it is not too hard to use GeneHunter to find optimum "rules" to process data.  This technique may someday become the process whereby computers are programmed, and there have even been some early rudimentary attempts at this [Oliver, J., Ref. 15].  One of the most interesting applications of rule generation may be the generation of rules to predict various financial markets.  The NYSE example, which is based on the paper by Yuret, D. and de la Maza [Ref. 13], will give you a starting point for your efforts in this area.


The formulation of the problem for GeneHunter is somewhat different from the usual GA formulation.  Each individual in the population represents a complex rule such as:


If the close 5 days ago is greater than the close 1 day ago and if the low 2 days ago is less than the high 7 days ago, then the NYSE index will rise tomorrow.


First, it is necessary to fix the syntactic structure of the rule to a certain complexity. In our example, we used rules like the one above, with only one AND conjunction. The key "words" in this rule structure that will become the chromosomes are:  close  5  greater  close  1  low  2  less  high  7.


We don't need to worry about the "then" part, because it will always be the same:  we are searching for a rule which predicts a rise in the NYSE tomorrow.  We are simulating short-term trading, so we are going to use our rule to decide when to buy, and we assume we will sell after one day.  Counting the key words above (four price columns, four day offsets, and two comparison operators), our rule structure requires 10 chromosomes, and we need a way to represent all of them numerically.  The day offsets are already numbers; for the rest, we used the following translations:


high = 0

low = 1

close = 2


less than or equal to = 0

greater than = 1
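
To make the encoding concrete, here is a minimal Python sketch (not part of GHSAMPLE.XLS) that turns the 10 numeric chromosomes back into a readable rule. The function name `describe_rule` and the order in which the genes are paired are assumptions made for illustration; only the numeric codes come from the translations above.

```python
# Numeric codes taken from the translations listed above.
COLUMNS = {0: "high", 1: "low", 2: "close"}
OPERATORS = {0: "less than or equal to", 1: "greater than"}


def describe_rule(cols, lags, ops):
    """Turn the 10 numeric genes into a readable rule.

    cols: four column codes; lags: four "days ago" values; ops: two
    operator codes. The pairing (term 0 vs. term 1, term 2 vs. term 3)
    is an assumed layout, not confirmed by the worksheet.
    """
    def term(i):
        unit = "day" if lags[i] == 1 else "days"
        return f"the {COLUMNS[cols[i]]} {lags[i]} {unit} ago"

    first = f"{term(0)} is {OPERATORS[ops[0]]} {term(1)}"
    second = f"{term(2)} is {OPERATORS[ops[1]]} {term(3)}"
    return f"If {first} and if {second}, then the NYSE index will rise tomorrow."


# The sample rule from the text, with "less than" encoded as code 0.
print(describe_rule(cols=[2, 2, 1, 0], lags=[5, 1, 2, 7], ops=[1, 0]))
```

Because code 0 stands for "less than or equal to", the decoded sentence uses that wording where the sample rule says "less than".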


It should be clear that any other rule structure or any other types of variables may be used in the problem of predicting the market.  For example, there are many types of "technical indicators" which might be used instead of the basic open, high, low, close, and volume (in fact we only used high, low, and close).  Your success with this technique may depend upon your savvy in this area.


More on rules in a moment.  First, let's address the fitness function.  An optimum rule is one which works the best over a large number of days (which we will call a training set, borrowing from neural network terminology).  There are many ways in which "best" can be defined, but for our example, we will consider the best rule to be the one that maximizes our profit over the training set.  The training set is in columns A through D and contains the date and the high, low, and close for that date.  We assume that we will buy whenever the rule predicts a rise (a buy signal) and that we will sell at the end of the next day whether we have a paper profit or not.  Our net profit after trading this way over the entire training set is the fitness function, which we will attempt to maximize.  For the sake of simplicity, we will assume that our purchase (100 shares) takes place at the close on the day we are making the prediction, and that the sale takes place at the close of the next day, all without overhead or commissions.



Figure 5.8  View of main dialog for NYSE Prediction problem

In our example, we did not use a large number of days in the training set, but this is for clarity only.  More robust rules will result when much more data is used. Reference [13] discusses using an "out of sample" set of data to validate the rules and prevent overfitting, but we did not implement that feature.  We believe it is valid to use such a method, but to some extent its necessity probably decreases as the size of the training set increases.
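
As a rough illustration of the "out of sample" idea, the Python sketch below holds back the most recent rows of the data so that a rule found on the earlier rows can be checked on days it has never seen. This is not a feature of the NYSE worksheet; the function name `chronological_split` and the 75% training fraction are assumptions chosen for the example, and Reference [13] may divide its data differently.

```python
def chronological_split(rows, train_fraction=0.75):
    """Split rows (oldest first) into a training part and a held-out part.

    The split is chronological rather than random so that the held-out
    days all come after the days used to evolve the rule.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Twelve placeholder rows standing in for twelve trading days.
days = list(range(12))
training, held_out = chronological_split(days)
print(len(training), len(held_out))
```

A rule whose profit on the held-out days is much worse than its profit on the training days is probably overfitted.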


The hardest part of creating this example was deciding how to parse and evaluate the rules.  We chose to write a small Visual Basic for Applications (VBA) program to perform this task, although we could have written an Excel macro or perhaps even found a clever use of the INDEX function for the rule parsing and evaluation.  You will see the VBA program in the worksheet labeled "RULE".  This small program produces a "buy" signal (1) or a "no buy" signal (0) in column F of the spreadsheet, based upon whether or not the rule evaluates to true.  This signal is then multiplied by 100 times the change between today's close and tomorrow's close, and the result is placed in column G.  Therefore, column G is the profit realized from execution of the rule, which is only executed when the signal is 1 (buy).


In other words, there is no trade when the signal is 0 because the difference in close is multiplied by zero.  If the signal is 1, there is either a profit or a loss for the day depending upon whether the NYSE index went up or down.  Cell G15 is the sum over all days on which we could trade, and is therefore the fitness function.  (Note that we could not trade for the first 11 days because we are allowing an 11-day "lookback" in the rules.)
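
The profit calculation just described can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the worksheet formulas; the function name `net_profit` and the list-based data layout are assumptions, while the 100 shares and the 11-day lookback come from the text.

```python
SHARES = 100     # position size stated in the text
LOOKBACK = 11    # days skipped so every rule has enough history


def net_profit(closes, signals, shares=SHARES, lookback=LOOKBACK):
    """Sum signal * shares * (tomorrow's close - today's close) over tradable days.

    closes and signals are lists of equal length, oldest day first.
    """
    total = 0.0
    # The last day has no "tomorrow", so it cannot be traded.
    for day in range(lookback, len(closes) - 1):
        change = closes[day + 1] - closes[day]
        total += signals[day] * shares * change
    return total


closes = [100] * 11 + [101, 103, 102]
signals = [0] * 11 + [1, 1, 0]
print(net_profit(closes, signals))
```

In this made-up data, the first buy gains 2 points and the second loses 1 point, so the net result on 100 shares is a profit of 100.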


In H6:K6, there are four integer chromosomes ranging from 0 to 2 corresponding to the column chosen (high, low, or close).  In H7:K7, there are four more integer chromosomes representing the number of days back to look in the corresponding chosen column.  The final row of chromosomes, H8:I8, contains the representations for the comparison operators (either <= or >).
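
The following Python sketch shows one way a rule encoded in these three rows could be evaluated for a single day. It is not the CalcRule source; the function names, the tuple layout of the price data, and the assumption that the first operator compares terms 1 and 2 while the second compares terms 3 and 4 are all illustrative.

```python
HIGH, LOW, CLOSE = 0, 1, 2      # column codes from the text
LESS_EQUAL, GREATER = 0, 1      # operator codes from the text


def compare(left, op, right):
    """Apply the encoded comparison operator to two prices."""
    return left > right if op == GREATER else left <= right


def buy_signal(prices, day, cols, lags, ops):
    """Return 1 (buy) or 0 (no buy) for the given day.

    prices: list of (high, low, close) tuples, oldest day first.
    cols, lags: four genes each; ops: two genes.
    The caller must ensure day >= max(lags) so every lookup is valid.
    """
    values = [prices[day - lags[i]][cols[i]] for i in range(4)]
    first = compare(values[0], ops[0], values[1])
    second = compare(values[2], ops[1], values[3])
    return 1 if (first and second) else 0


# Made-up falling market: twelve days of (high, low, close).
prices = [(121 - i, 119 - i, 120 - i) for i in range(12)]
print(buy_signal(prices, 11, cols=[CLOSE, CLOSE, LOW, HIGH],
                 lags=[5, 1, 2, 7], ops=[GREATER, LESS_EQUAL]))
```

Both conditions of the sample rule hold on the last day of this made-up series, so the signal is 1.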


The final part of the NYSE GA is the small CalcRule program (written in Visual Basic for Applications), which interprets the generated rule and produces the buy or no-buy signal in column F.  The source code for this small program is on the tab labeled “Rule”, which is just to the right of the tab labeled NYSE Prediction.




Figure 5.9  View of NYSE worksheet