Neural Net for Bob's Deli
Example: GHSAMPLE.XLS, Neural Net worksheet
Solution Type: Continuous Chromosomes
This example is located on the tab labeled Neural Net in the GHSAMPLE.XLS workbook, which was installed in the C:\GENEHUNTER\EXAMPLES\EXCEL subdirectory if you chose the default directory during GeneHunter installation. (If you installed the entire AI Trilogy set of programs, the GeneHunter folder is a subfolder of the C:\AI Trilogy folder.)
A Brief Overview of Neural Nets
Artificial neural networks, which we will simply call neural networks, were originally designed to model the pattern recognition capabilities of the brain. They have since been used extensively for many practical prediction and data classification tasks. Two of the most prevalent applications today are stock price prediction and sales prediction. Neural networks can be trained in many ways, and using a genetic algorithm is one of them. In this example, we will train a neural network to predict the number of sandwiches Bob should be prepared to sell each day at lunchtime in his New York Style Deli.
The simplest form of modern neural network is a structure of three layers of "neurons" connected by weighted connections (see Figure 5.3). The first layer, the "input" layer, presents inputs or facts about the problem to the neural network. The second layer, the "hidden" layer, detects features in the patterns presented at the input layer. The connection weights between the input layer and the hidden layer provide the relationships between each input and each feature. The output layer produces the required outputs (predictions in this case). The weighted connections between the hidden layer and the output layer provide the relationships between the features and the output values.
Figure 5.3 Example of artificial neural network.
One job of the neural network training algorithm is to find the best values for the weights between layers, so that whenever a pattern of facts is presented at the input layer, a correct or near correct answer is produced at the output layer. Finding a proper set of weights is no simple task, since the SAME set of weights must work no matter what pattern is presented as input.
When a pattern of facts is presented at the input layer, the facts should all have been scaled into the same range (usually -1 to 1), so that variables with larger measurement units do not exert a greater influence on the neural network. The inputs are multiplied by the weights leading to each hidden neuron. At each hidden neuron, the input-weight products are summed and a non-linear "squashing" function (often the hyperbolic tangent) is applied to the sum in order to squash it back into the -1 to 1 operating range, sometimes written (-1, 1). The hidden neuron values are then multiplied by the weights leading to the outputs and summed, and the squashing function is applied once more. The resulting outputs are a scaled representation of the true outputs the network is supposed to produce. The predicted output values are then obtained by unscaling the network outputs.
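The calculation just described can be sketched in a few lines of Python. This is only an illustration of the general idea, not the formulas in the worksheet: the function name forward, the example weights, and the input pattern are all made up, a single output is assumed, and no bias terms are included.

```python
import math

def forward(inputs, hidden_weights, output_weights, output_fn=math.tanh):
    """Propagate one scaled input pattern through a three-layer network.

    hidden_weights holds one list of weights per hidden neuron (one weight
    per input). output_weights holds one weight per hidden neuron, because
    this sketch assumes a single output neuron.
    """
    # Each hidden neuron sums its input-weight products, then squashes the sum.
    hidden = [
        math.tanh(sum(x * w for x, w in zip(inputs, neuron_weights)))
        for neuron_weights in hidden_weights
    ]
    # The output neuron does the same with the hidden neuron values.
    total = sum(h * w for h, w in zip(hidden, output_weights))
    return output_fn(total)

# Hypothetical pattern: three scaled inputs, two hidden neurons.
pattern = [0.5, -1.0, 1.0]
hidden_weights = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
output_weights = [0.7, -0.5]
scaled_output = forward(pattern, hidden_weights, output_weights)
```

The value returned is still in the scaled range and must be unscaled before it can be read as a prediction. Passing a different output_fn (for example, a function that returns its argument unchanged) gives a linear output instead of a squashed one.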
Details of the Model
In this example, the genetic algorithm is used to find the connection weights, so the weights are the chromosomes, which we usually limit to the range [-1, 1]. The fitness function is the squared error between the correct outputs and the predicted outputs, averaged over an entire set of input patterns (the "training set"), each with its corresponding correct output.
Predicting Sandwich Sales
Bob has decided to use a neural network to predict his sandwich sales so he will have enough ingredients and staff available each day at lunch. He has been keeping records of sales for 16 weeks. Since Bob has an outdoor patio where the sandwiches are usually consumed, he has also been keeping records of the daily temperature and precipitation, because he notices that inclement weather adversely affects sales. He also feels that the day of the week affects sales somehow, and he has definitely noticed that when it is payday at the large company across the street, sandwich consumption goes up. He has therefore decided that the weather, day of the week, and the company payday will be inputs to his network. Figure 5.4 shows the spreadsheet that Bob has created.
Figure 5.4 View of Neural Net worksheet
Building the Network
Bob has represented binary variables (True/False) as 1 and 0. He has entered the temperature in degrees Fahrenheit, and since the temperatures are not already in the range from -1 to 1, he has scaled them into this range (Column E) based upon the minimum and maximum temperatures. All of the input variable columns constitute the patterns for the input layer of Bob's net. The actual number of sandwiches sold each day during the 16-week period had to be scaled as well (Column L).
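One common way to scale a column into the range -1 to 1 from its minimum and maximum is shown below. The exact formula in Column E may differ, so treat this as an assumption; the function name scale and the temperatures are made up for illustration.

```python
def scale(value, lo, hi):
    """Map a value from the range [lo, hi] onto [-1, 1] (min-max scaling)."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

# Hypothetical temperatures in degrees Fahrenheit, not Bob's actual records.
temperatures = [38.0, 55.0, 72.0, 89.0]
lo, hi = min(temperatures), max(temperatures)
scaled = [scale(t, lo, hi) for t in temperatures]
```

The coldest day maps to -1 and the warmest to 1. A future temperature outside the recorded minimum and maximum would fall slightly outside this range.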
Next, Bob had to build the hidden layer. In order to prevent overfitting of the model, Bob kept the number of hidden neurons small (three). The three hidden neurons are columns M, N, and O. Each row contains the hidden layer neuron values for the corresponding inputs in that row. If you examine the formulas in the hidden neuron columns, you will notice that they simply apply the tanh (hyperbolic tangent) function to the sum of the products of each input times the input's weight corresponding to the respective hidden column. The weights are at the bottom of the spreadsheet in the range C92:E100. See Figure 5.5.
Figure 5.5 Weight values of links from input values to hidden neurons.
The output produced by the network is in column P. If you examine the formulas in this column, you will see that they simply apply the linear transfer function (in other words, no transformation) to the sum of the products of the hidden neurons times their respective weights. These weights (weights between the hidden and output layer) are found in the range L92:L95.
The weights between the input layer and hidden layer (C92:E100) and the weights between the hidden and output layer (L92:L95) will be the chromosomes that GeneHunter will vary. Of course, a fitness function needs to be defined. In column Q, Bob unscaled the predicted number of sandwiches, again based upon the minimum and maximum of the actual numbers of sandwiches sold. Column R is then the squared error between the predicted number of sandwiches sold (unscaled) and the actual number of sandwiches sold. At the bottom of the spreadsheet (R88), the average of the squared errors is computed, and this will be the fitness function that GeneHunter should minimize.
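The calculation in columns Q and R and cell R88 can be sketched as follows. This assumes the unscaling is the inverse of a min-max scaling into the range -1 to 1, which the worksheet formulas may or may not match exactly; the function names and all of the numbers are made up for illustration.

```python
def unscale(scaled, lo, hi):
    """Map a value from [-1, 1] back onto the original range [lo, hi]."""
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo

def mean_squared_error(scaled_predictions, actual_sales):
    """Average squared error between unscaled predictions and actual sales."""
    lo, hi = min(actual_sales), max(actual_sales)
    squared_errors = [
        (unscale(p, lo, hi) - a) ** 2
        for p, a in zip(scaled_predictions, actual_sales)
    ]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical data: four days of actual sales and the network's scaled outputs.
actual = [100, 140, 120, 180]
network_outputs = [-1.0, 0.1, -0.4, 0.9]
fitness = mean_squared_error(network_outputs, actual)
```

With these made-up numbers the unscaled predictions are 100, 144, 124, and 176, so three days are off by 4 sandwiches and one is exact, giving a mean squared error of 12.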
Training the Network with GeneHunter
Figure 5.6 shows the GeneHunter selection screen after Bob has entered the location of the fitness function and the chromosomes (weights) that should be varied to minimize the fitness function. Note that Bob has restricted the weights to the range -.1 to .1. After a number of evolutions, Bob noticed that the mean squared error was about 31 and no longer decreasing very fast. Since the square root of 31 is about 5.6, this mean squared error means that the network's predictions are typically off by between 5 and 6 sandwiches, so Bob decided to stop evolution by pressing the Ctrl and Break keys. The spreadsheet in the example contains the network values after the network was "trained" by the genetic algorithm.
Figure 5.6 View of main GeneHunter screen for Neural Net problem
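For readers who want to see the idea outside the spreadsheet, here is a minimal sketch of training weights with a simple genetic algorithm. It is not GeneHunter's internal algorithm: the tournament selection, uniform crossover, Gaussian mutation, and elitism used here are assumptions chosen for brevity, and the tiny network, the function names, and the training patterns are all made up.

```python
import math
import random

def predict(weights, inputs):
    """Tiny network: two inputs, two tanh hidden neurons, one linear output."""
    h1 = math.tanh(weights[0] * inputs[0] + weights[1] * inputs[1])
    h2 = math.tanh(weights[2] * inputs[0] + weights[3] * inputs[1])
    return weights[4] * h1 + weights[5] * h2

def fitness(weights, patterns):
    """Mean squared error over the training set (lower is better)."""
    return sum((predict(weights, x) - y) ** 2 for x, y in patterns) / len(patterns)

def evolve(patterns, n_weights=6, low=-1.0, high=1.0,
           population_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    # Each chromosome is one complete set of weights within [low, high].
    population = [[rng.uniform(low, high) for _ in range(n_weights)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, patterns))
        history.append(fitness(population[0], patterns))
        next_gen = [population[0][:]]  # elitism: keep the best unchanged
        while len(next_gen) < population_size:
            # Tournament selection: the fitter of three random candidates.
            a = min(rng.sample(population, 3), key=lambda w: fitness(w, patterns))
            b = min(rng.sample(population, 3), key=lambda w: fitness(w, patterns))
            # Uniform crossover: each gene comes from one parent at random.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation on some genes, clamped to the allowed range.
            child = [
                min(high, max(low, gene + rng.gauss(0.0, 0.1)))
                if rng.random() < 0.2 else gene
                for gene in child
            ]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=lambda w: fitness(w, patterns))
    history.append(fitness(best, patterns))
    return best, history

# Hypothetical scaled training patterns: (inputs, correct scaled output).
patterns = [([1.0, -1.0], 0.5), ([-1.0, 1.0], -0.5),
            ([1.0, 1.0], 0.2), ([-1.0, -1.0], -0.2)]
best, history = evolve(patterns)
```

Because the best chromosome is always carried forward, the values in history never increase. Watching that list flatten out is the programmatic equivalent of Bob noticing that the error was no longer decreasing very fast. The low and high parameters play the role of the weight range Bob entered on the GeneHunter screen.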
Predicting for the Future
If Bob wants to predict sandwich sales for a future day, all he has to do is insert a new row of data after the last one (row 86) and extend all of the formulas to the new row. After he places the correct inputs in the input columns, he can read the prediction for that day in column Q. Note that he has to use the weather forecast to enter the predicted temperature and whether or not it is expected to rain.
Figure 5.7 Scatter plot for Bob's Network
There are several variations to Bob's network that you can make if you want to experiment further with neural networks. You can change the number of hidden neurons, being careful to modify the weights accordingly. You can vary the weight ranges from as low as [-.1, .1] to [-1.0, 1.0].
Continuous functions other than tanh(x) can also be used as squashing functions; 1/(1+exp(-x)) is one of the most popular and is biologically motivated. If you use this one, note that the outputs should be scaled into the interval [0, 1] instead of [-1, 1]. In the hidden layer, you can use the Gaussian function exp(-x^2). This function doesn't really squash, but it usually works well. In the output layer, you can also try something besides the linear function f(x)=x; tanh or 1/(1+exp(-x)) also work well. You might even try several different types of hidden squashing functions in the same layer. Such nets are very popular in Ward Systems Group's NeuroShell® 2 neural network program.
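The functions mentioned above are easy to write down and compare. The sketch below also shows one way to give each hidden neuron its own function; the names logistic, gaussian, and hidden_layer, as well as the weights and inputs, are made up for illustration.

```python
import math

def logistic(x):
    """1/(1+exp(-x)): squashes into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gaussian(x):
    """exp(-x^2): peaks at 1 when x is 0 and falls toward 0 on both sides."""
    return math.exp(-x * x)

def hidden_layer(inputs, hidden_weights, functions):
    """Compute hidden neuron values, with a separate function per neuron."""
    return [
        fn(sum(x * w for x, w in zip(inputs, neuron_weights)))
        for fn, neuron_weights in zip(functions, hidden_weights)
    ]

# Hypothetical layer with three hidden neurons, each using a different function.
inputs = [0.5, -1.0]
hidden_weights = [[0.2, 0.1], [0.3, -0.3], [1.0, 0.5]]
values = hidden_layer(inputs, hidden_weights, [math.tanh, logistic, gaussian])
```

Note the different output ranges: tanh produces values between -1 and 1, while the logistic and Gaussian functions produce values between 0 and 1, which is why the output scaling has to change if one of them is used in the output layer.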
You can also add more hidden layers (but don't go beyond 5 layers total). This will require more weight ranges.