Here's a list of the basic steps for creating a neural network application:
1) Decide what you want to predict or classify
Once you're familiar with neural networks, you realize they can help you solve many different problems. You also know from experience that there is more than one way to approach a problem. For example, if you're creating a neural network to predict stocks or options, you can predict a number of things: a stock price, an index price, a change in a stock or option price, stock volatility, etc. You could also classify the market as bullish or bearish, or you could have the network make a buy, sell, or hold decision. In any of these choices, you also have a number of options about how far ahead you want to predict. The things you want to predict, or to classify, will be the network's outputs.
2) Decide which variables influence the result
Picking which variables to include in your neural network is crucial to making your neural network work properly. These will be the network's inputs.
A neural network expects each input to be a continuous variable that represents the strength of the corresponding input neuron. Therefore the number you feed in is really a strength. For example, suppose the input is annual rainfall, ranging from 0 to 25 inches. You can see that the input is the "strength" of the rainfall.
But suppose you try to define an input as a US state code from 1 to 50, where 1 is Alabama, 2 is Alaska, and so on to 50 for Wyoming. Because the neural network thinks the input is a strength, it will assume that Alaska is very much like Alabama, and a lot different from Wyoming, which probably does not make any sense.
Suppose you have an application that attempts to predict how an individual will vote in an election, and one input is the state where the voter lives. There are several ways you could handle the input:
a. Order the states by some meaningful ranking criterion, such as average education of the inhabitants, political conservatism, etc. Then you can use a code from 1 to 50 where 1 is the most liberal and 50 is the most conservative, or 1 is the least educated and 50 is the most educated, etc.
b. Make 50 inputs, one for each state. One state variable will be on (coded as a 1) and the others will be off (0).
c. Make 6 inputs, each of which is 0 or 1 and all 6 are the binary representation of the state code, e.g., 010100 = 20.
d. Make 50 predictive nets, one for each state.
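Options (b) and (c) above are simple enough to sketch in code. The following Python fragment is purely illustrative (NeuroShell 2 handles input definition through its own modules); the state codes follow the alphabetical numbering given earlier.

```python
# Sketch of options (b) and (c) for encoding a state code (1-50) as
# network inputs; the state codes follow the alphabetical order above.

def one_hot(state_code, n_states=50):
    """Option (b): 50 inputs, one per state; only the voter's state is on."""
    return [1 if i == state_code else 0 for i in range(1, n_states + 1)]

def binary_code(state_code, n_bits=6):
    """Option (c): 6 binary inputs holding the state code in base 2."""
    return [(state_code >> bit) & 1 for bit in reversed(range(n_bits))]

print(one_hot(2)[:5])    # Alaska: second input on -> [0, 1, 0, 0, 0]
print(binary_code(20))   # 20 decimal -> [0, 1, 0, 1, 0, 0], i.e. 010100
```

Note that option (b) avoids the false "strength" ordering entirely, at the cost of 50 inputs; option (c) uses only 6 inputs but still implies that states with similar bit patterns are similar.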
Neural nets expect the values for a specific input to always contain the same type of data. It should be obvious that the variable for age should not contain age in some patterns and height or blood pressure in others.
Likewise, if you are feeding a signal or curve into a number of variables in time series fashion, the network will learn better if the measurements for the same part of the curve occur in the same place in the input stream in every pattern, or nearly so. This is called normalizing the inputs. If you don't normalize the inputs, you will have to provide the network with many more patterns, placing the curves in all the different locations.
The same is true in two dimensions. If you are feeding images to a net, they should always be presented in the same place, say in the center of the matrix of inputs.
Users doing financial applications or other time series problems may use fundamental indicators in addition to raw price data, such as interest rates, price of gold, etc. They may also find it helpful to create additional input variables derived from the raw input data they already have, such as lagged versions or averages of the raw data. Refer to the Applications Tips section, Market Predictions for more information. The NeuroShell 2 Custom Option Market Indicator Package will easily make these extra variables for you.
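Lagged copies and averages of raw data, as mentioned above, are easy to derive. The following sketch is illustrative only (the NeuroShell 2 Market Indicator Package builds such variables for you), and the price values are invented:

```python
# A minimal sketch of derived time-series inputs: lagged copies and a
# trailing moving average of a raw price series (the prices are made up).

prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]

def lag(series, k):
    """The value k periods ago; None where no history exists yet."""
    return [None] * k + series[:-k]

def moving_average(series, window):
    """Trailing average over the last `window` values."""
    return [None if i + 1 < window
            else sum(series[i + 1 - window:i + 1]) / window
            for i in range(len(series))]

print(lag(prices, 1))             # yesterday's price as an input for today
print(moving_average(prices, 3))  # 3-period average smooths daily noise
```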
When deciding on variables, it's usually better to include more variables than not enough. Include all of the ones that seem reasonable because neural networks can find subtle differences in data patterns that even our brains can't discern. If a variable has no influence on the outcome, the neural network will learn to ignore it.
3) Gather the data
Decide how to collect case histories or historical patterns (records) that give the network examples of correct classifications or predictions. Your next step is to decide how much training data (the historical patterns) to use. The most important rule to remember is that you have to train the network with enough data to cover the entire problem domain. A good rule of thumb is the number of training patterns should equal 10 times the number of inputs.
In the stock market example, that does not mean you have to include a data pattern for every possible set of variables. It does mean that you have to include patterns that cover the minimum and maximum values for each variable, as well as a good spread of values in between.
You need to include relevant data. Data from the stock market in the 1970s may not be a good indicator of what will happen in the market in the 1990s.
Include examples of all possible predictions or classifications, not just the result you want. For example, if you are only interested in predicting when the market will rise, you still need to include training patterns for when the market fell or the neural network will be "confused" when presented with indicators for a falling market. In other words, if there are 10 possible results, you need to include an equal number of training patterns for each result.
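Before training, it is worth counting how many patterns you have for each result. A quick check like the one below (the labels are invented for illustration) will flag under-represented classifications:

```python
# Sanity-check that each possible result is represented by a similar
# number of training patterns (the labels here are made up).

from collections import Counter

labels = ["rise", "fall", "rise", "hold", "fall", "rise", "fall", "hold"]

counts = Counter(labels)
print(counts)   # patterns per classification

# Flag any result that is badly under-represented relative to the largest.
largest = max(counts.values())
for result, n in counts.items():
    if n < largest / 2:
        print(f"warning: only {n} patterns for '{result}'")
```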
If you have missing data, NeuroShell 2 gives you some options in the network design stage about how it will treat the missing data. (Refer to the Training and Stop Training Criteria for details.) The best thing you can do to fill in missing data, however, is to make an educated guess about what it should be.
4) Encode your knowledge about the data
Neural networks are like people: the more you simplify the information, the easier it is for them to understand it. If the salient information is the price-to-earnings ratio, then use that as a variable. Don't feed in two variables, one for the price and the other for the earnings, so that the neural net has to find what is important.
We created a sample program called "Triangles" for a previous version of NeuroShell which classifies various triangles as right, isosceles, equilateral, scalene, obtuse, and acute, given measurements of the three angles.
Although the problem is more typical of an expert system problem because there are a few easy and distinct rules that are used to classify triangles, the problem is a perfect example of simplifying input data to a neural network.
The inputs to "Triangles" are measurements for each of the three angles of the triangle and the output is a triangle type. The network was trained with angles in increments of five degrees. When patterns were presented in increments of 1 degree, the neural network gave acceptable answers but not to the expected performance level.
It was only after the angles were presented to the network sorted numerically, largest first followed by the second largest, then the third, that the expected performance level was successfully achieved.
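The fix amounts to a one-line preprocessing step: sort the angles before presenting them, so the largest angle always arrives at the same input neuron. A sketch in Python:

```python
# The preprocessing that fixed the "Triangles" example: present the three
# angles sorted largest-first, so the same part of the input always
# arrives at the same input neuron.

def sorted_angles(a, b, c):
    """Return the three angles ordered largest to smallest."""
    return sorted([a, b, c], reverse=True)

# Two descriptions of the same right triangle become one input pattern.
print(sorted_angles(90, 30, 60))   # [90, 60, 30]
print(sorted_angles(30, 60, 90))   # [90, 60, 30]
```

This is the one-dimensional analog of the image-centering advice above: the network no longer has to learn every permutation of the same triangle.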
Another sample problem called "Credit" from a previous version of NeuroShell benefited by simplifying the data presented to the network. In the first attempt to build the application, the network had inputs of raw salary and debt information, along with the number of years in a job and residence, etc. In the second version of the same problem, rather than using four different salary and debt variables, that information was expressed as a ratio of expenses to income.
The change greatly simplified the data by reducing the number of variables, yet was still able to capture all of the information. The result was an improvement in the sensitivity of the network.
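The "Credit" simplification can be sketched in a few lines. The field names and figures below are invented for illustration:

```python
# Sketch of the "Credit" simplification: collapse raw salary and debt
# figures into a single expense-to-income ratio (figures are invented).

def expense_ratio(monthly_income, monthly_debt_payments):
    """One derived input instead of several raw salary/debt inputs."""
    return monthly_debt_payments / monthly_income

applicant = {"monthly_income": 4000.0, "monthly_debt_payments": 1500.0}
print(expense_ratio(**applicant))   # 0.375 -- a single, comparable input
```

The ratio is also comparable across applicants in a way the raw figures are not: a $1,500 debt load means something very different at a $4,000 income than at a $20,000 income.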
In another application, one customer was attempting to recognize signal information from a medical device based upon curves that served as inputs. He later decided to also include data from preprocessing software that measured peaks and valleys, giving the network yet more information. He could have also fed the network changes in the signal values instead of the values themselves.
In a stock market prediction problem, it's probably better to predict a weekly or monthly price average rather than a daily price because price indicators tend to have a lot of noisy movement in them when viewed on a daily basis.
Performance may also increase by removing inputs that may have no relationship to the output. Refer to the Contribution Factors module to learn more about a NeuroShell 2 feature that may help determine if there is a significant relationship between inputs and outputs. Also refer to PNN Genetic Adaptive Learning, GRNN Genetic Adaptive Learning, and GMDH Learning for more information on selecting inputs. GMDH nets in particular may be a very good method of choosing inputs when you have a large number of inputs. The reason is GMDH nets first look at single inputs, then the best pairs, followed by the best triples, etc. Other nets look at all inputs at once. It may take a while for GMDH to filter through all inputs, but the selection of inputs could be very beneficial. Even if the GMDH net does not produce the best results, the selected inputs may be used in other nets.
If you are dealing with several magnitudes of numbers in one variable, try using the logarithm of the number instead of the number.
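For example, trading volumes that range from thousands to millions compress into a narrow band after a log transform. The volumes below are invented:

```python
# The log transform for a variable spanning several orders of magnitude
# (the trading volumes here are invented).

import math

volumes = [1_200, 45_000, 3_800_000]

log_volumes = [math.log10(v) for v in volumes]
print(log_volumes)   # roughly 3.08, 4.65, 6.58 -- a much narrower range
```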
There are many other ways to simplify data. Any sort of preprocessing or normalization of data that you can apply may be beneficial.
5) Design a network
A three layer Backpropagation network is probably an effective network for most applications. According to some sources, this type of network is used in 95 percent of the working neural network applications and trains much more quickly than 4 or 5 layer networks. Using Calibration, it will also generalize well.
NeuroShell 2 offers a type of Backpropagation network called a recurrent network. Recurrent networks are excellent for time series data.
If your training data is sparse and you want to separate your training patterns into categories, use a Probabilistic Neural Network (PNN), which is known for its ability to train very quickly and work on sparse data.
Like PNN networks, General Regression Neural Networks (GRNN) are known for the ability to train quickly on sparse data sets. Rather than categorizing data like PNN, however, GRNN applications are able to produce continuous valued outputs. In our tests we found that GRNN responds much better than Backpropagation to many types of problems (but not all). It is especially useful for continuous function approximation.
A Kohonen Self Organizing Map is useful for clustering data. Because it is an unsupervised type of network, all you have to tell the network is the number of categories you desire.
6) Train and Test a Network
How long do you train a network? Until it generalizes well, i.e., until it is able to give the best possible answers for future data!
Most experts agree that a major reason networks fail is that they have been overtrained. The network memorizes the training set but is unable to give a good answer on new data it hasn't seen before.
The solution? Let's consider two types of problems.
The first type of problem is when you train the network on the entire universe of data patterns. You want the network to learn the training patterns very well. This is when you continue to train a network until it eventually ceases to make any progress.
The second type of problem is when the number of training patterns that can be encountered is infinite or at least very large, and the training set is only representative of this huge number of training patterns. This is the type of network that can master the training patterns but fail when presented with new data.
There is a solution called Calibration, however, that we've implemented in NeuroShell 2: Calibration creates an entirely separate set of data patterns called a test set and uses it to evaluate how well the network is predicting or classifying. We generally use a test set that is approximately 10 percent the size of the training set.
Calibration works differently depending upon the type of neural network you are using. For Backpropagation networks, Calibration uses the training and test set data to compute the optimum point to save the network when it is able to generalize well on new data. For PNN and GRNN networks, Calibration is used to determine the optimum smoothing factor.
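The idea behind saving the network at its optimum point can be sketched as follows. The error curves below are invented, and this is only an illustration of the principle, not of Calibration's actual implementation:

```python
# Illustration of saving a network where test-set error is lowest: the
# training error keeps falling, but past some epoch the net is only
# memorizing the training set (the error curves below are invented).

train_error = [0.50, 0.30, 0.20, 0.12, 0.08, 0.05, 0.03]  # keeps falling
test_error  = [0.52, 0.35, 0.25, 0.18, 0.17, 0.21, 0.27]  # turns back up

best_epoch = min(range(len(test_error)), key=lambda e: test_error[e])
print(best_epoch)   # epoch 4: past here test error rises again
```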
For further information on using Calibration, refer to Backpropagation - Calibration, GRNN Learning, and PNN Learning for details.
7) Redesign network if necessary
It is often a good idea to maintain a third set of data called the production or verification set. This data set contains patterns that appear in neither the training nor the test sets. You can compare the answers you know with the answers the network is producing.
If your network is not giving good results you may want to consider the following tips. Try each option one at a time and then let the network learn again.
A. Setting Minimums and Maximums Tightly Around Data
Make sure your minimums and maximums are set fairly tightly around your data. The Define Inputs and Outputs module can set these for you automatically, or you can enter your own values in that module, raising them slightly if larger values may later be encountered when applying the network to new patterns.
B. Adding More Hidden Neurons
As you add hidden neurons to a Backpropagation neural network, you get "more degrees of freedom" and the network is able to store more complex patterns. The learning time, however, will take longer because there are a lot more computations involved. (There are situations when adding hidden neurons will shorten learning time. If you don't have enough hidden neurons to store all of the complex patterns that appear in data, the network will oscillate and never be able to store these patterns. Adding hidden neurons will allow the network to store the patterns.)
In addition to increasing learning time, there is another hazard in increasing the number of hidden neurons in the network. As you start adding more neurons to the network, you get tighter and tighter data fits. In other words, the "lines" that the neural net is "drawing" through the data points get closer and closer to the data points until the network is actually just memorizing the patterns. When that occurs generalization is then poor for new cases.
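The tighter-and-tighter data fit can be illustrated with a curve-fitting analogy rather than an actual network: a curve with enough degrees of freedom to pass through every training point fits the noise exactly, then generalizes poorly. The data points below are invented (roughly y = x plus noise):

```python
# Analogy for too many hidden neurons: a polynomial with enough degrees
# of freedom to hit every training point exactly ("memorizing"), versus
# a simple straight-line fit (the data points are invented).

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.2, 1.9, 3.2, 3.9]   # roughly y = x plus noise

def lagrange(x, xs, ys):
    """Interpolating polynomial: zero error on every training point."""
    total = 0.0
    for i, xi in enumerate(xs):
        term = ys[i]
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line_fit(xs, ys):
    """Least-squares straight line: few degrees of freedom."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = line_fit(xs, ys)
print(round(slope * 5 + intercept, 2))   # simple fit at x=5: about 4.94
print(round(lagrange(5, xs, ys), 2))     # memorizing fit at x=5: about 0.6
```

The memorizing fit is perfect on the training points yet wildly wrong on the new point, which is exactly the hazard described above.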
We'd like to point out, however, that it hasn't been a problem for most of the applications that we've built with a number of hidden neurons, especially when using Calibration. The bottom line is not how well did the network learn the training set or sample cases, but how well does the network predict answers for the production or verification set that it has never seen before.
C. Change the Learning Rate and Momentum or use TurboProp
When using Backpropagation networks, some problems get better results with a low learning rate and momentum; others do better with high values. TurboProp is a paradigm that does not require you to set learning rate and momentum at all.
D. Change Pattern Presentation
Often using random presentation of patterns will cause the network to find lower average error factors, indicating that it has found better solutions. The network oscillates more as it learns, however. Other times, the slow but steady learning of rotational presentation is best.
E. Raising or Lowering the Smoothing Factor
For GRNN and PNN networks, you may get better results by raising or lowering the smoothing factor when you apply the network. You can use Calibration to automatically compute the best smoothing factor. Refer to GRNN Learning and PNN Learning for details.
F. Changing Your Variables
The most likely reason your network is not providing good results is that your variables are not the right ones or are not presented in the most appropriate way. In a stock market prediction example you may have overlooked some excellent technical variable. The Market Indicator Package has many to choose from, and indicators can even be made from indicators. You may want to collect some fundamental variables, or even some market sentiment variables. Even your personal evaluation of the day's news might be another predictor you can use. Perhaps you have your own indicators, or you want to use some from other programs. NeuroShell 2 can "stand on the shoulders of giants" and predict even from the opinions of other experts. You can use the Variable Graphs module to identify input variables that match trends of what you are trying to predict (in a stock market prediction system, for example) or to look for correlations using the scatter plot.
G. Network Size
If a network is taking too long to learn, you may want to consider breaking down the problem into smaller problems with a single network dedicated to each smaller portion of the problem.