This example, called REALTY, uses a Kohonen Self Organizing Map to classify real estate for sale. The network looks at a number of single-family homes and classifies them on its own. We selected 3 categories for self-categorization in the hope that the network would separate the homes naturally into Low, Middle, and High price categories based only upon their physical characteristics; the sale price was therefore not included in the patterns. Such a network might be useful, for example, in checking whether an asking price is reasonable. See if you think the network succeeded in separating homes by price category. In some cases the network's classifications did not match the asking price, but in several of those cases the home's characteristics may not have supported the price. Some of the homes were also in more desirable neighborhoods than others, possibly accounting for further differences. For still more accurate clustering, location should be added as an input variable.
Inputs and Outputs
Inputs are home characteristics such as number of bedrooms, number of other rooms, number of bathrooms, whether the house has an eat-in kitchen, number of square feet, and lot size in acres. There are 3 output categories which we hope will correspond to low, medium, and high price dwellings.
Unlike backpropagation networks, Kohonen Self Organizing Map networks are trained with input variables only. You also need to specify the number of output categories, in this case 3.
The data was entered in the datagrid and the yes/no answers on whether the house had an eat-in kitchen were transformed to 1 and 0 using the Symbol Translate module.
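A minimal Python sketch of the kind of transformation the Symbol Translate module performs (the function name and mapping here are illustrative, not the module's actual interface):

```python
def translate_symbols(column, mapping={"yes": 1, "no": 0}):
    """Convert symbolic yes/no entries to the numeric codes the
    network needs, ignoring case and surrounding whitespace."""
    return [mapping[value.strip().lower()] for value in column]

# The eat-in kitchen column before and after translation.
eat_in_kitchen = ["Yes", "no", "YES", "No"]
print(translate_symbols(eat_in_kitchen))  # [1, 0, 1, 0]
```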
In Kohonen Self Organizing Map networks there are only two layers: an input layer where patterns of N variables are placed, and an output layer which has one neuron for each of K possible categories.
The patterns are presented to the input layer, then propagated to the output layer and evaluated. One output neuron is the "winner," i.e., the weight vector (all the weights) leading to this neuron is closer in N-dimensional space to the input pattern than that of any other output neuron.
The network weights are then adjusted during training by bringing this weight vector slightly closer to the input pattern. This process is repeated for all patterns for a number of epochs usually chosen in advance.
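The winner-selection and weight-adjustment steps above can be sketched in a few lines of Python (this is a simplified illustration of the general technique, not the product's internal code; the neighborhood adjustment discussed next is omitted here):

```python
import math

def find_winner(pattern, weights):
    """Return the index of the output neuron whose weight vector is
    closest (Euclidean distance) to the input pattern."""
    def dist(w):
        return math.sqrt(sum((p - x) ** 2 for p, x in zip(pattern, w)))
    return min(range(len(weights)), key=lambda k: dist(weights[k]))

def update_weights(pattern, weights, winner, rate):
    """Move the winning neuron's weight vector a fraction `rate`
    closer to the input pattern."""
    weights[winner] = [w + rate * (p - w)
                       for w, p in zip(weights[winner], pattern)]

# Three output neurons (K = 3), two input variables (N = 2).
weights = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
pattern = [0.9, 1.1]
k = find_winner(pattern, weights)        # neuron 1 is closest, so it wins
update_weights(pattern, weights, k, rate=0.5)
```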
This type of network also depends upon adjusting the weights of "neighboring" neurons during training to function properly, otherwise one neuron could end up winning all of the time. To make this adjustment, you need to specify the size of the initial neighborhood.
The neighborhood size is variable, starting off fairly large (sometimes even close to K) and decreasing as training progresses. During the last training events the neighborhood is zero, meaning that only the winning neuron's weights are changed. By that time the learning rate is very small too, and the clusters have been pretty well defined. The subsequent (small) weight changes are only making refinements on the cluster arrangements.
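One simple way to realize the schedule just described is a linear decay, so that both the learning rate and the neighborhood radius shrink to (near) zero by the final epoch. The initial values and the linear form here are assumptions for illustration; the actual decay curve used by the software may differ:

```python
def schedules(epoch, total_epochs, init_rate=0.5, init_radius=2):
    """Linearly shrink the learning rate and neighborhood radius
    over training, reaching (near) zero at the last epoch."""
    frac = 1.0 - epoch / total_epochs
    rate = init_rate * frac
    radius = int(round(init_radius * frac))
    return rate, radius

# Early in training the neighborhood is wide; at the end only the
# winning neuron itself (radius 0) is adjusted.
for epoch in (0, 50, 99):
    print(epoch, schedules(epoch, 100))
```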
When designing your network, you have to choose between two methods of measuring pattern distance. Vanilla (Euclidean) distance is the more accurate of the two, but it requires more computation than the Normalized distance metric. If computation time is not a problem, Vanilla usually works better.
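To make the trade-off concrete, here is a sketch of both kinds of measure. The Euclidean version needs a square root per comparison; a common "normalized" alternative scores closeness with a dot product of unit-length vectors, which is cheaper per comparison once the vectors have been normalized. (The exact Normalized metric the software uses may differ; this illustrates the general idea.)

```python
import math

def euclidean(a, b):
    """Vanilla / Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalized_similarity(a, b):
    """Dot product of unit-length vectors: larger means closer.
    When weight vectors are pre-normalized, only the dot product
    is needed at comparison time."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

print(euclidean([0, 0], [3, 4]))              # 5.0
print(normalized_similarity([1, 0], [1, 0]))  # 1.0 (identical direction)
```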
Training for this Kohonen network proceeds quite differently than training in a backpropagation network: the learning rate and neighborhood size decrease as training progresses, in contrast to the steadily increasing numbers you see as backpropagation training progresses.
You can apply the trained network and attach the outputs to your training patterns to see how well the network clustered the data. The K outputs will not be in any particular order, so you have to do some comparison to see if the output columns correspond to the asking prices in our datagrid.
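One way to do that comparison is to rank each cluster by the average asking price of the homes assigned to it. The data and the function below are made up purely to illustrate the procedure:

```python
# Hypothetical results: the winning neuron for each home, and each
# home's asking price (in $1000s) from the datagrid.
cluster_of = [0, 0, 2, 1, 2, 1]
price_of   = [90, 95, 250, 160, 240, 150]

def label_clusters(clusters, prices):
    """Assign Low / Middle / High labels to the K = 3 clusters by
    ranking them on average asking price."""
    sums, counts = {}, {}
    for c, p in zip(clusters, prices):
        sums[c] = sums.get(c, 0) + p
        counts[c] = counts.get(c, 0) + 1
    order = sorted(sums, key=lambda c: sums[c] / counts[c])
    names = ["Low", "Middle", "High"]
    return {c: names[i] for i, c in enumerate(order)}

print(label_clusters(cluster_of, price_of))
# {0: 'Low', 1: 'Middle', 2: 'High'}
```

Homes whose cluster label disagrees with their asking price band are then the candidates for an unreasonable price, as described above.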
This real estate database was created from actual homes on the market in Frederick, Maryland during the summer of 1992.