
Web-based User Profiling
Using Artificial Neural Networks

Ryan MacDonald, BCSH.
Acadia University

Daniel L. Silver, PhD.
Acadia University

Abstract. The Internet's worldwide growth and acceptance has resulted in a massive E-commerce movement. E-markets are growing so rapidly that companies must now strive not only to have a presence on the Internet, but to create a presence that far exceeds that of their competitors. User profiling is one approach companies have traditionally taken to better understand their customers so that they may adjust their business model accordingly. By coupling this method with technologies such as JavaScript, Applets and Artificial Neural Networks (ANNs), a powerful profiling system for the Internet is introduced. This system can help websites adapt their content and layout based on prior interaction from their users. Testing was performed to examine the validity of such a system, and the results indicate that it can provide significant enhancements and opportunities to a website.

1. Introduction

             Methods and systems that help users navigate the web and filter information are few and far between. When searching through the Internet, users often feel overwhelmed by the amount of data being returned to them. This is particularly important for E-commerce sites, since they risk customers leaving their site with a bad impression and not returning. We have attempted to combine user profiling and adaptive web interfacing to help companies satisfy their users.
             User profiling has existed for quite some time in areas such as television, radio, and advertising. However, user profiling on the Internet can go beyond the reach of its predecessors because a website can easily obtain information about its users. A website can ask you to register and request information such as your age, sex, likes and dislikes. A website can also keep track of your purchases, which sections of the site you visit, how long you view a page, and other information. Once collected, this information can allow companies to adapt the content presented to each of their customers so that it meets the customer's interests. These adjustments permit the company to better satisfy its users and can increase the probability of purchases.
             User profiling/modeling can be done in 3 ways: (1) using stereotypes, (2) using surveys/questionnaires, or (3) using a "learned model" [Langley, 2000]. The first two are rather straightforward approaches that use information we already know, or information we collect, to build appropriate profiles. The third one, however, using "learned models", is particularly interesting. It entails creating a system that has no knowledge of its users to begin with but that, over time, as users interact with the system, learns from their trends and behaviours and creates profiles based on the experience it gains. It is also possible to create these profiles individually, on a per-user basis, or collaboratively, collecting all users' data together to form a general profile.
             We sought to build a collaborative system that could "learn" a general user profile and test its usefulness on a typical E-commerce website. A summary of all our work and research can be found in [MacDonald, 2001].

2. Background

             Our system was created with the use of a prototypical E-commerce portal site and artificial neural network (ANN) software. This section describes the class of ANN we used and the E-commerce website that was developed.

2.1 Artificial Neural Networks

             Artificial Neural Networks (ANNs) are programs designed to simulate the way scientists believe our biological nervous system functions. Similar to a human's brain where neurons are connected together and communicate through interconnecting synapses, ANNs are composed of numerous processing elements, or nodes, that are tied together with weighted connections. The earliest discoveries in Neural Computing go back to the 1940's and 1950's; however it was a renewed interest during the 1980's that brought them to the forefront in a number of different research areas, such as machine learning and applied data mining.
             ANNs are designed in a highly connected layer structure, as demonstrated in Figure 1 below:

Figure 1 - Structure of an Artificial Neural Network

             Here, X1 through X5 represent the input layer while Z1 and Z2 represent the output layer. As an example, if we were trying to predict weather conditions, the inputs could be the day of the week, the season, the temperature, and so on, while the predicted outputs may be whether or not it is going to be sunny and the wind speed for that day. The middle layer, often called the "hidden nodes", provides an internal representation for the development of ANN models. For relatively complex problems it is often necessary to compensate by adding a larger number of hidden nodes.
             How does an ANN learn? All the lines shown above in Figure 1 are given weight values that initially are set to small random values. As training examples (previously observed input and output values) are given to the network, the network "learns" by adjusting these weight values to best represent the relationship between the input and output variables. Assuming that there is a relationship between the input and output, if enough training examples are given to the neural network then it will usually have no problem generalizing to future examples.
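             The weight adjustment described above can be illustrated with a deliberately small sketch. It trains a single linear node with the delta rule, which is a simplification chosen for brevity and not the procedure NeuroShell 2 uses; the function name, learning rate, epoch count and toy examples are all assumptions made for illustration.

```python
import random


def train_linear_node(examples, learning_rate=0.1, epochs=100, seed=0):
    """Adjust the weights of one linear node so its output tracks the targets.

    examples: list of (inputs, target) pairs; inputs is a list of floats.
    """
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    # Weights start as small random values, as described in the text.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    for _ in range(epochs):
        for inputs, target in examples:
            prediction = sum(w * x for w, x in zip(weights, inputs))
            error = target - prediction
            # Delta rule: nudge each weight in proportion to its input and the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return weights


# Toy data generated by target = 2 * x1 - 1 * x2 (hypothetical relationship).
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = train_linear_node(examples)
print(weights)  # close to [2.0, -1.0]
```

             A multi-layer network applies the same idea to every connection, with the error propagated backwards through the hidden nodes.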
             There are various network architectures available to choose from when building a neural network. The basic one is a back-propagation network where the nodes are structured as shown in Figure 1. The network used in this work, however, is a recurrent network, an example of which is shown in Figure 2.

Figure 2 - A Recurrent Network

             As you can see when comparing this design to the back-propagation one, in the recurrent network there is an extra layer of nodes (Slab 4) that acts as input. Each slab consists of several nodes, as was shown in Figure 1, so Slab 1 here represents the input layer (X1-X5 in Figure 1), Slab 2 the middle layer and Slab 3 the output layer (Z1-Z2). The extra layer differs from the input layer (Slab 1), however, because it is affected by what the middle layer outputs. As training examples are given to the network, the extra layer is modified and adjusted in accordance with previous examples. Recurrent networks are excellent at learning sequences and are often used for applications such as sales prediction and stock analysis. We will demonstrate that a recurrent network's ability to learn sequences is what we need for our user profiling system.
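             A minimal sketch of one forward pass through such a network follows. It assumes an Elman-style design in which the extra slab simply holds a copy of the previous middle-layer activations; the exact update used by NeuroShell 2 is not described here, and the function names, layer sizes and sigmoid activation are illustrative assumptions.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Small random initial weights for a rows x cols connection matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def recurrent_step(x, context, w_in, w_ctx, w_out):
    """One forward pass; returns (output, new_context).

    x       -- input layer values (Slab 1)
    context -- extra layer values (Slab 4), fed in alongside the input
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * ci for w, ci in zip(w_ctx[j], context)))
        for j in range(len(w_in))
    ]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    # Assumption: the middle-layer activations become the next context.
    return output, hidden


rng = random.Random(0)
n_in, n_hidden, n_out = 5, 4, 2
w_in = make_weights(n_hidden, n_in, rng)
w_ctx = make_weights(n_hidden, n_hidden, rng)
w_out = make_weights(n_out, n_hidden, rng)

context = [0.0] * n_hidden
for x in ([1, 0, 0, 0, 0], [0, 1, 0, 0, 0]):
    output, context = recurrent_step(x, context, w_in, w_ctx, w_out)
```

             Because the context carries over between calls, the same input can produce different outputs depending on what preceded it, which is what lets the network pick up on sequences.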
             The ability to learn in a way that is similar to the human brain makes ANNs a very powerful tool when used properly. Currently, they are being used in many fields such as Data Mining, the Stock Market, Weather prediction and User Profiling.

2.2 Navigate.ca

             Navigate.ca is the E-Commerce website we developed to house the profiling system. Essentially it is a shopping website that has numerous links to sites where products can be purchased. These links are grouped into categories such as jewelry, clothing, office supplies, computer hardware, etc. In total there are 62 categories of links organized into a hierarchy system for easier browsing.

Figure 3 - The entry page to Navigate.ca

             Figure 3 shows the initial start up page for Navigate.ca. Users can find a particular product by working their way through the folder system on the left side of the page. When the final product category is found a group of links to websites that offer that product are provided to the user (see Figure 4).

Figure 4 - Shown are the categories and subcategories leading to women's, everyday clothing. The main portion of the window displays links to various websites that sell the relevant products.

             Essentially Navigate.ca is nothing more than a portal to other websites. The website offers links to a large variety of products but does not concern itself with the end transaction. From a commercial and business standpoint the website generates money by collecting commission and click-thru fees from the companies that appear on the site.
             Navigate.ca, before the addition of the profiling system, was implemented with the use of HTML and JavaScript. The prototype site can be viewed at www.navigate.ca, with the password to login set as "password".

3. A more "intelligent" Navigate.ca

Goals. Our objective is to create a system that allows users to find what they are looking for with greater ease on Navigate.ca through the use of collaborative profiling. The system should be able to learn a profile on its own by keeping track of how past users interacted with the site and then use the generated profile to assist future users in navigating the site. Ideally the entire solution should be as tightly coupled as possible and have a rapid response time.

Method. The primary difficulty that arises when searching through a portal site such as Navigate.ca is locating the specific category you would like to visit, because there are so many categories located in a multitude of folders. Therefore, the solution that was chosen was to track which categories users visit, in sequence. Those sequences are used as data to help predict where future users are most likely to want to go. In other words, we use click-streams as input to predict which category a user will most likely want to visit next, based on the behaviour of past users who followed a similar path through the site.

3.1 Implementation
             The implementation of our solution is broken up into three steps. The first step is to collect the click-streams generated by users and to then transform that data so that it can be used as training examples for an ANN modeling system. Then, the model must be created. For this we will be using a program called NeuroShell 2 by Ward Systems Group Inc [WSG, 2001]. Finally, once the model has been learned we must integrate it into the website so that it can be used to make predictions for new users. Each of these three steps should be tightly coupled so as to make the system easy to update with new models periodically as more training examples are collected.

Figure 5 - The 3 steps of implementation

3.1.1 Data Collection and Preparation
             The first step in building neural network models is collecting a set of training examples. For our particular scenario these training examples will be users' click-streams through the site. First each category was given a numeric identifier from 0-61 (e.g. baby stuff=0, books=1, …). In order to track where a user goes we've created a cookie that will keep track of every category a user visits. All categories are similar in that each has a webpage that consists of all the links that lead to websites that sell products within that category. When accessed, each of these pages writes its particular identifier to the cookie. After a visit to Navigate.ca, one particular user's click stream could look something like:
                                       12 0 18 28 30 2 6 48 54 14 1 23 … etc.
             From the stream we must create training examples that can be used to train our ANN. As stated in the "Background" section, recurrent networks are excellent at learning sequences. Since essentially the click-streams that we are collecting can be thought of as sequences of paths through the site, this network architecture was an excellent match for our system. Under a recurrent ANN the training examples would simply be one input and one output:
                                       (12 0), (0 18), (18 28), (28 30), …
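             The pairing step above can be sketched in a few lines. The function name is an assumption introduced for illustration; the click-stream is the start of the example given earlier.

```python
def stream_to_pairs(stream):
    """Turn a click-stream into (current category, next category) pairs."""
    return list(zip(stream, stream[1:]))


click_stream = [12, 0, 18, 28, 30]
pairs = stream_to_pairs(click_stream)
print(pairs)  # [(12, 0), (0, 18), (18, 28), (28, 30)]
```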
             Before we can pass these examples to the neural network, more data preparation has to be done. It is important to note that since our categories are nominal values they cannot simply be placed on a scale from 0-61 if our ANN is going to properly learn them. These values must therefore be transformed from numeric values ranging from 0-61 to individual discrete variables that can be more easily learned by our network. Rather than use the category's actual number, we represent it by a series of 61 '0's and a single '1' (62 values in total), with the '1' appearing in the nth place, where n-1 is the category number (keep in mind we are starting from 0). So, for instance, the category number 5 would be represented as:
                                       0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0
             Therefore, the click-stream is first gathered into its pairings of 2 category numbers (input and output) and those numbers are then changed to the series of '0/1' representation. The resulting binary representation is used to train the neural network.
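             As a sketch of this preparation step, the code below converts a click-stream into pairs of 62-element '0/1' vectors. The names `one_hot` and `encode_stream` are assumptions introduced for illustration and are not part of the original system.

```python
NUM_CATEGORIES = 62  # categories are numbered 0-61


def one_hot(category, size=NUM_CATEGORIES):
    """Return a list of `size` zeros with a single 1 at index `category`."""
    if not 0 <= category < size:
        raise ValueError("category must be between 0 and %d" % (size - 1))
    vector = [0] * size
    vector[category] = 1
    return vector


def encode_stream(stream):
    """Pair each category with the one visited next and encode both as 0/1 vectors."""
    return [(one_hot(a), one_hot(b)) for a, b in zip(stream, stream[1:])]


print(one_hot(5)[:8])  # [0, 0, 0, 0, 0, 1, 0, 0]
training_examples = encode_stream([12, 0, 18, 28, 30])
```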

3.1.2 Training The Artificial Neural Network
             Training the ANN will be done with an off-the-shelf product called NeuroShell 2. This software allows users to specify the inputs and outputs, and the network architecture for the model. The NeuroShell 2 package provided us with all the flexibility we needed to construct a model for our specific requirements in an easy-to-use interface.
             With our training examples created as mentioned in the previous section, we then extract a training set, test set and production set. The training set is what will be used to actually train the neural network, while the test set allows us to verify that the model being generated will perform well not only on the set it is being trained with, but for all values from our data. The production set is used at the end as a final test of how accurately the model actually represents all the data that we have.
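             A three-way split of this kind might look like the sketch below. The 60/20/20 proportions and the choice to keep the examples in their original order are assumptions; the proportions and extraction method actually used with NeuroShell 2 are not stated here.

```python
def split_examples(examples, train_pct=60, test_pct=20):
    """Split examples into training, test and production sets.

    Order is preserved (an assumption), since a recurrent network
    depends on the sequence in which examples are presented.
    """
    n_train = len(examples) * train_pct // 100
    n_test = len(examples) * test_pct // 100
    train = examples[:n_train]
    test = examples[n_train:n_train + n_test]
    production = examples[n_train + n_test:]
    return train, test, production


train, test, production = split_examples(list(range(10)))
print(len(train), len(test), len(production))  # 6 2 2
```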
             Next we specify the architecture we would like to use. As mentioned earlier we have decided to use a recurrent network for our scenario. The NeuroShell 2 program sets the number of hidden nodes to what it feels is the most appropriate based on its knowledge of our network to date (number of inputs/outputs and architecture). Initially we experimented with other numbers of hidden nodes but in general the default value (of 84) provided by NeuroShell 2 worked well. Thus our final network architecture consisted of 62 input and output nodes, and 84 nodes for our two middle layers.
             Training continues until the lowest average level of error is reached. The weights that produced the lowest average error on the test set are kept. The architecture and weights make up the new "learned" model. If we so choose we can then test this model against the production set that we created earlier to ensure that it in fact does perform well on our data.
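             The idea of keeping the weights that gave the lowest test-set error can be sketched generically as follows. This is not NeuroShell 2's internal procedure; `train_epoch` and `test_error` are hypothetical callables supplied by the caller, and the `patience` stopping rule is an assumption.

```python
import copy


def train_keep_best(weights, train_epoch, test_error, max_epochs=100, patience=5):
    """Train repeatedly and return the weights with the lowest test-set error.

    train_epoch(weights) -> updated weights after one pass over the training set
    test_error(weights)  -> average error on the test set
    """
    best_weights = copy.deepcopy(weights)
    best_error = test_error(weights)
    epochs_since_best = 0
    for _ in range(max_epochs):
        weights = train_epoch(weights)
        error = test_error(weights)
        if error < best_error:
            best_weights, best_error = copy.deepcopy(weights), error
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break  # no recent improvement on the test set
    return best_weights, best_error


# Toy stand-ins: the "weight" grows by 1 each epoch; test error is lowest at 3.
best, err = train_keep_best(
    [0.0],
    train_epoch=lambda w: [w[0] + 1.0],
    test_error=lambda w: (w[0] - 3.0) ** 2,
)
print(best, err)  # [3.0] 0.0
```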
             Next we need to generate a representation of the model that can be used by Navigate.ca software. NeuroShell 2 provides a feature that will allow us to do this by generating software that consists of mathematical functions representative of our "learned" model. The code accepts an input array with a length of 62 and then outputs an array also with a length of 62. Keep in mind that for a website that generates a lot of traffic this process of creating the model would only be done periodically, perhaps once a day, week or month. Basically we would re-train the network from time to time using our ever increasing amount of training examples in order for it to remain accurate in depicting how the website is currently being used.

3.1.3 System Integration
             The profile model must be embedded within our website so that we can use it when future visitors come to the site.
             For future visitors, when a category's link page is viewed we need to pass that category's identifying number to our "learned" model and then display to the user a link to the category the model feels the visitor is most likely to want to see next. To perform this task we use a Java applet that displays on the right side of all our category pages. The applet's job is to: (1) identify which category is currently being viewed; (2) convert that category into suitable input (0 0 0 0 1 0 0 … 0 0, etc.); (3) pass that input to a Java version of the model code produced by NeuroShell; and (4) based on the output of our model, display the top 3 category links our model thinks the user will be interested in seeing (note: the top 3 will be displayed in order to increase the chances that one of them is useful to the user).
             The first 2 parts are easily done. When the applet is called on each category page, we simply pass along with it a parameter that identifies which page is currently being viewed. The category number is converted to an array of '0's with a single '1' in the right position to create the desired format for our input to the predictive model. After several calculations an output array (of size 62) is created.

Figure 6 - What happens when a user visits the new, "intelligent" Navigate.ca

             Finally the applet code must determine the highest 3 outputs. The associated category links for the "top 3" are the ones we want the user to be provided with in hopes that they will spark interest or help users find their way through the site. When a user now checks out a category, the applet will be displayed on the right showing links to the top 3 categories that our network model has deemed the user is likely to want to see next.
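             Selecting the three highest outputs is straightforward. The sketch below is written in Python for brevity rather than as applet code, and the function name and sample output values are hypothetical.

```python
def top_categories(outputs, k=3):
    """Return the indices of the k largest output values, highest first."""
    ranked = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    return ranked[:k]


# Hypothetical model output: 62 values, one per category.
outputs = [0.0] * 62
outputs[1], outputs[14], outputs[30] = 0.9, 0.4, 0.7
print(top_categories(outputs))  # [1, 30, 14]
```

             Each returned index is a category number, which the site can then map back to the corresponding category link.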

Figure 7 - The applet suggests other categories for the user to go to

             We now have a system that can track where users go, build a model or profile based on that information and use that model to suggest categories to new users who come to the site.

4. Testing

             An important step in creating any new software system is proper testing. While for many computer-based ideas this often means ensuring there are no errors within the code and that everything runs properly, it is also important to test a system for validity and usefulness. Navigate.ca's new profiling system was tested in three different ways.

4.1 Mathematical testing of the neural network's design
             A test was performed to ensure that the recurrent network that was used to perform user profiling could in fact develop models for complex but deterministic functions. Because the network was so large (62 input and output nodes), there was concern that it might be impossible to create a model that accurately portrays the data. To test this we created a series of mathematical functions to produce sample data. We started with a simple function that created a straightforward click stream of 0, 1, 2, 3, …, 60, 61 and verified that the model would pick up on the sequence. We then progressed to more complex functions. All functions are "mod 62" to remain similar to our application which consists of category identifiers labeled 0-61. We wanted to reflect the same scenario with our functions.
             We rated the network's performance by taking the average R2 value (coefficient of determination) for each of our outputs. The R2 values represent how well the network has been able to learn the data and draw associations between input and output. An R2 value close to 1 means that a strong relationship has been identified between input and output, while a value closer to 0 means that the model has not been able to generalize to the data very well.
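             For reference, the sketch below computes R2 for each output and averages the results. It assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function names are illustrative.

```python
def r_squared(actual, predicted):
    """Coefficient of determination for one output variable."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_total == 0:
        raise ValueError("R2 is undefined when the actual values are constant")
    return 1.0 - ss_residual / ss_total


def average_r_squared(actual_rows, predicted_rows):
    """Average R2 over outputs; each row is one example, each column one output."""
    n_outputs = len(actual_rows[0])
    scores = [
        r_squared([row[j] for row in actual_rows],
                  [row[j] for row in predicted_rows])
        for j in range(n_outputs)
    ]
    return sum(scores) / n_outputs


print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0 (perfect fit)
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0 (no better than predicting the mean)
```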

First function: x = (x + 1) mod 62
Sample: 0, 1, 2, 3, 4, 5, …

Second function: x = ((x + 3) * 7) mod 62
Sample: 43, 13, 51, 7, 9, 23, 59, 1, 29, 39, 47, 41, 61, 15, 3, 43, 13, 51, 7, 9, 23, …

Third function: y = 3 + y
                      x = ((x + 1) * y) mod 62
Sample: 17, 3, 41, 51, 27, 37, 31, 57, 13, 1, 7, 49, 17, 31, 47, 59, 21, 33, 51, 11, 25, ...
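             The three functions can be turned into click-stream generators as sketched below. The code follows the formulas exactly as written; the starting values are assumptions, since they are not given, so the output for the second and third functions should not be expected to reproduce the sample sequences shown above.

```python
NUM_CATEGORIES = 62


def first_sequence(start, length):
    """x = (x + 1) mod 62"""
    values, x = [], start
    for _ in range(length):
        values.append(x)
        x = (x + 1) % NUM_CATEGORIES
    return values


def second_sequence(start, length):
    """x = ((x + 3) * 7) mod 62"""
    values, x = [], start
    for _ in range(length):
        values.append(x)
        x = ((x + 3) * 7) % NUM_CATEGORIES
    return values


def third_sequence(start_x, start_y, length):
    """y = 3 + y, then x = ((x + 1) * y) mod 62"""
    values, x, y = [], start_x, start_y
    for _ in range(length):
        values.append(x)
        y = 3 + y
        x = ((x + 1) * y) % NUM_CATEGORIES
    return values


print(first_sequence(0, 6))  # [0, 1, 2, 3, 4, 5]
```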


              First Function    Second Function    Third Function
Average R2
Table 1 - Results from mathematical function tests

4.2 User Accuracy and Effectiveness
             The most revealing and important test was to simulate how the system would actually be used and compare this implementation to another program that simply provided the user with random links to other categories. The test was done in 2 stages. First, 10 participants were asked to interact with the system and were provided with 4 separate shopping scenarios. The scenarios ranged from shopping for an upcoming trip to preparing for a wedding. For this part of the test the applet was disabled so that no categories were being recommended to the user on the right side. As the participants shopped the system kept track of how these 10 participants navigated through the site and once they were all done a model was built based on the data collected. The next stage involved the users coming back to shop again, but this time the applet offered links on the right side of the site. Each participant was given a separate set of 4 scenarios (different than the 4 they were originally given) and as they progressed through each one they were asked to rate the category links that were being provided to them on the right side of the site. They rated each link as being either useless (0), somewhat useful (1), or very useful (2). The users were unaware that 2 of the scenarios provided links using the model that had been created, while the 2 others provided a random set of 3 links.
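             The ratings collected in this test can be summarised per set of 3 links, as in the sketch below. The ratings shown are hypothetical placeholders and not the study's data; the function name is also an assumption.

```python
def summarize(rating_sets):
    """Percentage of 3-link sets with at least one rating of 2, and of 1 or better.

    rating_sets: list of tuples of ratings (0 = useless, 1 = somewhat useful,
    2 = very useful), one tuple per set of links shown to a user.
    """
    total = len(rating_sets)
    very_useful = sum(1 for ratings in rating_sets if max(ratings) == 2)
    somewhat_or_better = sum(1 for ratings in rating_sets if max(ratings) >= 1)
    return 100.0 * very_useful / total, 100.0 * somewhat_or_better / total


# Hypothetical ratings for four sets of three links.
hypothetical = [(2, 0, 0), (1, 1, 0), (0, 0, 0), (0, 2, 1)]
print(summarize(hypothetical))  # (50.0, 75.0)
```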

              0 - Useless    1 - Somewhat Useful    2 - Very Useful
User Model
Table 2 - Detailed breakdown of the users' ratings for the category links provided by the two different systems.

              At least 1 "Very Useful"    At least 1 "Somewhat Useful"
              in the top 3 shown          or better, in top 3
Random        15.4%                       63.5%
User Model    49.2%
Table 3 - Number of times there was 1 or more "Very Useful" or 1 or more "Somewhat Useful" or better link in the top 3

4.3 Usefulness
             While the prior test verified whether or not the system could provide users with helpful links to categories, this test simply wanted to check whether or not it was worth the trouble. In other words, if users were not told explicitly to look at or use the links, would they use them? Using the same profiling model that had been created earlier, 5 new test subjects were given 4 of the scenarios that were mentioned earlier. We observed the number of times the users used the links the profiling system recommended to them.


Tester #1    Tester #2    Tester #3    Tester #4    Tester #5
Table 4 - Number of times each test subject used one of the provided links

5. Discussion of Results

             The mathematical tests ensured that the recurrent network was capable of producing sufficiently accurate models for deterministic functions. For the functions that were relatively simple the neural network performed extremely well. The third function was the most revealing. The numbers that it generated resembled closely what we expected our user data to be like, in that there was no clear repetition or sequence despite the existence of various patterns. Even for this more difficult function the ANN performed adequately with an average R2 = 0.773.
             The user testing results indicate that the User Profiling system performs relatively well. We were surprised by the difference between the random system and the model-driven one. Perhaps the most telling result is that 49% of the sets of 3 links generated by the model contained at least one "very useful" link, as compared to only 15% of the sets generated by the random system. It is also worth noting that less than 7% of the time the users were given 3 model links that they felt were all useless, as compared to about 36% of the time for the random links.
             While these results appear favourable, there were various factors that may have caused the system to perform better in the tests than it should have. For instance, the shopping scenarios created for testing did not generally cover all the categories. This put the random system (that chose from all 62 links) at a potential disadvantage. Also, the "books" category was rather popular when the test subjects went through most of the shopping scenarios. As a result, the model that was generated output "books" as one of the top 3 links almost every single time. Since "books" is a category that can be useful in so many different situations, this trend resulted in a good deal of the "very useful" and "somewhat useful" ratings the model driven system received. Nonetheless, the profiling system performed considerably better than the random system for every single user we tested it on.
             The second user test determined whether people would actually use the links if they were not told anything about them. On average each user used one of the model-generated links 3.2 times per sitting (four shopping scenarios). However, two of the testers did not use any of the links at all, which lowered this average and also raised the question of why they did not use them. Perhaps the suggested links do not stand out enough when the user is concentrating on navigating through the site using the folder system located on the left side. The testers who did take notice of the links, however, made ample use of them and seemed to benefit from their presence.

6. Summary and Future Work

             As the Internet continues to grow and become more commercialised it will become increasingly important to improve its manageability. Already there are various tools and techniques being used to do this and we have examined a simple example of one of the more recent developments, user profiling. This paper presents the results of a simple approach to profiling users. The results suggest that it was indeed capable of providing a satisfactory number of "intelligent" links to its users. The profiling system enhanced the website and established the groundwork for future improvements and ideas. Increasingly sophisticated profiling systems, data mining tools and other personalisation methods will alter the way we interact with the Internet in the years to come. Hopefully these advancements will result in easier and friendlier systems that provide the right information, at the right time, to the right person.
             The profiling system provides a basis on which to expand. Its capability to learn a general model that represents many users as a whole is adequate; however, there is more that could be done. Most importantly, profiling could be tailored to one individual's likes and dislikes. While a general collaborative model is good for new or infrequent users, as visitors interact with the site more and more, ideally the site should be able to tailor itself to each individual. There are certain similarities in the way users behave, but each individual has their own set of interests and, in order to achieve better performance, the system should reflect those individualities.
             Profiling improvements could also be achieved by collecting additional data on each user, such as the shipping capabilities of the links that are clicked on or currency preferences. With today's technology and with the proper tools, it is even possible to track what a user is looking at. Experimentation is underway in our lab on the use of eye-tracking technology as a source of user profiling information. Advancements in technology such as this, as well as the steady flow of new ideas being developed each day, offer tremendous opportunities for user profiling systems and the Internet in general.


[Langley, 2000] Pat Langley, "User Modeling and Adaptive Interfaces", Seventeenth National Conference on Artificial Intelligence, Daimler Chrysler Research and Technology Centre, 2000.

[MacDonald, 2001] Ryan D. MacDonald, "Web-based User Profiling Using Artificial Neural Networks", Honours Thesis, Acadia University, 2001.

[WSG, 2001] Ward Systems Group Inc., NeuroShell2, www.wardsystems.com, Frederick, MD, 2001.

You can contact Ryan at 035316m@acadiau.ca
