SQL Server Data Mining—Who Is Using the Web Site

Microsoft .NET Framework, ASP.NET, Visual C# (CSharp, C Sharp, C-Sharp) Developer Training, Visual Studio


Jump to: navigation, search
CSharp-Online.NET:Articles
Database Articles

SQL Server Data Mining

© 2007 Pearson Education, Inc.

Who Is Using the Web Site?

The most important dimension is certainly Customer, but data warehouse architects face an interesting challenge when it comes to the Web—we often don't know who the customers are. Most e-commerce sites require users to create an online profile, and if the customer must sign in before placing any orders, we can usually associate the Web site activity after the user signs in with the customer's profile. However, online profiles usually contain little mandatory information (and as many online retailers will testify, the information they contain is often meaningless).

The goal for effective BI from the Web site visit information is to augment the minimal Internet profile information with rich, accurate demographic information. In our solution, the site profile includes an optional loyalty card number that is issued by the physical stores. Because customers build up credits to earn free CDs, this information is well populated and can be used to connect the online behavior from the Web site's customer profile with the data warehouse Customer dimension. (In case you have ever wondered why stores are so keen to hand out loyalty cards, now you know—they are trying to build a good Customer dimension!)

For customers who don't have a loyalty card number and an existing Customer record in the data warehouse, we have two choices: Either we can create new Customer records for each of the unmatched customer profiles with minimal information or we can use a single "Unknown Internet Customer" record that groups all these customer together. Because even the minimal online profile will allow us to track information such as how long they have been using our Web site, we will still be able to achieve some of our segmentation goals such as identifying frequent visitors, and so we will go ahead and create new Customer records for every distinct online profile that doesn't have a loyalty card.

Note that the CustomerKey will be blank for all visits where the user did not sign on but just browsed the site. If the user actually places an order, he must sign on and therefore there will be a customer key, but we will allocate all the other facts to an "Unknown Customer" record. It is important not to just discard these fact records, because even without the customer information, the Visit fact table is a valuable source of information about peak traffic levels on the site.

Alternatively, we could have solved the problem by modeling a separate "Internet Customer" dimension that is only used for this area and not related to other facts such as Sales, and thus avoid creating extra records in our Customer dimension. However, this would mean that we couldn't create a cube with a common Customer dimension that combines measure groups for Internet visit measures with sales and other facts for the business, which is really the central goal of this solution.


Previous_Page_.gif Next_Page_.gif


Personal tools