SQL Server Data Mining—What Interesting Attributes Can We Track

Microsoft .NET Framework, ASP.NET, Visual C# (CSharp, C Sharp, C-Sharp) Developer Training, Visual Studio


Jump to: navigation, search
CSharp-Online.NET:Articles
Database Articles

SQL Server Data Mining

© 2007 Pearson Education, Inc.

What Interesting Attributes Can We Track?

The first attribute we can add to the data warehouse Customer dimension is one of the easiest but most interesting: an InternetUser flag that indicates whether the customer has created a profile on the site, as shown in Figure 10-3. This is easy to populate and will enable analysts to start to understand the characteristics of people who use the Internet channel versus those who don't.


Image:PracBISS200510-3.jpg
Figure 10-3 Customer dimension


A related InternetPurchaser attribute can be derived by looking at the Sales transactions and flagging every customer who has made any purchases using the Internet channel. All InternetPurchasers will, of course, be InternetUsers, but the reverse is not true because some users will have created a profile but not yet made a purchase online. Although analysts could easily and flexibly get a list of customers who had purchased through the Internet by browsing the Sales cube and selecting the Internet channel and a time period, it is still a good idea to add the InternetPurchaser flag so that it is easy for both analysts and data mining models to distinguish those customers who have ever used the Internet channel from those who haven't.

Other interesting customer attributes are DateFirstInternetVisit, which tells us how long they have been using our site, and DateLatestInternetVisit, which tells us how recently they have visited. Both of these attributes are derived from the underlying Visit fact table, but will be added to the Customer table to make the dimension easy to query. Note that this means we will be updating our customer records much more often, so one way of simplifying the ETL process would be to create a view over Customer and the Visit fact table that returns the maximum date for each customer and is used as the source for the Analysis Services Customer dimension. We can also add equivalent date columns for the date of the first actual online purchase, and the most recent online purchase.


Previous_Page_.gif Next_Page_.gif


Personal tools