SQL Server Data Mining—How Often Are Users Visiting the Web Site
Microsoft .NET Framework, ASP.NET, Visual C# (CSharp, C Sharp, C-Sharp) Developer Training, Visual Studio
| CSharp-Online.NET:Articles |
| Database Articles |
| © 2007 Pearson Education, Inc. |
How Often Are Users Visiting the Web Site?
The new fact table in the data warehouse is Visit, which has one record for each completed customer visit to the site, as shown in Figure 10-2. So, if a customer signed on at 3:10 p.m. and then clicked through several pages with the last page hit logged at 3:25 p.m., the e-commerce application's log parser will see the last page hit and create a single record that spans the whole time period.
The measures that we will be tracking are the duration of the visit and the number of requests (or page hits) during the visit. Because we are interested in the date that the visit took place as well as the time, we will use the approach explained in earlier chapters and have separate Date and Time of Day dimensions. We can also include the Referrer Domain that the user came from which helps us to determine which Web sites are sending the most traffic to our site, and the type of browser platform that the customer was using, including browser version and operating system. This dimension is often called User Agent rather than Browser Platform because other software such as search spiders can also visit the site; however, we always use business-friendly names in the data warehouse rather than terms such as User Agent, which probably only makes sense to Web geeks.
If your log parser supports it, one useful dimension that we can add is Visit Result, which has values such as Browsed, Abandoned Transaction, and Completed Transaction. This is somewhat difficult for parsers to derive from the Web logs, however, because they would need to look for specific marker pages in the log, such as a confirmation page when the user completes a transaction.
The e-commerce application's database also includes another table with the actual page hits that took place, so in some ways it seems we are breaking one of the cardinal rules of dimensional modeling—always use the most detailed grain available. By using the summarized Visit table, we are losing the ability to analyze by a Page dimension, which shows which pages the user hit. Although powerful, the Page Hits fact table will inevitably be huge, and we would need a good business case to go to the trouble of managing this volume of data. Also, the kinds of analysis that Page Hits would enable are often already provided directly by e-commerce applications, and in this case don't need to be augmented with the extra information stored in the data warehouse.
One question that arises is whether we can tie the Visit fact back to the Sales Transaction fact. If we could do that, maybe we could show information such as how profitable visits were. It turns out that for SQL Server 2005, it doesn't really matter if you have the information on the same fact record. Because a single cube can contain measure groups for both Sales and Visits, if a user has selected a date range and a customer or grouping of customers, measures such as total visits and total revenue display properly anyway.
In other words, the common dimensions between the two fact tables provide the means to tie the information together, and we don't actually need to link the two fact records. The site activity is tied to the sales transactions by way of the fact that they occurred in the same time interval to the same customer.
|

