SQL Server Data Mining—Customer Dimension Changes
Microsoft .NET Framework, ASP.NET, Visual C# (CSharp, C Sharp, C-Sharp) Developer Training, Visual Studio
| CSharp-Online.NET:Articles |
| Database Articles |
| © 2007 Pearson Education, Inc. |
Customer Dimension Changes
The new customer attributes can easily be added to the cube definition by refreshing the data source view (DSV) in BI Development Studio to pick up the new columns, and then adding these as attributes to the Customer dimension. However, they may not be in the best format for analysis purposes—having the specific date that a user first visited the site is not very illuminating for users of the cube. In fact, they would probably be better served by being able to select customers based on groups that show how long they have been Internet site users (for example, "3–6 months").
We can add this information into the DSV as a named calculation on Customer or add it to the underlying view in the database. You can implement the MonthsSinceFirstInternetVisit named calculation by using the DateDiff function to work out the number of months between the date of the first visit and the current system date:
DateDiff(m, DateFirstInternetVisit, GetDate())
Instead of showing the user a long list of numbers, it would be better to group the numbers together into ranges, such as 1–3 months, 3–6 months, and so on. Although we could do this manually using a lookup table of month ranges, we can take advantage of the Analysis Services discretization feature to do this for us. After adding the MonthsSinceFirstInternetVisit attribute to the Customer dimension, change the DiscretizationMethod property of the attribute to Automatic to allow Analysis Services to decide on the best method of grouping these time periods. If you want to specify the approximate number of groups (or "buckets") that are created, set the DiscretizationBucketCount property, too.
| Tip: Use the Data Load Date Rather Than System Dates |
Although using the GetDate function to work out durations based on the current system date would work, bear in mind that because we will only be loading the data on a weekly basis, the GetDate function should probably be changed to return the latest date that the data was actually loaded. This date could be stored in a data warehouse table that is populated during the ETL process.
|
|

