SQL Server Data Mining—Discriminating Between Similar Clusters

Microsoft .NET Framework, ASP.NET, Visual C# (CSharp, C Sharp, C-Sharp) Developer Training, Visual Studio


Jump to: navigation, search
CSharp-Online.NET:Articles
Database Articles

SQL Server Data Mining

© 2007 Pearson Education, Inc.

Discriminating Between Similar Clusters

If you look at the profiles of Clusters 8 and 9 in BI Development Studio, you will notice that they both have multiple values for the number of months that customers have been Internet visitors and Internet purchasers. This illustrates an important point about the clusters that the algorithm identifies: Every customer in the group does not have to have exactly the same value for every attribute. This is somewhat confusing when you start working with clusters; for example, you might have named a cluster Urban Professionals and then discover that it also contains a customer who lives in the countryside.

The reason for this is that the customer, when you look at all of his or her attributes together, is most similar to the customers who live in urban areas and have professional occupations. So naming a cluster Urban Professionals does not necessarily imply that it contains absolutely no manual laborers who live in the suburbs, but rather gives a high-level shorthand for the predominant combination of attributes in that cluster.

Because the clusters identified are therefore sometimes ambiguous, we need a way of discriminating between similar clusters to find out what exactly makes them different. We can use the Cluster Discrimination view, as shown in Figure 10-8, to select the two clusters we are interested in comparing and get an idea of what the most important differences are.


Image:PracBISS200510-8.jpg
Figure 10-8 Cluster discrimination


We can see in the discrimination view that although the cluster profiles of 8 and 9 look similar, in fact Cluster 9 contains mostly customers who have made a visit and purchase in the past few months, are fairly frequent visitors, and have spent a lot of money with us—we could call this group Frequent Visitors. Cluster 8, on the other hand, contains mostly customers who have not visited the site for many months, although in the past they have spent some money with us. This cluster is probably one that we want to pay careful attention to, because they may now be doing their shopping with a competitor. That is, they may be Defectors.

With the cluster profile and discrimination views, we can understand the clusters well enough to give them meaningful names, so we can now turn our attention to providing this information back to users to enable them to perform analyses on the data using the clusters.


Previous_Page_.gif Next_Page_.gif


Personal tools