SQL Server Data Mining—Using Data Mining in Integration Services to Improve Data Quality
© 2007 Pearson Education, Inc.
Using Data Mining in Integration Services to Improve Data Quality
One major scenario that we have not looked at in this chapter is the use of data mining in Integration Services. We could train a clustering model on a subset of data that is already in the data warehouse and is known to have clean, correct values. An Integration Services package that loads new data could then query this model to estimate the probability that each new record is valid. Records flagged as likely "bad data" can be split out during the load into a separate table for further validation or for human checking and correction.
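The idea can be illustrated outside of Integration Services with a minimal Python sketch. This is not the SQL Server clustering algorithm or an Integration Services API: it assumes a plain k-means model as a stand-in, uses distance to the nearest cluster centre in place of a true probability, and invents the sample data, the `(quantity, unit_price)` columns, the function names, and the "twice the worst clean distance" cutoff purely for illustration.

```python
import math


def init_centroids(points, k):
    """Deterministic farthest-point start, so no random seed is needed."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def nearest_distance(point, centroids):
    """Distance from a record to its closest cluster centre."""
    return min(math.dist(point, c) for c in centroids)


def train_clusters(clean_rows, k, iterations=10):
    """Plain k-means over the known-clean rows; returns the k centroids."""
    centroids = init_centroids(clean_rows, k)
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for row in clean_rows:
            idx = min(range(k), key=lambda i: math.dist(row, centroids[i]))
            members[idx].append(row)
        # Recompute each centre as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(col) / len(group) for col in zip(*group)) if group else centroids[i]
            for i, group in enumerate(members)
        ]
    return centroids


def split_records(new_rows, centroids, threshold):
    """Route each incoming row to the 'valid' or the 'suspect' output."""
    valid, suspect = [], []
    for row in new_rows:
        target = valid if nearest_distance(row, centroids) <= threshold else suspect
        target.append(row)
    return valid, suspect


# Hypothetical clean history: (quantity, unit_price) pairs.
clean_rows = [(1, 10.0), (2, 11.0), (3, 10.5), (2, 9.5),
              (50, 100.0), (52, 104.0), (55, 101.0), (53, 99.0)]
centroids = train_clusters(clean_rows, k=2)

# Assumed rule: tolerate up to twice the worst distance seen in clean data.
threshold = 2 * max(nearest_distance(r, centroids) for r in clean_rows)

new_rows = [(2, 10.0), (54, 102.0), (2, 900.0), (25, 50.0)]
valid, suspect = split_records(new_rows, centroids, threshold)
print("load:", valid)
print("review:", suspect)
```

In a real package the scoring step would be a query against the mining model and the routing step would be a split in the data flow. The sketch only shows the shape of the logic: train on trusted rows, score each incoming row, and divert the unlikely ones for review. The columns here are left unscaled for brevity, which a real model would need to handle.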
Integration Services also has other data mining features, such as loading data directly into data mining models within the data flow and training models on a sample of the data rather than on all of it.
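Sampling for training can be pictured with a similarly small Python sketch. It is only an approximation of the idea, not the Integration Services implementation: the function name `percentage_sample`, the seed value, and the row data are all hypothetical. Each row is kept for training with a fixed probability, so the sample size is approximate rather than exact, and the rows not chosen are returned as well so they can be used for testing.

```python
import random


def percentage_sample(rows, percent, seed=42):
    """Split rows into (sampled, remainder), keeping about `percent` % for training."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    sampled, remainder = [], []
    for row in rows:
        if rng.random() * 100 < percent:
            sampled.append(row)
        else:
            remainder.append(row)
    return sampled, remainder


# Hypothetical source rows: 1,000 record ids.
rows = list(range(1000))
training_rows, holdout_rows = percentage_sample(rows, percent=10)
print(len(training_rows), "rows for training,", len(holdout_rows), "held back")
```

Fixing the seed is a deliberate choice here: it lets the same training sample be reproduced when a model needs to be retrained or a result needs to be checked.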

