Keep in mind that data mining is based on a tool kit rather than a fixed routine or process. Specific data mining techniques cited here are merely examples of how the tools are being used by organizations to explore their data in search of trends, correlations, intelligence, and business insight.
Generally speaking, data mining approaches can be categorized as directed – focused on a specific desired result – or undirected, where the work is an open-ended discovery process. Other explorations aim to sort or classify data, such as grouping prospective customers by business attributes like industry, products, size, and location. A related objective, outlier or anomaly detection, is an automated method of recognizing real anomalies (rather than simple variability) within a set of data that displays identifiable patterns.
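To make the distinction between a real anomaly and simple variability concrete, here is a minimal sketch of one common approach, the modified z-score built on the median absolute deviation. The function name, the sample order counts, and the 3.5 cutoff are illustrative assumptions, not something prescribed by any particular data mining tool.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a measure of spread that a few
    # extreme values barely influence.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no measurable spread, so nothing is flagged
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily order counts containing one suspicious spike.
daily_orders = [102, 98, 105, 97, 101, 99, 103, 100, 240, 96]
outliers = flag_outliers(daily_orders)
print(outliers)
```

Ordinary day-to-day variation (96 to 105 here) stays under the cutoff, while the spike stands far outside it. Commercial tools typically offer more elaborate detectors, but the underlying idea is the same.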
Association
Another interesting goal is association – linking two seemingly unrelated events or activities. A classic story from the early days of analytics and data mining, perhaps fictitious, has a convenience store chain discovering a correlation between sales of beer and diapers. The speculation is that harried new fathers who run out late in the evening to get diapers may grab a couple of six-packs while they are there. The stores position the beer and diapers in close proximity and increase beer sales as a result.
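The beer-and-diapers story can be sketched with the three measures usually used to judge an association: support, confidence, and lift. The baskets below are invented for illustration, and pair_metrics is a hypothetical helper rather than a function from any specific tool kit.

```python
# Hypothetical shopping baskets; real data would come from point-of-sale records.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "milk", "beer"},
    {"bread", "chips"},
    {"beer", "chips"},
]

def pair_metrics(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a implies b'."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)
    if count_a == 0 or count_b == 0:
        return 0.0, 0.0, 0.0  # the rule cannot be evaluated without both items
    support = count_ab / n               # share of all baskets holding both items
    confidence = count_ab / count_a      # share of a-baskets that also hold b
    lift = confidence / (count_b / n)    # above 1 means b is likelier when a is present
    return support, confidence, lift

support, confidence, lift = pair_metrics(baskets, "diapers", "beer")
print(support, confidence, lift)
```

A lift above 1 suggests the two items appear together more often than chance alone would explain. It says nothing about why, which is where speculation such as the new-father explanation comes in.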
Clustering
This approach is aimed at grouping data by similarities rather than pre-defined assumptions. For example, when you mine your customer sales information combined with external consumer credit and demographic data, you may discover that your most profitable customers are from midsize cities.
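As a rough illustration of grouping by similarity without pre-defined categories, the sketch below runs a basic k-means loop on a single numeric attribute. The spend figures, the choice of three clusters, and the function name kmeans_1d are all assumptions made for the example. Production tools work on many attributes at once and use more careful initialization.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters with a basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: take the middle value of each of k equal slices.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members (unchanged if empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return centers, clusters

# Hypothetical annual spend per customer.
annual_spend = [120, 150, 130, 900, 950, 880, 4000, 4200, 3900]
centers, clusters = kmeans_1d(annual_spend, k=3)
print(centers)
print(clusters)
```

No one told the algorithm that there are small, medium, and large spenders. The three groups emerge from the distances between the values themselves.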
Much of the time, data mining is pursued in support of prediction or forecasting. The better you understand patterns and behaviors, the better you can forecast the future actions that follow from those causes or correlations.
Regression
One of the mathematical techniques offered in data mining tool kits, regression analysis predicts a numeric value by projecting historical patterns into the future. Various other pattern detection and tracking algorithms provide flexible tools to help users better understand the data and the behavior it represents.
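The simplest form of regression, an ordinary least squares straight line, shows how a historical pattern gets projected forward. The monthly sales figures and the names fit_line and forecast_month_7 are hypothetical, and real tool kits support many more variables and model types.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales for six consecutive months.
months = [1, 2, 3, 4, 5, 6]
sales = [110, 118, 131, 139, 152, 160]

slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept  # extend the fitted trend one month ahead
print(round(slope, 2), round(intercept, 2), round(forecast_month_7, 2))
```

The forecast is only as good as the assumption that the past trend continues, which is why regression output is usually reviewed alongside other evidence.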
These are just a few of the techniques and tools available in data mining tool kits. The choice of tool or technique is somewhat automated in that the techniques will be applied according to how the question is posed. In earlier times, data mining was referred to as “slicing and dicing” the database, but the practice is more sophisticated now and terms like association, clustering, and regression are commonplace.