Modern companies are no longer satisfied with simple reporting: they use exploratory data analyses (so-called “predictive analytics”) to gain more insight into the future behavior of their customers. The latest data mining study conducted by the BI analyst and consultancy firm mayato shows which tools are best suited for this.
Data analysis as a CRM core task
In customer relationship management (CRM), the need for customer knowledge has always been great. Exploratory methods, which also allow predictions about future customer behavior based on existing business data, are therefore playing an increasingly important role.
Unclear software market
Such information determines which existing customers are served at what cost, which potential customers are approached in which form and with which content, and which former customers are to be won back in which way. Answering these questions is of strategic importance and affects the success of the entire company.
Data mining suites put to the test
However, implementing the three basic strategies of customer acquisition, retention and win-back requires comprehensive information, most of which can be obtained from internally available data on customer history.
Test scenario: customer reactivation
Since these are mostly large data sets in which the relevant information is often hidden in extensive "data noise", automated analysis tools are in particular demand in this area.
The market offers a wide range of data mining tools: over 150 different tools are currently available for data analysis.
Depending on the focus, functionality and operating concept, they can be divided into different types: the typology ranges from analysis tools for special purposes ("data mining tools") to functionally broader data mining suites to business intelligence (BI) tools, which also increasingly provide data mining capabilities.
The classic data mining suites are characterized by comprehensive functionality, so they cover almost every analytical problem. This includes forecasting methods, such as predicting the probability of customer churn, as well as association methods for market basket analysis and segmentation methods for calculating customer segments.
In addition, they provide support throughout the analysis process, for example through a variety of functions for data exploration and data preprocessing, comparison of different data mining models, and graphical representation and export of results.
Data mining tools, on the other hand, mostly specialize in specific business functions (such as controlling), application areas (for example, real-time data mining), analysis cases (such as forecasting tasks) or a combination thereof. Software implementations of Self Acting Data Mining occupy a special place in this category: this highly automated approach largely dispenses with manual data preprocessing and parameterization.
The test field is composed of the following tools:
In the data mining study presented here, the three market-leading suites from SAS, StatSoft and IBM SPSS are compared directly with one another. Since the data to be analyzed is often already held in existing BI systems, it also makes sense to carry out the actual data mining analyses in that environment. In many cases this is an attractive entry point, since no separate tool has to be procured and set up.
Wide range of functions and innovative operating concepts
To assess how a classic BI tool compares with the established data mining suites, the SAP BW Data Mining Workbench was included in the test field.
For the study, a practical analysis scenario for customer win-back was developed: a large online mail order company wants to use targeted campaigns to encourage buyers who have not placed a follow-up order within a defined period to buy again. Only those customers are to receive a shopping voucher who, with high probability, would not have placed an order without this incentive.
Conclusion: Automation Accelerates Predictive Analytics
These customers are to be identified from the existing customer history using a prediction model (churn prediction). In addition to established methods such as decision trees, newer forecasting methods such as support vector machines (SVM) were used and tested separately, for each tool, for their suitability and forecast quality.
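The idea behind such a churn prediction can be illustrated with a minimal sketch. It is not taken from the study or from any of the tested tools: it fits a one-level decision tree (a "decision stump") on a single invented feature, the number of days since the last order, and then selects voucher recipients among current customers. The `Customer` fields, the function names and all data values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    days_since_last_order: int
    reordered: bool = False  # known only for historical customers


def fit_stump(history):
    """Pick the inactivity threshold (in days) with the fewest training errors.

    Rule learned: predict "will not reorder" when
    days_since_last_order > threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({c.days_since_last_order for c in history}):
        # An error occurs when the rule predicts churn for a customer who
        # reordered, or predicts a reorder for a customer who did not.
        errors = sum(
            (c.days_since_last_order > threshold) == c.reordered for c in history
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def voucher_candidates(customers, threshold):
    """Return IDs of customers predicted not to reorder without an incentive."""
    return [c.customer_id for c in customers if c.days_since_last_order > threshold]


# Invented history: did the customer place a follow-up order without a voucher?
HISTORY = [
    Customer("A", 30, True),
    Customer("B", 45, True),
    Customer("C", 90, True),
    Customer("D", 120, False),
    Customer("E", 200, False),
    Customer("F", 60, True),
]
# Invented current customers whose behavior is to be predicted.
CURRENT = [Customer("X", 100), Customer("Y", 50)]

threshold = fit_stump(HISTORY)
print(threshold, voucher_candidates(CURRENT, threshold))
```

A real decision tree splits recursively on many attributes, and an SVM learns a separating boundary in a higher-dimensional feature space; the stump only shows the principle of learning a rule from customer history and applying it to new customers.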
Based on the scenario described, the entire data mining process is run through, from reading in the data through data preprocessing to interpreting the results. In a direct comparison, such an elaborate test concept yields valuable practical facts and insights that cannot be derived from the product descriptions of the tool manufacturers. Installing and testing each tool in the same predefined system environment also ensures direct comparability.
The evaluation of the tools is based on a large number of individual criteria. These cover both functional aspects (range of functions in the categories data preprocessing, analysis methods and parameterization, and result visualization, as well as overall efficiency) and user-friendliness (stability, execution speed, documentation and operation).
The three data mining suites are characterized by very high system stability, fast execution speed and good handling of large amounts of data. In this regard, the transition to 64-bit architectures has brought significant progress.
However, the extensive functionality leads to increasing product complexity, which results in comparatively long familiarization times. As a consequence, some vendors provide several user interfaces for different user groups.
In this respect, there are significant differences between the data mining suites: the strength of SAS is the embedding of the Enterprise Miner in a powerful overall BI architecture which, besides analysis, offers flexible data storage options and extensive ETL functions (extract, transform, load).
However, users who already work with other SAS platform tools (for example, the Enterprise Guide or the Data Integration Studio) gain no advantage from this, since each tool has a different operating concept. IBM SPSS has managed to pack a great deal of functionality into a modern, intuitive interface: the Modeler offers the best ergonomics and very good documentation, the only one in the test field available in American English.
StatSoft ships the Data Miner with the full functionality of the statistics package at no extra charge, which includes powerful data preprocessing functions as well as a large number of freely configurable graphics. This gives STATISTICA the best value for money in the test field.
How does the only BI tool in the test compare? With the SAP Data Mining Workbench, the confusing and poorly structured interface above all shows that it has not received any substantial updates for several years: the frequently required switching between the Analysis Process Designer (APD) and the Data Mining Workbench costs time in practice and is hard to follow from the user's point of view. In addition, the data mining functions are very limited both in scope and in their parameterization options: interactive decision trees or more recent methods such as support vector machines are not available at all.
In his 2007 article "Future Trends in Data Mining", Hans-Peter Kriegel predicted what is currently the biggest challenge for the manufacturers of data mining tools.
As the frequency of use increases, the efficiency of the entire analysis process is increasingly called into question: how much work, time and expert knowledge does the analysis of a particular question require? What is the relationship between the quality and the commercial value of the analysis results?
The answers to these questions are determined to a decisive extent by the data mining tools used. Ultimately, through their speed, range of functions, ease of use and, above all, degree of automation, the tools dictate which questions can be analyzed in what time and at what quality.
The tool manufacturers have recognized this: with the Rapid Predictive Modeler, SAS offers, in addition to classic model building, a separate data mining environment with sensibly limited parameterization options. In the test, acceptable results could be achieved here within a short time using only the default parameters, and these results can be refined manually if desired.
StatSoft's contribution to automating the process consists of prefabricated data mining recipes for standard forecasting tasks, offered as an alternative. Once a recipe has been selected, an assistant systematically asks for the necessary inputs and, where needed, the required preprocessing steps. IBM SPSS provides an automatic classifier that calculates multiple predictive models with different methods and parameter settings from a single dialog and compares their results.
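The principle of such an automatic classifier, training several candidate models and ranking them on the same validation data, can be sketched as follows. This is not the IBM SPSS implementation: the candidates here are simple threshold rules with different parameter settings, and all names and data values are invented for illustration.

```python
def make_threshold_model(threshold):
    """Candidate model: predict churn (True) if inactivity exceeds the threshold."""
    return lambda days_inactive: days_inactive > threshold


def accuracy(model, rows):
    """Share of validation rows the model classifies correctly."""
    return sum(model(days) == churned for days, churned in rows) / len(rows)


def auto_classify(candidates, validation_rows):
    """Score every candidate on the same data and rank by accuracy, best first."""
    scored = [
        (accuracy(model, validation_rows), name) for name, model in candidates.items()
    ]
    return sorted(scored, reverse=True)


# Invented validation data: (days since last order, customer actually churned).
VALIDATION = [
    (30, False),
    (45, False),
    (60, False),
    (90, False),
    (120, True),
    (200, True),
]

# Candidate models that differ only in one parameter setting.
CANDIDATES = {
    f"threshold_{t}": make_threshold_model(t) for t in (45, 90, 150)
}

ranking = auto_classify(CANDIDATES, VALIDATION)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the candidates would be different model families (decision trees, SVMs and so on) evaluated on held-out data, but the comparison loop follows the same pattern.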
In addition, vendors are specifically expanding those components of their products that are particularly important for getting started with analysis quickly. This includes newly developed and differentiated operating concepts, extensive documentation including online help and practical tutorials, as well as innovative approaches to automation with practical presets.