In this post, you will learn about the modelling phase of CRISP-DM (Cross Industry Standard Process for Data Mining). It follows the third post, which covered Data Preparation.
Selecting the technique for modelling
The first step is to select the actual modelling technique that you will use. You may already have picked a tool during the business understanding stage; here, you need to select the specific technique, so it is important to choose one that suits the problem. For example, you might build a decision tree with C5.0 or generate a neural network with backpropagation. If multiple techniques are used, you need to perform this task separately for each technique.
Modelling technique: This refers to documenting the actual modelling technique that will be used in the process.
Modelling assumptions: Many modelling techniques make specific assumptions about the data. For instance, the class attribute has to be symbolic, no missing values are allowed, all attributes have uniform distributions, and so on. Any such assumptions should be recorded.
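To make the idea of modelling assumptions concrete, here is a minimal sketch that checks a dataset against two example assumptions before a technique is applied. The function name `check_assumptions`, the sample rows and the attribute names are all hypothetical, and the "no missing values" rule is an assumed example of the kind of requirement a technique might impose. It is only an illustration, not the way any particular tool performs these checks.

```python
from typing import Any, Dict, List, Optional

# A row maps attribute names to values; None stands for a missing value.
Row = Dict[str, Optional[Any]]


def check_assumptions(rows: List[Row], class_attribute: str) -> List[str]:
    """Return a list of human-readable assumption violations (empty if none)."""
    problems: List[str] = []
    for index, row in enumerate(rows):
        # Assumed rule 1: no attribute may have a missing value.
        missing = [name for name, value in row.items() if value is None]
        if missing:
            problems.append(f"row {index}: missing values in {sorted(missing)}")
        # Assumed rule 2: the class attribute must be symbolic (a string label).
        label = row.get(class_attribute)
        if label is not None and not isinstance(label, str):
            problems.append(
                f"row {index}: class attribute {class_attribute!r} is not symbolic"
            )
    return problems


# Hypothetical sample data, used only to exercise the checks.
sample = [
    {"age": 34, "income": 52000, "churn": "yes"},
    {"age": None, "income": 41000, "churn": "no"},
    {"age": 27, "income": 38000, "churn": 1},
]
violations = check_assumptions(sample, "churn")
for line in violations:
    print(line)
```

Listing the violations alongside the chosen technique gives you a simple written record of which assumptions hold for the data and which need to be handled first.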
Generating test design
Before you build a model for your data mining project, you need to set up a procedure for testing the quality and validity of the model. For instance, in supervised data mining tasks such as classification, it is common to separate the data set into train and test sets. Here, you build the model on the train set and use the separate test set to estimate the quality of the model.
Test design: Here, the desired plan for training, evaluating and testing the models is made. It is necessary to determine how the available dataset will be divided into testing, validating and training datasets.
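A test design of this kind can be sketched in a few lines. The example below shuffles the records and divides them into training, validation and testing sets. The function name `split_dataset`, the 60/20/20 proportions and the fixed seed are assumptions chosen for illustration; your own test design should state whatever proportions suit the project.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    records: Sequence[T],
    train_fraction: float = 0.6,       # assumed proportion, adjust to your plan
    validation_fraction: float = 0.2,  # assumed proportion, adjust to your plan
    seed: int = 42,                    # fixed seed so the split is repeatable
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the records and split them into train, validation and test sets."""
    if train_fraction <= 0 or validation_fraction < 0:
        raise ValueError("fractions must not be negative, and train must be > 0")
    if train_fraction + validation_fraction >= 1:
        raise ValueError("fractions must leave some records for the test set")

    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    train_end = round(len(shuffled) * train_fraction)
    validation_end = train_end + round(len(shuffled) * validation_fraction)

    train = shuffled[:train_end]
    validation = shuffled[train_end:validation_end]
    test = shuffled[validation_end:]  # whatever remains is held out for testing
    return train, validation, test


# Stand-in data: 100 record identifiers instead of a real dataset.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Fixing the seed means the same split can be reproduced later, which helps when several techniques have to be compared on identical data.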
Building the model
You can create one or more models by running the modelling tool on the dataset that you have prepared.
Here is a checklist that will help you carry out the different tasks in this stage:
- Describing characteristics of the current model that you might find useful in future.
- Recording the parameter settings that are used in producing the model.
- Providing a complete description of the model, along with special features, if any.
- Listing the rules produced for rule-based models, along with an assessment of the overall accuracy of the model and its coverage.
- In case the model is opaque, you need to create a list of technical information on it, like neural network topology, sensitivity, accuracy and other behavioural descriptions that have been generated during the modelling process.
- Describing the interpretation and behaviour of the model.
- Stating conclusions about data patterns, if any.
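As an illustration of the checklist above, the following sketch records a model's technique and parameter settings and computes the coverage and accuracy of a single rule. The classes `Rule` and `ModelDescription`, the function `rule_report`, the parameter name `min_cases_per_rule` and the sample records are all hypothetical. Coverage is taken here to mean the share of records a rule applies to, and accuracy the share of those records it classifies correctly; your tool may define these terms differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Record], bool]  # True when the rule applies to a record
    predicted_class: str


@dataclass
class ModelDescription:
    technique: str
    parameters: Dict[str, Any]  # parameter settings used to produce the model
    notes: List[str] = field(default_factory=list)  # special features, conclusions


def rule_report(rule: Rule, records: List[Record], class_attribute: str) -> Dict[str, float]:
    """Compute the coverage and accuracy of one rule on a set of records."""
    covered = [r for r in records if rule.condition(r)]
    correct = [r for r in covered if r[class_attribute] == rule.predicted_class]
    coverage = len(covered) / len(records) if records else 0.0
    accuracy = len(correct) / len(covered) if covered else 0.0
    return {"coverage": coverage, "accuracy": accuracy}


# Hypothetical records and rule, used only to demonstrate the calculation.
records = [
    {"tenure": 2, "churn": "yes"},
    {"tenure": 3, "churn": "yes"},
    {"tenure": 5, "churn": "no"},
    {"tenure": 30, "churn": "no"},
]
short_tenure = Rule("short_tenure", lambda r: r["tenure"] < 6, "yes")

description = ModelDescription(
    technique="rule-based classifier (illustrative)",
    parameters={"min_cases_per_rule": 2},
)
report = rule_report(short_tenure, records, "churn")
description.notes.append(f"{short_tenure.name}: {report}")
print(description)
```

Keeping the parameter settings and the per-rule figures in one record makes it easier to reproduce a model later and to compare it with other runs.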
Assessing the model
It is necessary to evaluate the models based on your domain knowledge, the data mining success criteria and the intended test design. The success of the modelling and discovery techniques is judged technically at this point. Afterwards, you need to contact the domain experts and business analysts to discuss the data mining results in the context of your business. Only the models are considered in this task, whereas all other results produced during the project are covered in the evaluation phase. In this stage, you need to rank the models and evaluate them according to the assessment criteria. The business objectives and business success criteria are to be taken into account as far as possible. In most data mining projects, a single technique is applied more than once, and results are also generated with several different techniques.
The checklist provides a detailed guide to the tasks that are to be accomplished at this stage:
- Executing the validation tests and evaluating results, as per the evaluation criteria.
- Comparing the evaluation results and their interpretation.
- Creating rankings for the results, with regard to the success and evaluation criteria.
- Interpreting the results in business terms, as far as possible at this stage.
- Seeking comments on the models from domain or data experts.
- Checking whether the models are readable.
- Checking the effects on the goals of data mining.
- Checking the models against knowledge provided to examine whether the information obtained is new and beneficial.
- Checking the credibility of the obtained results.
- Analysing the potential for deploying each result.
- Evaluating the rules, in case you have a verbal description of the model that has been generated.
- Evaluating the specific features of each modelling technique and finding out why certain modelling techniques and parameter settings lead to good or poor results.
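The ranking step in the checklist above can be sketched as a weighted score over the evaluation criteria. Everything in this example is an assumption made for illustration: the function name `rank_models`, the two criteria, their equal weights and the scores themselves, which are made-up numbers and not results from any real project.

```python
from typing import Dict, List, Tuple


def rank_models(
    results: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank models by the weighted sum of their scores, best first."""
    ranking = []
    for model_name, scores in results.items():
        # Only criteria named in `weights` contribute to the total.
        total = sum(weight * scores[criterion] for criterion, weight in weights.items())
        ranking.append((model_name, total))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Made-up scores between 0 and 1 for two hypothetical models.
results = {
    "decision_tree": {"accuracy": 0.80, "interpretability": 0.90},
    "neural_network": {"accuracy": 0.90, "interpretability": 0.30},
}
# Hypothetical weights reflecting how much each criterion matters.
weights = {"accuracy": 0.5, "interpretability": 0.5}

ranking = rank_models(results, weights)
for model_name, score in ranking:
    print(f"{model_name}: {score:.2f}")
```

A weighted sum is only one possible way to combine criteria. The point is that the criteria and their relative importance are written down before the models are compared, so the ranking can be explained to domain experts and business analysts.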
You can approach PGBS for any assistance with CRISP-DM modelling. In the next post, you will learn about Phase 5, Evaluation. That stage focuses on the concrete results and the evaluation of the outcome.