Issues regarding classification and prediction
Data Preparation:
• Data cleaning
– preprocess data to reduce noise and handle missing values
• Relevance analysis (feature selection)
– remove irrelevant or redundant attributes
• Data transformation
– generalize and/or normalize data
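
The three preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: the toy table `raw`, the helper names, mean imputation as the way to fill missing values, "constant or duplicated column" as the test for irrelevant or redundant attributes, and min-max scaling as the normalization are all choices made for this example.

```python
from statistics import mean

def impute_missing(rows):
    """Data cleaning (assumed strategy): replace None with the column mean."""
    cols = list(zip(*rows))
    # Raises StatisticsError if a column has no observed values at all.
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def drop_uninformative(rows):
    """Relevance analysis (assumed test): drop constant and duplicate columns."""
    cols = list(zip(*rows))
    keep, seen = [], set()
    for j, col in enumerate(cols):
        if len(set(col)) > 1 and col not in seen:
            keep.append(j)
            seen.add(col)
    return [[row[j] for j in keep] for row in rows], keep

def min_max_normalize(rows):
    """Data transformation: rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)] for row in rows]

# Hypothetical toy table: column 1 is constant, column 2 duplicates column 0,
# and column 3 has one missing value.
raw = [
    [1.0, 5.0, 1.0, 10.0],
    [2.0, 5.0, 2.0, None],
    [3.0, 5.0, 3.0, 30.0],
]

cleaned = impute_missing(raw)
selected, kept = drop_uninformative(cleaned)
prepared = min_max_normalize(selected)

print(kept)      # indices of the attributes that survived selection
print(prepared)  # normalized table
```

In practice the order of the steps and the choice of imputation or selection criterion depend on the data and on the classifier that will be trained afterwards.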
Evaluating Classification Methods:
• Predictive accuracy
• Speed and scalability
– time to construct the model
– time to use the model
– efficiency in disk-resident databases
• Robustness
– handling noise and missing values
• Interpretability
– understanding and insight provided by the model
• Goodness of rules
– decision tree size
– compactness of classification rules
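
Some of the criteria above are directly measurable. The sketch below, in Python, measures predictive accuracy, time to construct the model, and time to use the model, and counts the nodes of a decision tree as a rough proxy for rule compactness. Everything named here is an assumption made for illustration: the `NearestNeighbour` stand-in classifier, the `evaluate` and `tree_size` helpers, the toy data, and the nested-dict tree representation. Robustness and interpretability are not covered by this sketch.

```python
import time

class NearestNeighbour:
    """Hypothetical stand-in model: 1-nearest-neighbour on squared distance."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: sq_dist(self.X_[i], x))]
                for x in X]

def evaluate(model, X_train, y_train, X_test, y_test):
    """Return accuracy on the test set plus build and prediction times."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)          # time to construct the model
    t1 = time.perf_counter()
    predicted = model.predict(X_test)    # time to use the model
    t2 = time.perf_counter()
    correct = sum(p == t for p, t in zip(predicted, y_test))
    return {
        "accuracy": correct / len(y_test),
        "build_seconds": t1 - t0,
        "predict_seconds": t2 - t1,
    }

def tree_size(node):
    """Count nodes of a tree given as {'test': ..., 'children': [...]} dicts;
    anything that is not a dict is treated as a leaf label."""
    if not isinstance(node, dict):
        return 1
    return 1 + sum(tree_size(child) for child in node["children"])

# Hypothetical toy data, already normalized to [0, 1].
X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
y_train = ["no", "no", "yes", "yes"]
X_test = [[0.2, 0.1], [0.8, 0.9], [0.4, 0.4], [0.6, 0.7]]
y_test = ["no", "yes", "no", "no"]

report = evaluate(NearestNeighbour(), X_train, y_train, X_test, y_test)
print(report)

# Hypothetical two-level tree with two internal tests and three leaves.
toy_tree = {"test": "x0 <= 0.5",
            "children": ["no", {"test": "x1 <= 0.75",
                                "children": ["no", "yes"]}]}
print(tree_size(toy_tree))
```

Accuracy should be measured on data held out from training, as done here, since accuracy on the training set tends to overestimate how well the model predicts unseen cases.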