Classification by Decision Tree Induction
Scalable Decision Tree Induction
•Partition the data into subsets and build a decision tree for each subset?
•SLIQ (EDBT’96 — Mehta et al.)
–builds an index for each attribute; only the class list and the current attribute list reside in memory
•SPRINT (VLDB’96 — J. Shafer et al.)
–constructs an attribute list data structure
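As a minimal sketch of SPRINT's data structure (assuming records are dicts with a "class" key; the function name is illustrative): each attribute gets its own list of (attribute value, class label, record id) triples, sorted on value so candidate split points for a numeric attribute can be evaluated in one sequential scan.

```python
def attribute_list(records, attribute):
    # One SPRINT-style list per attribute: (value, class label, record id),
    # pre-sorted on the attribute value so split points can be evaluated
    # in a single scan without re-sorting at every tree node.
    triples = [(rec[attribute], rec["class"], rid)
               for rid, rec in enumerate(records)]
    return sorted(triples)
```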
•PUBLIC (VLDB’98 — Rastogi & Shim)
–integrates tree splitting and tree pruning: stops expanding subtrees that would be pruned anyway
•RainForest (VLDB’98 — Gehrke, Ramakrishnan & Ganti)
–separates the scalability aspects from the criteria that determine the quality of the tree
–builds an AVC-list (attribute, value, class label)
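A sketch of the AVC idea (assuming the same dict-of-records layout; `avc_set` is an illustrative name): for one attribute at one tree node, RainForest only needs the counts of each (value, class) pair, which are typically far smaller than the data itself and fit in memory.

```python
from collections import defaultdict

def avc_set(records, attribute):
    # AVC-set for one attribute at one node:
    # attribute value -> class label -> count.
    # Split criteria (gain, Gini, ...) can be computed from these
    # counts alone, without another pass over the raw records.
    counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        counts[rec[attribute]][rec["class"]] += 1
    return {value: dict(by_class) for value, by_class in counts.items()}
```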
Gini Index (IBM IntelligentMiner)
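The Gini index of a data partition D is Gini(D) = 1 − Σ pᵢ², where pᵢ is the relative frequency of class i in D; a pure partition scores 0. A minimal sketch:

```python
from collections import Counter

def gini(labels):
    # Gini(D) = 1 - sum(p_i^2), where p_i is the relative
    # frequency of class i among the labels of partition D.
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```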
Data Cube-Based Decision-Tree Induction
•Integration of generalization with decision-tree induction (Kamber et al., 1997).
•Classification at primitive concept levels
–E.g., precise temperature, humidity, outlook, etc.
–Low-level concepts, scattered classes, bushy classification trees
–Semantic interpretation problems.
•Cube-based multi-level classification
–Relevance analysis at multiple concept levels.
–Information-gain analysis that considers both dimension and concept level.
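A sketch of plain information gain at a single concept level (the cube-based method applies the same measure per dimension and level; record layout and function names are assumptions): Gain(A) = Info(D) − Σ_v (|D_v|/|D|)·Info(D_v), with Info measured as class entropy.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    # Info(D) = -sum(p_i * log2(p_i)) over the classes in D.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(records, attribute):
    # Gain(A) = Info(D) - sum over values v of (|D_v|/|D|) * Info(D_v).
    labels = [rec["class"] for rec in records]
    by_value = defaultdict(list)
    for rec in records:
        by_value[rec[attribute]].append(rec["class"])
    n = len(records)
    conditional = sum(len(ls) / n * entropy(ls) for ls in by_value.values())
    return entropy(labels) - conditional
```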
Presentation of Classification Results