Classification by Decision Tree Induction
Scalable Decision Tree Induction
•Partition the data into subsets and build a decision tree for each subset?
•SLIQ (EDBT’96 — Mehta et al.)
–builds an index for each attribute; only the class list and the current attribute list reside in memory
•SPRINT (VLDB’96 — J. Shafer et al.)
–constructs an attribute list data structure
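As a minimal sketch of SPRINT's data structure (assuming records are dicts with a "class" key; the function name is illustrative): each attribute gets its own list of (attribute value, class label, record id) triples, sorted on value so candidate split points for a numeric attribute can be evaluated in one sequential scan.

```python
def attribute_list(records, attribute):
    # One SPRINT-style list per attribute: (value, class label, record id),
    # pre-sorted on the attribute value so split points can be evaluated
    # in a single scan without re-sorting at every tree node.
    triples = [(rec[attribute], rec["class"], rid)
               for rid, rec in enumerate(records)]
    return sorted(triples)
```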
•PUBLIC (VLDB’98 — Rastogi & Shim)
–integrates tree splitting and tree pruning: stops expanding subtrees that would be pruned anyway
•RainForest (VLDB’98 — Gehrke, Ramakrishnan & Ganti)
–separates the scalability aspects from the criteria that determine the quality of the tree
–builds an AVC-list (attribute, value, class label)
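A sketch of the AVC idea (assuming the same dict-of-records layout; `avc_set` is an illustrative name): for one attribute at one tree node, RainForest only needs the counts of each (value, class) pair, which are typically far smaller than the data itself and fit in memory.

```python
from collections import defaultdict

def avc_set(records, attribute):
    # AVC-set for one attribute at one node:
    # attribute value -> class label -> count.
    # Split criteria (gain, Gini, ...) can be computed from these
    # counts alone, without another pass over the raw records.
    counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        counts[rec[attribute]][rec["class"]] += 1
    return {value: dict(by_class) for value, by_class in counts.items()}
```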
Gini Index (IBM IntelligentMiner)
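The Gini index of a data partition D is Gini(D) = 1 − Σ pᵢ², where pᵢ is the relative frequency of class i in D; a pure partition scores 0. A minimal sketch:

```python
from collections import Counter

def gini(labels):
    # Gini(D) = 1 - sum(p_i^2), where p_i is the relative
    # frequency of class i among the labels of partition D.
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```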
Data Cube-Based Decision-Tree Induction
•Integration of generalization with decision-tree induction (Kamber et al., 1997).
•Classification at primitive concept levels
–E.g., precise temperature, humidity, outlook, etc.
–Low-level concepts, scattered classes, bushy classification trees
–Semantic interpretation problems.
•Cube-based multi-level classification
–Relevance analysis at multiple concept levels.
–Information-gain analysis that considers both dimension and concept level.
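A sketch of plain information gain at a single concept level (the cube-based method applies the same measure per dimension and level; record layout and function names are assumptions): Gain(A) = Info(D) − Σ_v (|D_v|/|D|)·Info(D_v), with Info measured as class entropy.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    # Info(D) = -sum(p_i * log2(p_i)) over the classes in D.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(records, attribute):
    # Gain(A) = Info(D) - sum over values v of (|D_v|/|D|) * Info(D_v).
    labels = [rec["class"] for rec in records]
    by_value = defaultdict(list)
    for rec in records:
        by_value[rec[attribute]].append(rec["class"])
    n = len(records)
    conditional = sum(len(ls) / n * entropy(ls) for ls in by_value.values())
    return entropy(labels) - conditional
```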
Presentation of Classification Results