Other Classification Methods
Case-Based Reasoning
•Also uses lazy evaluation and analysis of similar instances
•Difference: Instances are not “points in a Euclidean space”
•Example: Water faucet problem in CADET (Sycara et al., 1992)
•Methodology
–Instances represented by rich symbolic descriptions (e.g., function graphs)
–Multiple retrieved cases may be combined
–Tight coupling between case retrieval, knowledge-based reasoning, and problem solving
•Research issues
–Indexing based on syntactic similarity measures and, when retrieval fails, backtracking and adapting to additional cases (a retrieval sketch follows below)
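A minimal retrieval sketch in Python, assuming each case is stored as a set of relational triples (a crude stand-in for CADET-style function graphs); all names, data, and the 0.5 threshold are illustrative assumptions, not CADET's actual implementation:

```python
def symbolic_similarity(query, case):
    """Jaccard overlap between two sets of relational triples."""
    q, c = set(query), set(case)
    return len(q & c) / len(q | c) if q | c else 0.0

def retrieve(query, case_base, threshold=0.5):
    """Return cases similar enough to combine; if none clear the threshold,
    a real CBR system would backtrack and adapt further cases."""
    scored = sorted(case_base,
                    key=lambda c: symbolic_similarity(query, c["graph"]),
                    reverse=True)
    return [c for c in scored
            if symbolic_similarity(query, c["graph"]) >= threshold]

case_base = [
    {"name": "faucet-A", "graph": [("flow", "increases-with", "valve-open"),
                                   ("temp", "increases-with", "hot-ratio")]},
    {"name": "pipe-B",   "graph": [("flow", "increases-with", "pressure")]},
]
query = [("flow", "increases-with", "valve-open"),
         ("temp", "increases-with", "hot-ratio")]
print([c["name"] for c in retrieve(query, case_base)])  # ['faucet-A']
```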
Remarks on Lazy vs. Eager Learning
•Instance-based learning: lazy evaluation
•Decision-tree and Bayesian classification: eager evaluation
•Key differences
–A lazy method may consider the query instance xq when deciding how to generalize beyond the training data D
–An eager method cannot, since it has already committed to a global approximation before seeing the query
•Efficiency: lazy methods spend less time training but more time predicting (contrast sketched below)
•Accuracy
–Lazy method effectively uses a richer hypothesis space since it uses many local linear functions to form its implicit global approximation to the target function
–Eager: must commit to a single hypothesis that covers the entire instance space
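A small illustration of the contrast, assuming scikit-learn is available: the lazy k-NN classifier does essentially no work at fit time and examines the neighborhood of xq only when queried, while the eager decision tree commits to one global hypothesis during fit:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

lazy = KNeighborsClassifier(n_neighbors=5).fit(X, y)   # fit: just stores D
eager = DecisionTreeClassifier().fit(X, y)             # fit: builds the tree

xq = X[:1]                       # a query instance xq
print(lazy.predict(xq))          # neighborhood of xq examined only now
print(eager.predict(xq))         # tree was already fixed before xq was seen
```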
Genetic Algorithms – Evolutionary Approach
•GA: based on an analogy to biological evolution
•Each rule is represented by a string of bits
•An initial population is created consisting of randomly generated rules
–e.g., IF A1 AND NOT A2 THEN C2 can be encoded as the bit string 100, where the two leftmost bits represent attributes A1 and A2 and the rightmost bit represents the class
•Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules and their offspring
•The fitness of a rule is represented by its classification accuracy on a set of training examples
•Offspring are generated by crossover and mutation (see the sketch below)
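A compact GA sketch on a made-up Boolean training set, using a simplified variant of the slide's 3-bit encoding (a rule predicts its class bit when its antecedent matches, and the other class otherwise); population size, mutation rate, and generation count are illustrative choices:

```python
import random

random.seed(0)

# Training data: ((A1, A2), class). A rule [b1, b2, c] reads
# IF A1 == b1 AND A2 == b2 THEN class c.
TRAIN = [((1, 0), 1), ((1, 1), 0), ((0, 0), 0), ((0, 1), 0)]

def predict(rule, x):
    # Predict rule[2] when the antecedent matches, else the other class.
    return rule[2] if tuple(rule[:2]) == x else 1 - rule[2]

def fitness(rule):
    # Fitness = classification accuracy on the training examples.
    return sum(predict(rule, x) == c for x, c in TRAIN) / len(TRAIN)

def crossover(p, q):
    cut = random.randint(1, 2)          # one-point crossover
    return p[:cut] + q[cut:]

def mutate(rule, rate=0.1):
    return [b ^ (random.random() < rate) for b in rule]  # flip bits

pop = [[random.randint(0, 1) for _ in range(3)] for _ in range(8)]
for _ in range(20):                      # survival of the fittest
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(4)]

best = max(pop, key=fitness)
print(best, fitness(best))  # e.g. [1, 0, 1]: IF A1 AND NOT A2 THEN class 1
```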
Rough Set Approach
•Rough sets are used to approximately or “roughly” define equivalence classes (applied to discrete-valued attributes)
•A rough set for a given class C is approximated by two sets: a lower approximation (instances certain to belong to C) and an upper approximation (instances that cannot be described as not belonging to C); see the sketch below
•Also used for feature reduction: finding the minimal subsets (reducts) of attributes is NP-hard, but a discernibility matrix (which stores the differences between attribute values for each pair of samples) can be used to reduce the computational cost
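A small sketch of the two approximations on hypothetical samples: identical attribute tuples induce equivalence classes, the lower approximation keeps classes wholly inside C, and the upper keeps any class overlapping C:

```python
from collections import defaultdict

samples = {                      # id -> (attribute tuple, decision class)
    1: (("low", "yes"), "C"), 2: (("low", "yes"), "C"),
    3: (("high", "yes"), "C"), 4: (("high", "yes"), "other"),
    5: (("high", "no"), "other"),
}

blocks = defaultdict(set)        # equivalence classes of attribute values
for sid, (attrs, _) in samples.items():
    blocks[attrs].add(sid)

target = {sid for sid, (_, cls) in samples.items() if cls == "C"}
lower = set().union(*(b for b in blocks.values() if b <= target))
upper = set().union(*(b for b in blocks.values() if b & target))

print(sorted(lower))  # [1, 2]: certain to be in C
print(sorted(upper))  # [1, 2, 3, 4]: cannot be ruled out of C
```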
Fuzzy Set Approaches
•Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (e.g., via a fuzzy membership graph)
•Attribute values are converted to fuzzy values
–e.g., income is mapped into the discrete categories {low, medium, high}, with a fuzzy membership value calculated for each category
•For a given new sample, more than one fuzzy value may apply
•Each applicable rule contributes a vote for membership in the categories
•Typically, the truth values for each predicted category are summed (as in the sketch below)
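A minimal sketch of this voting scheme; the membership functions, income breakpoints, and rules are made-up illustrations, not values from the slide:

```python
def tri(x, a, b, c):
    """Triangular membership: rises from a to peak b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(income):
    # Overlapping memberships, so more than one fuzzy value may apply.
    return {
        "low":    max(0.0, min(1.0, (50_000 - income) / 30_000)),
        "medium": tri(income, 20_000, 50_000, 80_000),
        "high":   max(0.0, min(1.0, (income - 50_000) / 30_000)),
    }

# Hypothetical rules: each applicable rule votes for its category with the
# truth value of its antecedent; votes per category are then summed.
RULES = [("low", "reject"), ("medium", "approve"), ("high", "approve")]

def classify(income):
    mu = fuzzify(income)
    votes = {}
    for fuzzy_value, category in RULES:
        votes[category] = votes.get(category, 0.0) + mu[fuzzy_value]
    return max(votes, key=votes.get), votes

print(classify(40_000))  # ('approve', {'reject': 0.33.., 'approve': 0.66..})
```

For an income of 40,000, both the low and medium memberships are nonzero, so two rules fire and the summed votes decide the category.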