Decision Tree Learning as a Search Problem
1. Introduction
Decision tree learning (e.g., ID3 algorithm) can be understood as:
A search through the space of all possible decision trees
The goal is to find a tree that:
- Accurately classifies training data
- Generalizes well to unseen data
2. Search Strategy and Hypothesis Space
2.1 Hypothesis Space
- The hypothesis space = all possible decision trees
- This space is very large, but also complete
๐ Meaning: - Any finite discrete function can be represented
2.2 Search Strategy
ID3 uses:
- Top-down approach
- Greedy (hill-climbing) search
- Simple โ complex model building
2.3 Search Flow (Mermaid)
graph TD
A[Start with Full Dataset] --> B[Evaluate All Attributes]
B --> C[Select Best Attribute<br/>Max Information Gain]
C --> D[Split Dataset into Subsets]
D --> E{Stopping Condition Met?}
E -->|No| F[Repeat Recursively on Each Subset]
F --> B
E -->|Yes| G[Create Leaf Node]
3. Evaluation Function
Information Gain
Used to select the best attribute:
๐ Higher Information Gain = better split
4. Key Characteristics
4.1 Completeness of Hypothesis Space
- ID3 can represent any finite discrete-valued function
- No restriction on model expressiveness
4.2 Single Hypothesis Maintenance
- Maintains only one tree at a time
- Does NOT track all possible trees
๐ Contrast:
- Candidate-Elimination โ keeps multiple hypotheses
- ID3 โ keeps only the current best tree
4.3 No Backtracking (Greedy Nature)
- Once a split is chosen โ never changed
-
May lead to:
-
Local optimum โ
- Not global optimum
4.4 Robustness to Noise
- Uses statistical measures over full dataset
-
Less sensitive to:
-
Individual errors
- Noisy samples
5. Inductive Bias in Decision Trees
5.1 What is Inductive Bias?
Inductive bias = assumptions used to generalize beyond training data
5.2 Types of Bias
| Type | Description |
|---|---|
| Restriction Bias | Limits hypothesis space |
| Preference Bias | Prefers some hypotheses over others |
5.3 ID3 Uses Preference Bias
- Does NOT restrict possible trees
- Instead, it prefers:
๐ Simpler trees over complex ones
6. Bias Behavior in ID3
Preferences:
- Shorter trees
- High Information Gain attributes near root
Mermaid Representation
graph TD
A[All Possible Trees] --> B[Search Strategy]
B --> C[Prefer Short Trees]
B --> D[Prefer High IG at Root]
C --> E[Final Selected Tree]
D --> E
7. Occamโs Razor Principle
โSimpler models are more likely to be correct.โ
Why?
- Fewer simple hypotheses exist
- Less chance of fitting data by coincidence
Intuition
| Model | Risk |
|---|---|
| Simple Tree | Better generalization |
| Complex Tree | Overfitting |
8. Why Preference Bias is Powerful
Advantage over Restriction Bias
| Preference Bias | Restriction Bias |
|---|---|
| Flexible | Limited |
| Can represent true function | May exclude it |
| Guided by search | Hard constraints |
๐ Decision trees:
- Explore full space
- But prioritize simpler solutions
9. Overall Search Process (Mermaid)
graph TD
A[Start] --> B[All Training Data]
B --> C[Evaluate Attributes]
C --> D[Select Best Split]
D --> E[Partition Data]
E --> F{Stopping Condition?}
F -->|No| C
F -->|Yes| G[Create Leaf Node]
10. Summary
| Concept | Description |
|---|---|
| Search Space | All possible trees |
| Strategy | Greedy, top-down |
| Evaluation | Information Gain |
| Bias Type | Preference bias |
| Key Assumption | Simpler trees are better |
11. Conclusion
Decision tree learning:
- Searches a complete hypothesis space
- Uses a greedy strategy
- Relies on preference bias (simplicity)
It does not limit what it can learn, but assumes that simpler trees are more likely to generalize well.