Skip to content

Decision Tree Learning as a Search Problem

1. Introduction

Decision tree learning (e.g., ID3 algorithm) can be understood as:

A search through the space of all possible decision trees

The goal is to find a tree that: - Accurately classifies training data
- Generalizes well to unseen data


2. Search Strategy and Hypothesis Space

2.1 Hypothesis Space

  • The hypothesis space = all possible decision trees
  • This space is very large, but also complete

๐Ÿ‘‰ Meaning: - Any finite discrete function can be represented


2.2 Search Strategy

ID3 uses:

  • Top-down approach
  • Greedy (hill-climbing) search
  • Simple โ†’ complex model building

2.3 Search Flow (Mermaid)

graph TD
    A[Start with Full Dataset] --> B[Evaluate All Attributes]
    B --> C[Select Best Attribute<br/>Max Information Gain]
    C --> D[Split Dataset into Subsets]
    D --> E{Stopping Condition Met?}
    E -->|No| F[Repeat Recursively on Each Subset]
    F --> B
    E -->|Yes| G[Create Leaf Node]

3. Evaluation Function

Information Gain

Used to select the best attribute:

\[ IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v) \]

๐Ÿ‘‰ Higher Information Gain = better split


4. Key Characteristics

4.1 Completeness of Hypothesis Space

  • ID3 can represent any finite discrete-valued function
  • No restriction on model expressiveness

4.2 Single Hypothesis Maintenance

  • Maintains only one tree at a time
  • Does NOT track all possible trees

๐Ÿ‘‰ Contrast:

  • Candidate-Elimination โ†’ keeps multiple hypotheses
  • ID3 โ†’ keeps only the current best tree

4.3 No Backtracking (Greedy Nature)

  • Once a split is chosen โ†’ never changed
  • May lead to:

  • Local optimum โŒ

  • Not global optimum

4.4 Robustness to Noise

  • Uses statistical measures over full dataset
  • Less sensitive to:

  • Individual errors

  • Noisy samples

5. Inductive Bias in Decision Trees

5.1 What is Inductive Bias?

Inductive bias = assumptions used to generalize beyond training data


5.2 Types of Bias

Type Description
Restriction Bias Limits hypothesis space
Preference Bias Prefers some hypotheses over others

5.3 ID3 Uses Preference Bias

  • Does NOT restrict possible trees
  • Instead, it prefers:

๐Ÿ‘‰ Simpler trees over complex ones


6. Bias Behavior in ID3

Preferences:

  • Shorter trees
  • High Information Gain attributes near root

Mermaid Representation

graph TD
    A[All Possible Trees] --> B[Search Strategy]
    B --> C[Prefer Short Trees]
    B --> D[Prefer High IG at Root]
    C --> E[Final Selected Tree]
    D --> E

7. Occamโ€™s Razor Principle

โ€œSimpler models are more likely to be correct.โ€

Why?

  • Fewer simple hypotheses exist
  • Less chance of fitting data by coincidence

Intuition

Model Risk
Simple Tree Better generalization
Complex Tree Overfitting

8. Why Preference Bias is Powerful

Advantage over Restriction Bias

Preference Bias Restriction Bias
Flexible Limited
Can represent true function May exclude it
Guided by search Hard constraints

๐Ÿ‘‰ Decision trees:

  • Explore full space
  • But prioritize simpler solutions

9. Overall Search Process (Mermaid)

graph TD
    A[Start] --> B[All Training Data]
    B --> C[Evaluate Attributes]
    C --> D[Select Best Split]
    D --> E[Partition Data]
    E --> F{Stopping Condition?}
    F -->|No| C
    F -->|Yes| G[Create Leaf Node]

10. Summary

Concept Description
Search Space All possible trees
Strategy Greedy, top-down
Evaluation Information Gain
Bias Type Preference bias
Key Assumption Simpler trees are better

11. Conclusion

Decision tree learning:

  • Searches a complete hypothesis space
  • Uses a greedy strategy
  • Relies on preference bias (simplicity)

It does not limit what it can learn, but assumes that simpler trees are more likely to generalize well.