# Classification
Classification is a supervised learning task that involves predicting the category or class label of a given input. This task is foundational to many applications like spam detection, sentiment analysis, fraud detection, and image classification.
## Key Concepts

### Mathematical Formulation
Given a dataset:

\[ D = \{(X_1, y_1), (X_2, y_2), \dots, (X_n, y_n)\} \]

Where:

- \( X_i \) represents feature vectors.
- \( y_i \) is the class label (\( y_i \in \{C_1, C_2, \dots, C_k\} \)).

The objective is to learn a function \( f \) such that:

\[ f(X) = \hat{y} \]

Where \( \hat{y} \) is the predicted class label.
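A minimal sketch of this \( f(X) = \hat{y} \) pattern, assuming scikit-learn (the library choice and the toy dataset below are illustrative, not part of the formulation):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy dataset: rows of X are the feature vectors X_i, y holds the labels y_i.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression()
clf.fit(X, y)            # learn f from the (X_i, y_i) pairs
y_hat = clf.predict(X)   # y_hat = f(X), the predicted class labels
```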
## Types of Classification Algorithms
### 1. Logistic Regression

Logistic Regression predicts the probability of a binary class using the logistic (sigmoid) function.

#### Mathematical Formula

For binary classification (\( y \in \{0, 1\} \)):

\[ P(y = 1 \mid X) = \sigma(w^T X + b) = \frac{1}{1 + e^{-(w^T X + b)}} \]

Decision rule:

\[ \hat{y} = \begin{cases} 1 & \text{if } P(y = 1 \mid X) \geq 0.5 \\ 0 & \text{otherwise} \end{cases} \]
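The sigmoid and the 0.5-threshold decision rule translate directly into NumPy; in the sketch below the weight vector `w` and bias `b` are illustrative placeholders rather than fitted values:

```python
import numpy as np

def predict_proba(X, w, b):
    z = X @ w + b                     # linear score w^T x + b per row
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid gives P(y = 1 | X)

def predict(X, w, b, threshold=0.5):
    # Decision rule: predict 1 when P(y = 1 | X) >= threshold, else 0.
    return (predict_proba(X, w, b) >= threshold).astype(int)

X = np.array([[0.5, 1.2], [-1.0, 0.3]])
w = np.array([0.8, -0.4])             # assumed example weights
print(predict(X, w, b=0.1))
```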
### 2. Decision Tree Classifier

Decision Trees split data based on feature conditions to predict class labels.

#### Splitting Criterion

Common measures:

- Gini Index:

  \[ Gini = 1 - \sum_{i=1}^{k} p_i^2 \]

- Entropy:

  \[ Entropy = -\sum_{i=1}^{k} p_i \log_2 p_i \]

Where \( p_i \) is the probability of class \( i \).
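Both impurity measures translate directly into code; in this sketch `gini` and `entropy` are ad hoc helper names, and `labels` is any array of class labels at a candidate node:

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()         # class probabilities p_i
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node = np.array([0, 0, 1, 1, 1])
print(gini(node), entropy(node))      # impurity of a mixed node
```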
### 3. Random Forest Classifier

Random Forest aggregates predictions from multiple decision trees.

#### Prediction Formula

For classification, the final label is the majority vote across trees:

\[ \hat{y} = \text{mode}\{T_1(X), T_2(X), \dots, T_n(X)\} \]

Where \( T_i(X) \) is the prediction of the \( i \)-th tree.
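A brief usage sketch with scikit-learn's `RandomForestClassifier` (parameter values are illustrative); `n_estimators` is the number of trees \( T_i \) whose votes are aggregated:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))  # majority vote across the 100 trees
```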
### 4. Support Vector Machine (SVM)

SVM finds the hyperplane that best separates the classes with the largest margin.

#### Mathematical Formulation

Given:

- Data points \( X_i \)
- Labels \( y_i \in \{-1, 1\} \)

Objective:

\[ \min_{w, b} \frac{1}{2} \|w\|^2 \]

Subject to:

\[ y_i (w \cdot X_i + b) \geq 1 \quad \text{for all } i \]
#### Kernel Trick

For non-linear data, SVM uses kernel functions (a usage sketch follows this list):
- Linear: \( K(X, X') = X \cdot X' \)
- Polynomial: \( K(X, X') = (X \cdot X' + c)^d \)
- RBF: \( K(X, X') = e^{-\gamma ||X - X'||^2} \)
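A short usage sketch with scikit-learn's `SVC` and the RBF kernel on a non-linearly separable toy dataset (the `C` and `gamma` values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a linear hyperplane.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# gamma corresponds to the RBF parameter in K(X, X') above.
svm = SVC(kernel="rbf", C=1.0, gamma=0.5)
svm.fit(X, y)
print(svm.predict(X[:5]))
```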
### 5. K-Nearest Neighbors (KNN)

KNN classifies data based on the majority vote of its \( k \)-nearest neighbors.

#### Decision Rule

\[ \hat{y} = \text{mode}\{y_{i_1}, y_{i_2}, \dots, y_{i_k}\} \]

Where \( y_{i_j} \) are the labels of the nearest neighbors.
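The decision rule is simple enough to implement directly; the sketch below assumes Euclidean distance and small in-memory arrays (scikit-learn's `KNeighborsClassifier` provides the same behavior with efficient neighbor search):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every point
    nearest = np.argsort(dists)[:k]               # indices i_1, ..., i_k
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]              # majority vote

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
```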
### 6. Naive Bayes Classifier

Naive Bayes applies Bayes' theorem under the assumption of conditional independence between features.

#### Formula

By Bayes' theorem:

\[ P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)} \]

For features \( X = \{x_1, x_2, \dots, x_n\} \), the independence assumption factorizes the likelihood:

\[ P(X \mid y) = \prod_{i=1}^{n} P(x_i \mid y) \]
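As a usage sketch, scikit-learn's `GaussianNB` instantiates this independence assumption by modeling each \( P(x_i \mid y) \) as a univariate Gaussian (the toy data is illustrative):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
y = np.array([0, 0, 1, 1])

nb = GaussianNB()
nb.fit(X, y)
print(nb.predict_proba([[1.0, 2.0]]))  # posterior P(y | X) for each class
```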
### 7. Neural Networks

Neural Networks use layers of interconnected neurons to model complex patterns.

#### Formula

For a single neuron:

\[ z = \sum_{i=1}^{n} w_i x_i + b, \qquad a = \sigma(z) \]

Where:

- \( z \) is the weighted sum.
- \( a \) is the activation output.
- \( \sigma \) is the activation function (e.g., sigmoid, ReLU).
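A single neuron written out exactly as in the formula, using a sigmoid activation (the weights are illustrative, not trained):

```python
import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b              # weighted sum z = sum(w_i * x_i) + b
    return 1.0 / (1.0 + np.exp(-z))   # activation a = sigma(z), here sigmoid

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.1, -0.3])        # assumed example weights
print(neuron(x, w, b=0.2))
```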
### 8. Gradient Boosting Classifier

Gradient Boosting builds an additive model stage by stage, with each stage minimizing a loss function.

#### Update Rule

\[ F_m(X) = F_{m-1}(X) + \eta\, h_m(X) \]

Where \( h_m(X) \) is the weak learner (typically a decision tree) fitted at stage \( m \), and \( \eta \) is the learning rate.
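A brief usage sketch with scikit-learn's `GradientBoostingClassifier` (parameter values are illustrative); `learning_rate` plays the role of \( \eta \), and each of the `n_estimators` stages fits one weak tree \( h_m \):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbc.fit(X, y)
print(gbc.predict(X[:5]))
```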
## Performance Metrics

### Confusion Matrix

A confusion matrix summarizes prediction results:

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
### Metrics

- Accuracy:

  \[ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \]

- Precision:

  \[ Precision = \frac{TP}{TP + FP} \]

- Recall:

  \[ Recall = \frac{TP}{TP + FN} \]

- F1-Score:

  \[ F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \]

- ROC-AUC: the area under the ROC curve measures the model's ability to distinguish between classes (a computation sketch follows this list).
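All of these metrics are available in `sklearn.metrics`; the sketch below computes them on small illustrative label vectors, with `y_score` standing in for predicted probabilities from `predict_proba`:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1])

# Note: scikit-learn orients the matrix as [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))
```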
## Choosing the Right Algorithm
- Linear Relationships: Logistic Regression, SVM (with linear kernel).
- Non-linear Data: Decision Trees, Random Forest, SVM (with RBF kernel).
- Text Data: Naive Bayes, Logistic Regression.
- Large Datasets: Neural Networks, Gradient Boosting.
## Conclusion
Classification is a cornerstone of machine learning with algorithms ranging from simple models like Logistic Regression to complex ones like Gradient Boosting. Selecting the right model depends on the data and problem domain.