# Classification
Classification is a supervised learning task that involves predicting the category or class label of a given input. This task is foundational to many applications like spam detection, sentiment analysis, fraud detection, and image classification.
## Key Concepts

### Mathematical Formulation
Given a dataset:

\[ D = \{(X_1, y_1), (X_2, y_2), \dots, (X_n, y_n)\} \]

Where:

- \( X_i \) represents feature vectors.
- \( y_i \) is the class label (\( y_i \in \{C_1, C_2, \dots, C_k\} \)).

The objective is to learn a function \( f \) such that:

\[ f(X) = \hat{y} \]

Where \( \hat{y} \) is the predicted class label.
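A minimal sketch of this \( f(X) = \hat{y} \) pattern, assuming scikit-learn (the library choice and the toy dataset below are illustrative, not part of the formulation):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy dataset: rows of X are the feature vectors X_i, y holds the labels y_i.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression()
clf.fit(X, y)            # learn f from the (X_i, y_i) pairs
y_hat = clf.predict(X)   # y_hat = f(X), the predicted class labels
```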
## Types of Classification Algorithms
### 1. Logistic Regression

Logistic Regression predicts the probability of a binary class using the logistic (sigmoid) function.

#### Mathematical Formula

For binary classification (\( y \in \{0, 1\} \)):

\[ P(y = 1 \mid X) = \sigma(w^T X + b) = \frac{1}{1 + e^{-(w^T X + b)}} \]

Decision rule:

\[ \hat{y} = \begin{cases} 1 & \text{if } P(y = 1 \mid X) \geq 0.5 \\ 0 & \text{otherwise} \end{cases} \]
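The sigmoid and the 0.5-threshold decision rule translate directly into NumPy; in the sketch below the weight vector `w` and bias `b` are illustrative placeholders rather than fitted values:

```python
import numpy as np

def predict_proba(X, w, b):
    z = X @ w + b                     # linear score w^T x + b per row
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid gives P(y = 1 | X)

def predict(X, w, b, threshold=0.5):
    # Decision rule: predict 1 when P(y = 1 | X) >= threshold, else 0.
    return (predict_proba(X, w, b) >= threshold).astype(int)

X = np.array([[0.5, 1.2], [-1.0, 0.3]])
w = np.array([0.8, -0.4])             # assumed example weights
print(predict(X, w, b=0.1))
```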
### 2. Decision Tree Classifier

Decision Trees split data based on feature conditions to predict class labels.

#### Splitting Criterion

Common measures:

- Gini Index:

  \[ Gini = 1 - \sum_{i=1}^{k} p_i^2 \]

- Entropy:

  \[ Entropy = -\sum_{i=1}^{k} p_i \log_2 p_i \]

Where \( p_i \) is the probability of class \( i \).
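Both impurity measures translate directly into code; in this sketch `gini` and `entropy` are ad hoc helper names, and `labels` is any array of class labels at a candidate node:

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()         # class probabilities p_i
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node = np.array([0, 0, 1, 1, 1])
print(gini(node), entropy(node))      # impurity of a mixed node
```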
### 3. Random Forest Classifier

Random Forest aggregates predictions from multiple decision trees.

#### Prediction Formula

For classification, the final label is the majority vote across trees:

\[ \hat{y} = \text{mode}\{T_1(X), T_2(X), \dots, T_n(X)\} \]

Where \( T_i(X) \) is the prediction of the \( i \)-th tree.
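A brief usage sketch with scikit-learn's `RandomForestClassifier` (parameter values are illustrative); `n_estimators` is the number of trees \( T_i \) whose votes are aggregated:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))  # majority vote across the 100 trees
```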
### 4. Support Vector Machine (SVM)

SVM finds the hyperplane that best separates the classes with the largest margin.

#### Mathematical Formulation

Given:

- Data points \( X_i \)
- Labels \( y_i \in \{-1, 1\} \)

Objective:

\[ \min_{w, b} \frac{1}{2} \|w\|^2 \]

Subject to:

\[ y_i (w \cdot X_i + b) \geq 1 \quad \text{for all } i \]
#### Kernel Trick

For non-linear data, SVM uses kernel functions (a usage sketch follows this list):
- Linear: \( K(X, X') = X \cdot X' \)
- Polynomial: \( K(X, X') = (X \cdot X' + c)^d \)
- RBF: \( K(X, X') = e^{-\gamma ||X - X'||^2} \)
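A short usage sketch with scikit-learn's `SVC` and the RBF kernel on a non-linearly separable toy dataset (the `C` and `gamma` values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a linear hyperplane.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# gamma corresponds to the RBF parameter in K(X, X') above.
svm = SVC(kernel="rbf", C=1.0, gamma=0.5)
svm.fit(X, y)
print(svm.predict(X[:5]))
```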
### 5. K-Nearest Neighbors (KNN)

KNN classifies data based on the majority vote of its \( k \)-nearest neighbors.

#### Decision Rule

\[ \hat{y} = \text{mode}\{y_{i_1}, y_{i_2}, \dots, y_{i_k}\} \]

Where \( y_{i_j} \) are the labels of the nearest neighbors.
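The decision rule is simple enough to implement directly; the sketch below assumes Euclidean distance and small in-memory arrays (scikit-learn's `KNeighborsClassifier` provides the same behavior with efficient neighbor search):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every point
    nearest = np.argsort(dists)[:k]               # indices i_1, ..., i_k
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]              # majority vote

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
```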
### 6. Naive Bayes Classifier

Naive Bayes applies Bayes' theorem under the assumption of conditional independence between features.

#### Formula

By Bayes' theorem:

\[ P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)} \]

For features \( X = \{x_1, x_2, \dots, x_n\} \), the independence assumption factorizes the likelihood:

\[ P(X \mid y) = \prod_{i=1}^{n} P(x_i \mid y) \]
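As a usage sketch, scikit-learn's `GaussianNB` instantiates this independence assumption by modeling each \( P(x_i \mid y) \) as a univariate Gaussian (the toy data is illustrative):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
y = np.array([0, 0, 1, 1])

nb = GaussianNB()
nb.fit(X, y)
print(nb.predict_proba([[1.0, 2.0]]))  # posterior P(y | X) for each class
```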
### 7. Neural Networks

Neural Networks use layers of interconnected neurons to model complex patterns.

#### Formula

For a single neuron:

\[ z = \sum_{i=1}^{n} w_i x_i + b, \qquad a = \sigma(z) \]

Where:

- \( z \) is the weighted sum.
- \( a \) is the activation output.
- \( \sigma \) is the activation function (e.g., sigmoid, ReLU).
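A single neuron written out exactly as in the formula, using a sigmoid activation (the weights are illustrative, not trained):

```python
import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b              # weighted sum z = sum(w_i * x_i) + b
    return 1.0 / (1.0 + np.exp(-z))   # activation a = sigma(z), here sigmoid

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.1, -0.3])        # assumed example weights
print(neuron(x, w, b=0.2))
```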
### 8. Gradient Boosting Classifier

Gradient Boosting builds an additive model stage by stage, with each stage minimizing a loss function.

#### Update Rule

\[ F_m(X) = F_{m-1}(X) + \eta\, h_m(X) \]

Where \( h_m(X) \) is the weak learner (typically a decision tree) fitted at stage \( m \), and \( \eta \) is the learning rate.
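A brief usage sketch with scikit-learn's `GradientBoostingClassifier` (parameter values are illustrative); `learning_rate` plays the role of \( \eta \), and each of the `n_estimators` stages fits one weak tree \( h_m \):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbc.fit(X, y)
print(gbc.predict(X[:5]))
```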
## Performance Metrics

### Confusion Matrix

A confusion matrix summarizes prediction results:

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
### Metrics

- Accuracy:

  \[ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \]

- Precision:

  \[ Precision = \frac{TP}{TP + FP} \]

- Recall:

  \[ Recall = \frac{TP}{TP + FN} \]

- F1-Score:

  \[ F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \]

- ROC-AUC: the area under the ROC curve measures the model's ability to distinguish between classes (a computation sketch follows this list).
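All of these metrics are available in `sklearn.metrics`; the sketch below computes them on small illustrative label vectors, with `y_score` standing in for predicted probabilities from `predict_proba`:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1])

# Note: scikit-learn orients the matrix as [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))
```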
## Choosing the Right Algorithm
- Linear Relationships: Logistic Regression, SVM (with linear kernel).
- Non-linear Data: Decision Trees, Random Forest, SVM (with RBF kernel).
- Text Data: Naive Bayes, Logistic Regression.
- Large Datasets: Neural Networks, Gradient Boosting.
## Conclusion
Classification is a cornerstone of machine learning with algorithms ranging from simple models like Logistic Regression to complex ones like Gradient Boosting. Selecting the right model depends on the data and problem domain.