ML Cheatsheet
A comprehensive cheat sheet covering core machine learning algorithms, evaluation metrics, and essential concepts for interview preparation. Includes supervised learning, unsupervised learning, deep learning, and NLP.
Supervised Learning: Regression
Linear Regression
Description: Models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.
Formula: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n + \epsilon$
Assumptions: Linearity, independence, homoscedasticity, normality of residuals.
Use Cases: Predicting sales, estimating prices, forecasting demand.
Advantages: Simple, easy to interpret, computationally efficient.
Disadvantages: Sensitive to outliers, assumes linearity, can suffer from multicollinearity.
Regularization: Not inherently regularized. Use Ridge or Lasso for regularization.
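A minimal sketch with scikit-learn (assuming it is installed; the toy data below is invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented toy data: one feature, roughly y = 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.0, 6.2, 7.9])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # estimated beta_0 and beta_1
print(model.predict([[5.0]]))          # prediction for a new observation
```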
Ridge Regression
Description: Linear regression with L2 regularization. Adds a penalty term equal to the square of the magnitude of the coefficients.
Formula: Minimize $\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^2 + \alpha \sum_{j=1}^{p} \beta_j^2$
Effect of α: Controls the strength of regularization. Higher α shrinks coefficients towards zero, reducing overfitting.
Use Cases: When multicollinearity is present, or to prevent overfitting.
Advantages: Reduces overfitting, handles multicollinearity better than linear regression.
Disadvantages: Requires tuning of the regularization parameter α, less interpretable than linear regression.
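A minimal scikit-learn sketch on synthetic data with a deliberately duplicated column, just to illustrate the effect of α under multicollinearity:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)        # near-duplicate feature (multicollinearity)
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)                     # alpha = regularization strength
print(ridge.coef_)                                     # shrunk coefficients, none exactly zero
```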
Lasso Regression
Description: Linear regression with L1 regularization. Adds a penalty term equal to the absolute value of the magnitude of the coefficients.
Formula: Minimize $\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^2 + \alpha \sum_{j=1}^{p} |\beta_j|$
Effect of α: Controls the strength of regularization. Higher α can lead to feature selection (some coefficients become exactly zero).
Use Cases: Feature selection, when many features are irrelevant.
Advantages: Performs feature selection, reduces overfitting, handles multicollinearity.
Disadvantages: Can arbitrarily select one feature among correlated features, requires tuning of the regularization parameter α.
Notes: L1 regularization.
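A minimal scikit-learn sketch on synthetic data with only two informative features, to show coefficients being driven to exactly zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 4 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=100)   # features 2..9 are irrelevant noise

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)   # most of the irrelevant coefficients come out exactly 0
```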
Supervised Learning: Classification
Logistic Regression
Description: Models the probability of a binary outcome using a logistic (sigmoid) function.
Formula: $p(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_n x_n)}}$
Use Cases: Binary classification problems like spam detection, disease prediction.
Advantages: Simple, interpretable, provides probability estimates.
Disadvantages: Assumes a linear relationship between the features and the log-odds, can suffer from overfitting with high-dimensional data.
Regularization: Can be regularized using L1 or L2 regularization to prevent overfitting.
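A minimal scikit-learn sketch on the built-in breast cancer dataset (C is the inverse regularization strength):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000, C=1.0).fit(X_train, y_train)
print(clf.predict_proba(X_test[:3]))   # probability estimates from the sigmoid
print(clf.score(X_test, y_test))       # accuracy on held-out data
```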
k-Nearest Neighbors (k-NN)
Description: Classifies a data point based on the majority class among its k nearest neighbors.
Algorithm: Choose k; compute the distance from the query point to every training point; take the k closest points; predict the majority class (or, for regression, the average of their targets).
Use Cases: Recommendation systems, pattern recognition, image classification.
Advantages: Simple, no training phase, versatile.
Disadvantages: Computationally expensive at prediction time, sensitive to irrelevant features and feature scale, requires an appropriate choice of k.
Distance Metrics: Euclidean, Manhattan, Minkowski.
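A minimal scikit-learn sketch; the features are standardized first because k-NN is distance-based:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)              # "training" just stores the scaled data
print(knn.score(X_test, y_test))
```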
Decision Trees
Description: A tree-like model that makes decisions based on features. Each node represents a feature, each branch represents a decision rule, and each leaf represents an outcome.
Splitting Criteria: Gini impurity, entropy, information gain.
Use Cases: Classification and regression tasks, feature selection, interpretable models.
Advantages: Easy to understand and interpret, handles both categorical and numerical data, can capture non-linear relationships.
Disadvantages: Prone to overfitting, can be sensitive to small changes in the data.
Ensemble Methods: Random Forests, Gradient Boosting.
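A minimal scikit-learn sketch; max_depth is capped to limit overfitting and export_text prints the learned decision rules:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(export_text(tree))             # human-readable decision rules
print(tree.score(X_test, y_test))
```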
Model Evaluation and Tuning
Evaluation Metrics
Accuracy: (TP + TN) / (TP + TN + FP + FN). Fraction of all predictions that are correct.
Precision: TP / (TP + FP). Of the points predicted positive, the fraction that are actually positive.
Recall: TP / (TP + FN). Of the actual positives, the fraction that are correctly identified.
F1-Score: 2 · (Precision · Recall) / (Precision + Recall). Harmonic mean of precision and recall.
ROC-AUC: Area Under the Receiver Operating Characteristic curve. Measures the ability of a classifier to distinguish between classes across all thresholds.
Confusion Matrix: A table summarizing the performance of a classification model in terms of true positives, false positives, true negatives, and false negatives.
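A minimal scikit-learn sketch computing the metrics above on made-up labels and scores:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                    # made-up ground truth
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                    # made-up hard predictions
y_score = [0.2, 0.6, 0.9, 0.7, 0.4, 0.1, 0.8, 0.3]    # made-up predicted probabilities

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))                 # needs scores, not hard labels
print(confusion_matrix(y_true, y_pred))               # [[TN, FP], [FN, TP]]
```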
Model Tuning
Cross-validation: Split the data into k folds; train on k-1 folds and validate on the remaining fold, rotating so each fold serves as the validation set once. Gives a more reliable estimate of generalization than a single train/test split.
Bias-variance tradeoff: High-bias models underfit (too simple), high-variance models overfit (too sensitive to the training data); increasing model complexity trades bias for variance.
Overfitting/underfitting: Overfitting: the model memorizes training noise and generalizes poorly. Underfitting: the model is too simple to capture the underlying pattern.
Hyperparameter tuning: Searching for the hyperparameter values (e.g., α for Ridge/Lasso, k for k-NN, tree depth) that maximize validation performance.
GridSearchCV: Exhaustively evaluates every combination in a specified hyperparameter grid using cross-validation.
RandomizedSearchCV: Samples a fixed number of hyperparameter combinations from specified distributions; cheaper than an exhaustive grid search for large search spaces.
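A minimal GridSearchCV sketch: 5-fold cross-validation over a small k-NN grid (RandomizedSearchCV is used the same way, but with parameter distributions and an n_iter budget):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9], "weights": ["uniform", "distance"]}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)                                   # fits one model per fold per combination
print(search.best_params_, search.best_score_)     # best hyperparameters and mean CV accuracy
```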
Deep Learning Fundamentals
Core Concepts
Neural Networks: Layers of interconnected neurons that learn non-linear mappings from inputs to outputs.
Perceptron: The simplest neural unit: a weighted sum of inputs passed through an activation function.
Activation Functions: Introduce non-linearity; common choices are sigmoid, tanh, ReLU, and softmax.
Backpropagation: Computes the gradient of the loss with respect to every weight by applying the chain rule backwards through the network; these gradients drive the weight updates.
Loss Functions: Quantify prediction error, e.g., mean squared error for regression, cross-entropy for classification.
Optimizers: Update the weights using the gradients, e.g., SGD, Momentum, RMSprop, Adam.
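A minimal NumPy sketch of a single forward pass through one hidden layer plus the binary cross-entropy loss; the weights below are arbitrary placeholders. Backpropagation would compute the gradient of this loss with respect to W1, b1, W2, b2, and an optimizer would update them:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([0.5, -1.2, 3.0])               # one example with 3 features
W1 = np.full((4, 3), 0.1); b1 = np.zeros(4)   # hidden layer: 4 neurons (placeholder weights)
W2 = np.full((1, 4), 0.1); b2 = np.zeros(1)   # output layer: 1 neuron

h = relu(W1 @ x + b1)                  # hidden activations
p = sigmoid(W2 @ h + b2)               # predicted probability of class 1

y = 1                                  # true label
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))   # binary cross-entropy
print(p, loss)
```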
Convolutional Neural Networks (CNNs)
Description: Neural networks that apply learned convolutional filters across spatially structured input (e.g., images) to extract local features.
Key Layers: Convolutional, pooling, and fully connected (dense) layers, typically with ReLU activations.
Use Cases: Image classification, object detection, image segmentation.
Advantages: Parameter sharing and local connectivity make them efficient for image data; features are learned automatically instead of hand-crafted.
Disadvantages: Require large labeled datasets and significant compute; harder to interpret than simpler models.
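A minimal sketch of such a network, assuming TensorFlow/Keras is available; the input shape and layer sizes are placeholders for something like 28x28 grayscale digits:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution: learn local filters
    layers.MaxPooling2D((2, 2)),                    # pooling: spatial downsampling
    layers.Flatten(),
    layers.Dense(64, activation="relu"),            # fully connected layer
    layers.Dense(10, activation="softmax"),         # 10-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```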