ML Master Notes

Your complete Machine Learning repository for B.Tech success



Linear Algebra

Vectors

Definition

A vector is a mathematical object that has both magnitude and direction. In machine learning, vectors represent data points in n-dimensional space.

Key Formulas

Vector addition: $\mathbf{u} + \mathbf{v} = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n)$

Dot product: $\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i$

Magnitude: $\|\mathbf{v}\| = \sqrt{\sum_{i=1}^{n} v_i^2}$

Example

Given vectors $\mathbf{a} = [3, 4]$ and $\mathbf{b} = [1, 2]$:

  • Addition: $\mathbf{a} + \mathbf{b} = [4, 6]$
  • Dot product: $\mathbf{a} \cdot \mathbf{b} = 3(1) + 4(2) = 11$
  • Magnitude of a: $\|\mathbf{a}\| = \sqrt{9 + 16} = 5$
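The same worked example can be checked with NumPy. This is a minimal sketch; `np.linalg.norm` computes the Euclidean magnitude by default.

```python
import numpy as np

# Vectors from the worked example above
a = np.array([3, 4])
b = np.array([1, 2])

addition = a + b                 # element-wise sum -> [4 6]
dot_product = np.dot(a, b)       # 3*1 + 4*2 -> 11
magnitude_a = np.linalg.norm(a)  # sqrt(3^2 + 4^2) -> 5.0

print(addition, dot_product, magnitude_a)
```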

Interview Questions

Q: Why are vectors important in ML?

A: Vectors allow us to represent data points, features, and weights mathematically. Operations like dot products form the basis of similarity measures, neural network computations, and geometric interpretations of data.
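As an illustration of a dot-product-based similarity measure, here is a small sketch of cosine similarity. The helper name `cosine_similarity` is our own (hypothetical), not a library function; it divides the dot product by both magnitudes so only direction matters.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u|| * ||v||)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity([3, 4], [6, 8]))  # same direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal -> 0.0
```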

Matrices

Definition

A matrix is a rectangular array of numbers arranged in rows and columns. In ML, matrices represent datasets, transformations, and model parameters.

Key Operations

Matrix multiplication: $(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$

Transpose: $(A^T)_{ij} = A_{ji}$

Determinant (2×2): $\det(A) = ad - bc$ for $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

Python Implementation

import numpy as np

# Create matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix operations
C = np.dot(A, B)  # Matrix multiplication
D = A.T           # Transpose
det = np.linalg.det(A)  # Determinant
inv = np.linalg.inv(A)  # Inverse
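A quick sanity check ties the 2×2 determinant formula to NumPy's result and confirms the inverse. This is a sketch: `np.linalg.det` works through an LU factorisation, so expect a tiny floating-point difference from the exact integer $ad - bc$.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# 2x2 determinant by the formula ad - bc: 1*4 - 2*3 = -2
a, b = A[0]
c, d = A[1]
det_formula = a * d - b * c

det_numpy = np.linalg.det(A)  # floating-point result, close to -2
inv = np.linalg.inv(A)

print(det_formula, det_numpy)
# A @ A^{-1} should equal the identity matrix up to rounding
print(np.allclose(A @ inv, np.eye(2)))
```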

Eigenvalues & Eigenvectors

Definition

An eigenvector of a square matrix $A$ is a non-zero vector $\mathbf{v}$ that, when multiplied by $A$, yields a scalar multiple of itself. That scalar $\lambda$ is the corresponding eigenvalue.

$A\mathbf{v} = \lambda\mathbf{v}$
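The defining equation can be verified numerically. This sketch uses a small symmetric matrix (our own choice) so that `np.linalg.eigh` applies; it returns real eigenvalues in ascending order and eigenvectors as columns.

```python
import numpy as np

# Symmetric example matrix, so eigh gives real eigenvalues in ascending order
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)  # [1. 3.]

# Each column of `eigenvectors` satisfies A v = lambda v
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```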

Applications in ML

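  • PCA: Uses the eigenvectors of the data's covariance matrix as principal components, ranked by their eigenvalues (the variance along each direction)
  • Spectral Clustering: Uses eigenvectors of the graph Laplacian to partition a similarity graph
  • PageRank: The ranking is the principal eigenvector (eigenvalue 1) of the web's link transition matrix
  • Dimensionality Reduction: The top eigenvectors define a lower-dimensional feature space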
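The PCA bullet can be sketched directly with an eigendecomposition of the covariance matrix. This is a minimal illustration, not scikit-learn's implementation (which uses SVD); the function name `pca` and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top eigenvectors of its covariance matrix (minimal sketch)."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)           # features are columns
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigenvalues)[::-1]            # sort by variance, descending
    components = eigenvectors[:, order[:n_components]]
    explained = eigenvalues[order[:n_components]] / eigenvalues.sum()
    return X_centered @ components, explained

# Hypothetical 2-D data stretched mostly along one direction
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=200)])

Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio)  # one component keeps most of the variance
```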

Probability

Basic Probability

Definition

Probability measures the likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain).

Key Formulas

$P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$ (when all outcomes are equally likely)

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$P(A|B) = \frac{P(A \cap B)}{P(B)}$, for $P(B) > 0$ (Conditional Probability)

Example: Bayes' Theorem

$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$

Application: Naive Bayes classifier uses this to calculate the probability of a class given features.
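A worked numeric example shows why the prior matters. All numbers below are made up for illustration (a hypothetical screening test); the evidence $P(B)$ is computed with the law of total probability.

```python
# Hypothetical screening test: all numbers are made up for illustration
p_disease = 0.01             # P(A): prior
p_pos_given_disease = 0.99   # P(B|A): sensitivity
p_pos_given_healthy = 0.05   # P(B|not A): false positive rate

# Law of total probability for the evidence P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.167
```

Even with a 99% sensitive test, a positive result here implies only about a 1 in 6 chance of disease, because the prior is so low.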

Probability Distributions

Normal Distribution

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Used in: Gaussian Naive Bayes, modelling the error term in Linear Regression

Binomial Distribution

$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$

Used in: Binary classification, A/B testing

Poisson Distribution

$P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$

Used in: Count data, event prediction
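The three formulas above translate directly into plain Python. This is a sketch using only the standard library (`math.comb` needs Python 3.8+); in practice `scipy.stats` provides these distributions.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

print(normal_pdf(0.0))           # peak of the standard normal, about 0.3989
print(binomial_pmf(3, 10, 0.5))  # C(10,3) / 2^10 = 120/1024 = 0.1171875
print(poisson_pmf(2, 3.0))       # 9 e^{-3} / 2
```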

Python for ML - Basics

Python Fundamentals

Why Python for ML?

  • Simple, readable syntax
  • Extensive libraries (NumPy, Pandas, Scikit-learn)
  • Strong community support
  • Integration with C/C++ for performance

Basic Data Structures

# Lists
my_list = [1, 2, 3, 4, 5]
my_list.append(6)
my_list[0]  # Access: 1

# Dictionaries
my_dict = {'name': 'ML', 'type': 'algorithm'}
my_dict['name']  # Access: 'ML'

# NumPy Arrays
import numpy as np
arr = np.array([1, 2, 3])
arr_2d = np.array([[1, 2], [3, 4]])

NumPy Essentials

NumPy Operations

import numpy as np

# Array creation
arr = np.array([1, 2, 3, 4, 5])
zeros = np.zeros((3, 3))
ones = np.ones((2, 4))
rand_matrix = np.random.randn(3, 3)  # 3x3 samples from a standard normal

# Operations
mean = np.mean(arr)
std = np.std(arr)
dot = np.dot(arr, arr)
reshape = arr.reshape(5, 1)

Common Mistakes

  • Using Python lists instead of NumPy arrays for large data
  • Not specifying dtype when memory efficiency matters
  • Modifying arrays in-place unintentionally
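Each of the three mistakes above can be seen in a few lines. This sketch is illustrative; the array sizes are arbitrary choices.

```python
import numpy as np

# 1) Vectorised NumPy arithmetic replaces explicit Python loops
values = list(range(1_000))
squares_list = [v * v for v in values]  # pure Python loop
squares_arr = np.arange(1_000) ** 2     # one vectorised operation
assert squares_list == squares_arr.tolist()

# 2) dtype controls memory: float32 uses half the bytes of the default float64
a64 = np.zeros(1_000)                   # default dtype float64
a32 = np.zeros(1_000, dtype=np.float32)
print(a64.nbytes, a32.nbytes)           # 8000 4000

# 3) Basic slicing returns a view, so writing to it changes the original array
original = np.array([1, 2, 3, 4, 5])
view = original[:3]
view[0] = 99
print(original)                         # [99  2  3  4  5]

safe = original[:3].copy()              # .copy() gives an independent array
safe[0] = -1
print(original[0])                      # still 99
```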

Linear Regression

Supervised
Regression
Parametric

Definition & Intuition

What is Linear Regression?

Linear Regression is a supervised learning algorithm that models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation.

Intuition

Imagine drawing the best-fit straight line through scattered data points. Ordinary least squares picks the line that minimizes the sum of squared vertical distances (residuals) between each point and the line, then uses that line to predict the target from the input features.

Mathematical Formulation

Simple Linear Regression: $y = \beta_0 + \beta_1 x + \epsilon$

Multiple Linear Regression: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$

Matrix Form: $\mathbf{y} = \mathbf{X}\beta + \epsilon$

Cost Function & Optimization

Mean Squared Error (MSE)

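$J(\beta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)})^2$

where $m$ is the number of training examples and $h_\beta(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ is the hypothesis (the model's prediction). The extra $\frac{1}{2}$ makes this half the MSE; it cancels when differentiating and does not change the minimizer.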

Normal Equation (Closed Form)

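$\beta = (X^T X)^{-1} X^T y$

Pros: No iteration needed, exact solution

Cons: Inverting $X^T X$ costs roughly $O(n^3)$ for $n$ features, so it gets slow with many features; it fails when $X^T X$ is singular (e.g., perfectly collinear features or more features than samples)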
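A minimal sketch of the normal equation in NumPy. Two assumptions: a column of ones is prepended to $X$ so that $\beta_0$ acts as the intercept, and `np.linalg.solve` is swapped in for the explicit inverse, which gives the same $\beta$ but is numerically more stable.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) beta = X^T y; a column of ones is added for the intercept."""
    X_b = np.column_stack([np.ones(len(X)), X])
    # solve() avoids forming (X^T X)^{-1} explicitly
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Hypothetical noiseless data generated from y = 3 + 2x
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 + 2 * X
beta = normal_equation(X, y)
print(beta)  # approximately [3. 2.]
```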

Gradient Descent Update

$\beta_j := \beta_j - \alpha \frac{\partial J}{\partial \beta_j}$, where $\alpha$ is the learning rate

$\frac{\partial J}{\partial \beta_j} = \frac{1}{m} \sum_{i=1}^{m} (h_\beta(x^{(i)}) - y^{(i)}) x_j^{(i)}$

Python Implementation

From Scratch

import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.weights = None
        self.bias = None
    
    def fit(self, X, y):
        n_samples, n_features = X.shape
        
        # Initialize parameters
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        # Gradient descent
        for _ in range(self.n_iter):
            y_pred = self.predict(X)
            
            # Compute gradients
            dw = (1/n_samples) * np.dot(X.T, (y_pred - y))
            db = (1/n_samples) * np.sum(y_pred - y)
            
            # Update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db
    
    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

Using Scikit-Learn

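from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Example data (synthetic); replace with your own feature matrix X and target y
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

# Split data (fixed random_state for reproducible results)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)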

# Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
print(f"MSE: {mse:.4f}")
print(f"R² Score: {r2:.4f}")

Advantages & Disadvantages

Advantages

  • Simple to understand and interpret
  • Fast training and prediction
  • No hyperparameter tuning required (for plain least squares)
  • Works well when the feature-target relationship is approximately linear
  • Less prone to overfitting (with regularization)

Disadvantages

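  • Assumes a linear relationship (often not true)
  • Sensitive to outliers
  • Multicollinearity makes coefficient estimates unstable and hard to interpret
  • Cannot model non-linear patterns without feature transformations
  • Assumes independent errors, which time-series data often violates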

Interview Questions

Q: What are the assumptions of Linear Regression?

  1. Linearity: Relationship between X and y is linear
  2. Independence: Observations are independent
  3. Homoscedasticity: Constant variance of errors
  4. Normality: Errors are normally distributed
  5. No multicollinearity: Features are not highly correlated

Q: How do you handle multicollinearity?

  • Remove highly correlated features
  • Use PCA for dimensionality reduction
  • Apply Ridge or Lasso regularization
  • Detect it with VIF (Variance Inflation Factor); a common rule of thumb treats VIF above 5 to 10 as problematic
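VIF for feature $j$ is $1 / (1 - R^2_j)$, where $R^2_j$ comes from regressing feature $j$ on all the other features. The sketch below computes it with plain NumPy least squares; the function name `vif` and the synthetic data are hypothetical (statsmodels also ships a VIF helper).

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns plus an intercept."""
    n, p = X.shape
    scores = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r2 = 1 - residuals.var() / target.var()
        scores.append(1 / (1 - r2))
    return np.array(scores)

# Hypothetical data: x2 is almost a copy of x1, x3 is independent
rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = x1 + 0.05 * rng.normal(size=500)
x3 = rng.normal(size=500)
scores = vif(np.column_stack([x1, x2, x3]))
print(scores.round(1))  # large for x1 and x2, close to 1 for x3
```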

Q: What's the difference between R² and Adjusted R²?

On the training data, R² never decreases when a feature is added, even a useless one. Adjusted R² penalizes each additional feature:

$R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-p-1}$

where $n$ is the number of samples and $p$ the number of features (not counting the intercept).
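The formula is a one-liner; this sketch (the helper name `adjusted_r2` is hypothetical) shows the same R² being penalized more as $p$ grows.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p features (intercept not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R^2 = 0.80 on 100 samples: more features -> larger penalty
print(round(adjusted_r2(0.80, n=100, p=5), 4))  # 0.7894
print(round(adjusted_r2(0.80, n=100, p=30), 4))
```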

Exploratory Data Analysis (EDA)

What is EDA?

Definition

Exploratory Data Analysis (EDA) is the process of analyzing datasets to summarize their main characteristics, often using visual methods. It's a critical first step in any data science project.

Key Steps in EDA

  1. Understand the data: Shape, columns, data types
  2. Check for missing values: Identify and plan handling strategy
  3. Statistical summary: Mean, median, std, percentiles
  4. Visualize distributions: Histograms, box plots
  5. Check correlations: Heatmaps, scatter plots
  6. Identify outliers: Box plots, Z-score
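Step 6 can be sketched with the two common rules: Z-score (distance from the mean in standard deviations) and the IQR rule that box-plot whiskers use by default. The data is hypothetical. Note that an outlier inflates the mean and standard deviation it is measured against, so the Z-score rule can miss outliers in small samples; the IQR rule is more robust.

```python
import numpy as np

# Hypothetical measurements with one obvious outlier at the end
data = np.array([10, 11, 9, 10, 12, 10, 11, 9, 10, 10,
                 11, 9, 10, 12, 10, 11, 9, 10, 10, 11, 100.0])

# Z-score rule: flag points more than 3 standard deviations from the mean
z = (data - data.mean()) / data.std()
z_outliers = np.where(np.abs(z) > 3)[0]

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = np.where((data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr))[0]

print(z_outliers, iqr_outliers)  # both flag index 20
```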

EDA with Pandas

Complete EDA Workflow

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
df = pd.read_csv('data.csv')

# Basic info
print(df.shape)           # (rows, columns)
df.info()                 # Data types and non-null counts (prints directly, returns None)
print(df.describe())      # Statistical summary

# Check missing values
print(df.isnull().sum())

# Value counts for categorical
print(df['category'].value_counts())

# Correlation matrix: numeric_only=True (pandas >= 1.5) skips non-numeric
# columns, which otherwise raise an error in pandas 2.x
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()

# Distribution plots
df['numeric_col'].hist(bins=30)
plt.show()
sns.boxplot(data=df, x='category', y='numeric_col')
plt.show()

Interview Preparation Hub


Top 10 ML Interview Questions

1. What is the difference between supervised and unsupervised learning?

Supervised Learning: Uses labeled data (input-output pairs). The algorithm learns to map inputs to outputs. Examples: Classification, Regression.

Unsupervised Learning: Uses unlabeled data. The algorithm finds patterns and structures. Examples: Clustering, Dimensionality Reduction.
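The difference shows up directly in the scikit-learn API: a supervised estimator is fit on inputs and labels, an unsupervised one on inputs only. This sketch uses synthetic blobs; the cluster centres and spread are our own choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic 2-cluster data; y holds the true labels
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]], cluster_std=0.5, random_state=0)

# Supervised: the model is trained on inputs AND labels
clf = LogisticRegression().fit(X, y)
print("classifier accuracy:", clf.score(X, y))

# Unsupervised: the model only sees X and discovers the groups itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
print("cluster sizes:", np.bincount(cluster_ids))
```

The cluster IDs from KMeans are arbitrary (0 and 1 may be swapped relative to `y`), since no labels were ever provided.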

2. Explain overfitting and how to prevent it.

Overfitting: Model performs well on training data but poorly on test data. It memorizes noise instead of learning patterns.

Prevention:

  • Cross-validation
  • Regularization (L1, L2)
  • More training data
  • Feature selection
  • Early stopping
  • Dropout (in neural networks)
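L2 regularization (Ridge) can be sketched in closed form as $\beta = (X^T X + \lambda I)^{-1} X^T y$. This sketch omits the intercept for brevity and uses hypothetical data where only one of ten features matters; increasing $\lambda$ shrinks the coefficient vector, which limits how much the model can fit noise.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """L2-regularised least squares (no intercept, for brevity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical data: 20 samples, 10 features, only the first feature matters
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=20)

for lam in [0.0, 1.0, 10.0, 100.0]:
    beta = ridge_closed_form(X, y, lam)
    print(f"lambda={lam:>6}: ||beta|| = {np.linalg.norm(beta):.3f}")
```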

3. What is the bias-variance tradeoff?

Bias: Error from oversimplified assumptions. High bias = underfitting.

Variance: Error from sensitivity to small fluctuations. High variance = overfitting.

Tradeoff: As model complexity increases, bias typically decreases while variance increases. The goal is to find the complexity that minimizes total error.

Total Error = Bias² + Variance + Irreducible Error (the expected squared error on unseen data)
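The decomposition can be estimated by simulation: refit the same model on many fresh datasets, then measure how far the average prediction is from the truth (bias²) and how much predictions scatter (variance). This is a sketch under stated assumptions: a hypothetical true function $\sin(2\pi x)$, Gaussian noise with standard deviation 0.3, and polynomial models of increasing degree fit with `np.polynomial.Polynomial.fit`.

```python
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(0.05, 0.95, 50)
noise_sd = 0.3  # standard deviation of the irreducible noise

def f_true(x):
    """Hypothetical ground-truth function used to generate data."""
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_datasets=200, n_points=30):
    """Estimate bias^2 and variance of a polynomial fit over many fresh datasets."""
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = f_true(x) + rng.normal(0, noise_sd, n_points)
        model = np.polynomial.Polynomial.fit(x, y, degree)
        preds[i] = model(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_true(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    # Average squared error against the true function equals bias^2 + variance exactly
    error_vs_truth = np.mean((preds - f_true(x_test)) ** 2)
    return bias_sq, variance, error_vs_truth

for degree in [1, 3, 7]:
    b, v, e = bias_variance(degree)
    print(f"degree {degree}: bias^2={b:.3f} variance={v:.3f} "
          f"expected test MSE ~ {e + noise_sd**2:.3f}")
```

The low-degree fit shows high bias and low variance (underfitting); the high-degree fit shows the reverse (overfitting). Adding $\sigma^2 = 0.09$ accounts for the irreducible noise.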
