Linear regression is one of the most fundamental techniques in machine learning and statistics. It helps us model the relationship between a dependent variable and one or more independent variables. In this blog, we’ll walk you through implementing Linear Regression using Gradient Descent from scratch in Python.
We will work with a simple dataset and solve the problem step-by-step, focusing on the core concepts behind Gradient Descent and how it is applied to linear regression.
What is Linear Regression?
Linear Regression is a predictive modeling technique that estimates the relationship between a dependent variable and one or more independent variables. In its simplest form (called simple linear regression), this relationship is represented as:
ŷ = w * x + b
Where:
- ŷ is the predicted value,
- x is the input feature,
- w is the slope (weight),
- b is the intercept (bias).
In our case, we have a single feature x and want to learn the values of the slope w and the intercept b through Gradient Descent.
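To make the notation concrete: if w = 5 and b = -3, then an input of x = 4 gives ŷ = 5 * 4 - 3 = 17, which is exactly the kind of (x, y) pair we will fit below.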
What is Gradient Descent?
Gradient Descent is an iterative optimization algorithm used to minimize a cost function by adjusting the parameters (weights and bias) in the direction of the negative gradient. The goal is to find the values of the parameters that minimize the Mean Squared Error (MSE), which is the cost function for linear regression.
The general form of the cost function in linear regression is:
J(w, b) = (1/n) * Σ (y_i - (w * x_i + b))^2
Where:
- n is the number of training examples,
- y_i is the true value for the i-th sample,
- x_i is the feature for the i-th sample,
- w is the weight (slope),
- b is the bias (intercept).
Gradient Descent updates the parameters based on the partial derivatives of the cost function with respect to w and b:
∂J/∂w = (2/n) * Σ x_i * ((w * x_i + b) - y_i)
∂J/∂b = (2/n) * Σ ((w * x_i + b) - y_i)
w = w - α * (∂J/∂w)
b = b - α * (∂J/∂b)
Where:
- α is the learning rate, controlling the step size.
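To make one update concrete, consider a single illustrative training point (x, y) = (2, 7), so n = 1, starting from w = 0 and b = 0 with an illustrative learning rate α = 0.01. The prediction is ŷ = 0, so ∂J/∂w = 2 * 2 * (0 - 7) = -28 and ∂J/∂b = 2 * (0 - 7) = -14, and the updated parameters become w = 0 - 0.01 * (-28) = 0.28 and b = 0 - 0.01 * (-14) = 0.14. Gradient descent simply repeats such steps, here over the whole dataset at once, until the cost stops decreasing.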
Step-by-Step Implementation
Step 1: Setup and Dataset
We start by setting up our dataset. In this case, the data follows the simple linear relationship y = 5x - 3, and we'll attempt to learn its parameters using gradient descent. Our training data is:
import numpy as np
# Sample training data (x, y)
x_train = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y_train = np.array([-3, 2, 7, 12, 17, 22, 27], dtype=float) # y = 5x - 3; predict y when x = 25
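As a small optional check (not part of the original walkthrough), you can confirm that the targets really do follow y = 5x - 3, assuming the arrays defined above:
# Optional check: every target matches 5x - 3 exactly
print(np.allclose(y_train, 5 * x_train - 3))  # True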
Step 2: Initialize Parameters
For gradient descent, we need to initialize the weights and bias to start the optimization process. We will also define the learning rate and the number of epochs (iterations) for the gradient descent loop.
n = x_train.shape[0]  # Number of data points
weights = 0.0  # Initialize the weight (slope) to zero; one feature needs one weight
bias = 0.0  # Initialize bias to zero
learning_rate = 0.0001  # Define learning rate (step size)
epochs = 50000  # Number of gradient descent iterations
Step 3: Implement Gradient Descent
We now iterate over the training data and update the weights and bias using the gradient descent rule.
# Gradient Descent Loop
for epoch in range(epochs):
    # Compute predictions for all training points
    y_predict = weights * x_train + bias
    # Calculate the mean squared error (the cost)
    mean_square_error = np.sum((y_train - y_predict) ** 2) / n
    # Compute gradients of the cost with respect to the weight and bias
    derived_weights = np.sum(2 * x_train * (y_predict - y_train)) / n
    derived_bias = np.sum(2 * (y_predict - y_train)) / n
    # Update weight and bias by stepping against the gradient
    weights -= learning_rate * derived_weights
    bias -= learning_rate * derived_bias
    # Print the cost every 1000 iterations
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Mean Square Error: {mean_square_error}")
Step 4: Make Predictions
After running the gradient descent loop, the weight and bias have been learned. Now, we can use the learned parameters to predict the value of y for x = 25.
# Predict y for x = 25
predicted_y = weights * 25 + bias
print("Prediction for x = 25: y =", predicted_y)
Full Code
Here is the complete implementation with all the steps combined:
import numpy as np
# Training data
x_train = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y_train = np.array([-3, 2, 7, 12, 17, 22, 27], dtype=float) # y = 5x - 3; predict y when x = 25
n = x_train.shape[0]  # Number of data points
weights = 0.0  # Initialize the weight (slope) to zero; one feature needs one weight
bias = 0.0  # Initialize bias to zero
learning_rate = 0.0001  # Define learning rate (step size)
epochs = 50000  # Number of gradient descent iterations
# Gradient Descent Loop
for epoch in range(epochs):
    # Compute predictions for all training points
    y_predict = weights * x_train + bias
    # Calculate the mean squared error (the cost)
    mean_square_error = np.sum((y_train - y_predict) ** 2) / n
    # Compute gradients of the cost with respect to the weight and bias
    derived_weights = np.sum(2 * x_train * (y_predict - y_train)) / n
    derived_bias = np.sum(2 * (y_predict - y_train)) / n
    # Update weight and bias by stepping against the gradient
    weights -= learning_rate * derived_weights
    bias -= learning_rate * derived_bias
    # Print the cost every 1000 iterations
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Mean Square Error: {mean_square_error}")
# Predict y for x = 25
predicted_y = weights * 25 + bias
print("Prediction for x = 25: y =", predicted_y)
Output Example
After running the code, you should see the mean squared error (MSE) decreasing as the model learns the parameters, and the prediction for x = 25 should approach the true value of 122. With the settings above, the output looks roughly like this (the numbers below are rounded and approximate):
Epoch 0, Mean Square Error: 244.0
Epoch 1000, Mean Square Error: 5.29
...
Epoch 49000, Mean Square Error: 0.016
Prediction for x = 25: y = 120.95
Here, the predicted value for x = 25 is roughly 121, which is close to the true value y = 5 * 25 - 3 = 122. The small remaining gap exists because, with a learning rate of 0.0001, 50,000 iterations are not quite enough for the bias to converge fully; a somewhat larger learning rate or more iterations closes it.
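As an optional cross-check (not part of the gradient descent walkthrough itself), NumPy's built-in least-squares fit should recover essentially the same parameters in closed form:
# Closed-form least-squares fit of a degree-1 polynomial: returns [slope, intercept]
coefficients = np.polyfit(x_train, y_train, 1)
print(coefficients)  # approximately [ 5. -3.]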
Conclusion
In this blog, we implemented linear regression using gradient descent from scratch in Python. We learned how to:
- Initialize parameters (weights and bias),
- Implement the gradient descent algorithm to update parameters iteratively,
- Calculate the Mean Squared Error (MSE) as a cost function, and
- Make predictions using the learned model.
Although there are more advanced variants and optimizers available (such as stochastic or mini-batch gradient descent and adaptive methods like Adam), this implementation provides a good foundation for understanding how gradient descent works and how linear regression can be solved using this method.
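As a taste of one such variant, here is a minimal sketch of a stochastic (one-sample-at-a-time) version of the same update loop. It is illustrative only: it assumes the x_train and y_train arrays defined earlier, and the learning rate and epoch count are hand-picked for this tiny dataset.
# Illustrative sketch: stochastic gradient descent, updating on one sample at a time
rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
w, b = 0.0, 0.0
sgd_learning_rate = 0.01
for epoch in range(2000):
    for i in rng.permutation(len(x_train)):  # visit the samples in a random order each epoch
        prediction = w * x_train[i] + b
        w -= sgd_learning_rate * 2 * x_train[i] * (prediction - y_train[i])
        b -= sgd_learning_rate * 2 * (prediction - y_train[i])
print(w, b)  # should end up close to 5 and -3
Each update here uses the gradient of a single sample's squared error, so the parameters move after every data point instead of once per pass over the data.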