Linear Regression Examples in Python with Dataset
Linear regression is a fundamental machine learning algorithm used for predictive analysis. In this tutorial, we will explore linear regression examples in Python using real-world datasets. You’ll learn how to train a regression model, split datasets for testing, and make accurate predictions. Whether you are a beginner or an experienced developer, this guide will help you master linear regression in Python with practical examples and best practices.
Here are 10 Linear Regression Examples:
- Predicting Ice Cream Sales Based on Temperature
- House Price Prediction Using Square Footage
- Estimating Student Exam Scores Based on Study Hours
- Predicting Car Mileage (MPG) Based on Engine Size
- Salary Prediction Based on Years of Experience
- Forecasting Electricity Consumption Using Temperature Data
- Real Estate Rent Prediction Based on Location and Size
- Medical Insurance Cost Prediction Based on Age and BMI
- Advertising Spend vs. Product Sales Analysis
- Stock Price Prediction Using Historical Trends
Download link for the dataset used in the linear regression examples above: Click Here to Download Dataset
Example 1: Predicting Ice Cream Sales Based on Temperature
Brief Explanation
In this example, we use linear regression to predict ice cream sales based on temperature. The idea is simple:
- As the temperature increases, more people buy ice cream, leading to higher sales.
- We assume a linear relationship between temperature (X) and sales (Y).
- Using linear regression, we train a model to predict sales for any given temperature.
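To make the linear assumption concrete, here is a minimal sketch that fits the line sales = intercept + slope × temperature by ordinary least squares with NumPy. It uses the temperature and sales values from this example's dataset; the variable names and the helper `predict_sales` are illustrative choices, not part of the tutorial's code.

```python
import numpy as np

# Temperature (°C) and ice cream sales ($) from this example's dataset
temperature = np.array([15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45], dtype=float)
sales = np.array([200, 300, 400, 500, 600, 800, 1000, 1200,
                  1400, 1600, 1800, 2000, 2200, 2500], dtype=float)

# Closed-form least-squares estimates for a single feature
x_mean = temperature.mean()
y_mean = sales.mean()
slope = np.sum((temperature - x_mean) * (sales - y_mean)) / np.sum((temperature - x_mean) ** 2)
intercept = y_mean - slope * x_mean


def predict_sales(temp_c):
    """Hypothetical helper: apply the fitted line to a temperature in °C."""
    return intercept + slope * temp_c


print(f"slope = {slope:.2f} $ per °C, intercept = {intercept:.2f} $")
print(f"Predicted sales at 33 °C: ${predict_sales(33.0):.2f}")
```

The scikit-learn model used in this tutorial should arrive at the same slope and intercept, since it solves the same least-squares problem.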
This will be the only example we explain in detail. For the rest, you can apply similar techniques.
Dataset (Temperature vs. Ice Cream Sales)
| Temperature (°C) | Ice Cream Sales ($) |
|---|---|
| 15 | 200 |
| 18 | 300 |
| 20 | 400 |
| 22 | 500 |
| 24 | 600 |
| 26 | 800 |
| 28 | 1000 |
| 30 | 1200 |
| 32 | 1400 |
| 35 | 1600 |
| 37 | 1800 |
| 40 | 2000 |
| 42 | 2200 |
| 45 | 2500 |
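If you do not have the CSV file yet, the table above can be saved with a few lines of pandas. This is a sketch under two assumptions: the file is named icecream_sales.csv, and the columns use the short names Temperature and Sales that the code in this tutorial reads, rather than the longer table headers.

```python
import pandas as pd

# Values copied from the table above. The short column names are an
# assumption chosen to match the code that reads this file.
data = {
    "Temperature": [15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45],
    "Sales": [200, 300, 400, 500, 600, 800, 1000, 1200,
              1400, 1600, 1800, 2000, 2200, 2500],
}
df = pd.DataFrame(data)

# index=False keeps the row index out of the file
df.to_csv("icecream_sales.csv", index=False)

# Read the file back as a quick sanity check
check = pd.read_csv("icecream_sales.csv")
print(check.shape)  # (14, 2)
print(check.head())
```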
Python Code
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the dataset from CSV (the file must be in the working directory).
# Assumes the CSV columns are named "Temperature" and "Sales".
df = pd.read_csv("icecream_sales.csv")

# Extract features (Temperature) and target (Sales)
X = df[['Temperature']]  # Independent variable, kept 2-D for scikit-learn
y = df['Sales']          # Dependent variable (Ice Cream Sales)

# Train the Linear Regression model
model = LinearRegression()
model.fit(X, y)

# Loop for continuous predictions
while True:
    try:
        temperature_input = float(input("Enter temperature (°C) or type 'no' to exit: "))
    except ValueError:
        # "no" or any other non-numeric input ends the loop
        print("Exiting...")
        break
    # Wrap the input in a DataFrame with the training column name so
    # scikit-learn does not warn about missing feature names
    new_X = pd.DataFrame({'Temperature': [temperature_input]})
    predicted_sales = model.predict(new_X)
    print(f"Predicted Ice Cream Sales: ${predicted_sales[0]:.2f}")
How It Works
- The program loads the dataset and trains a linear regression model.
- It continuously asks for a temperature input and predicts sales.
- If the user enters "no" (or any other non-numeric value), the program exits gracefully.
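To see what the trained model has actually learned, you can read its slope and intercept and reproduce a prediction by hand. The sketch below builds the dataset inline so it runs without the CSV file; the temperature of 33 °C is an arbitrary illustrative input.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same values as the dataset table, built inline so no CSV file is needed
df = pd.DataFrame({
    "Temperature": [15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45],
    "Sales": [200, 300, 400, 500, 600, 800, 1000, 1200,
              1400, 1600, 1800, 2000, 2200, 2500],
})

model = LinearRegression()
model.fit(df[["Temperature"]], df["Sales"])

slope = model.coef_[0]        # change in sales per extra °C
intercept = model.intercept_  # predicted sales at 0 °C

# Predict with the model and with the line equation; both should agree
new_temp = pd.DataFrame({"Temperature": [33.0]})
by_model = model.predict(new_temp)[0]
by_hand = intercept + slope * 33.0

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"model.predict: {by_model:.2f}, by hand: {by_hand:.2f}")
```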
How to Apply Train-Test Split in Linear Regression?
Why Split Data?
- Training Set (trainX, trainY): Used to train the model.
- Testing Set (testX, testY): Used to evaluate the model’s performance.
- Helps detect overfitting by testing the model on data it did not see during training.
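As a quick illustration of the split itself, the sketch below applies train_test_split to the 14 rows of the ice cream dataset (built inline here) and checks that the two sets do not overlap. With test_size=0.2, scikit-learn rounds the test set up to 3 rows, leaving 11 for training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same values as the dataset table, built inline so no CSV file is needed
df = pd.DataFrame({
    "Temperature": [15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45],
    "Sales": [200, 300, 400, 500, 600, 800, 1000, 1200,
              1400, 1600, 1800, 2000, 2200, 2500],
})

trainX, testX, trainY, testY = train_test_split(
    df[["Temperature"]], df["Sales"], test_size=0.2, random_state=42
)

print(len(trainX), len(testX))  # 11 3

# No row appears in both sets, so the test rows are unseen during training
overlap = set(trainX.index) & set(testX.index)
print(f"Rows shared by both sets: {len(overlap)}")  # 0
```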
Python Code using Train and Test Data
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Load dataset (assumes CSV columns named "Temperature" and "Sales")
df = pd.read_csv("icecream_sales.csv")

# Features (Temperature) and Target (Sales)
X = df[['Temperature']]
y = df['Sales']

# Split into training (80%) and testing (20%) sets
trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Linear Regression model on the training set only
model = LinearRegression()
model.fit(trainX, trainY)

# Predict on test data
predictions = model.predict(testX)

# Evaluate the model on the held-out test set
mae = mean_absolute_error(testY, predictions)
mse = mean_squared_error(testY, predictions)
rmse = np.sqrt(mse)
print(f"Mean Absolute Error: {mae:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")

# Predict sales for a new temperature input
while True:
    try:
        temperature_input = float(input("Enter temperature (°C) or type 'no' to exit: "))
    except ValueError:
        # "no" or any other non-numeric input ends the loop
        print("Exiting...")
        break
    # Use the training column name to avoid a feature-name warning
    new_X = pd.DataFrame({'Temperature': [temperature_input]})
    predicted_sales = model.predict(new_X)
    print(f"Predicted Ice Cream Sales: ${predicted_sales[0]:.2f}")
How It Works
- Splits data (80% training, 20% testing) using train_test_split().
- Trains the model using trainX, trainY.
- Tests the model using testX, testY.
- Evaluates performance using MAE, MSE, and RMSE.
- Allows multiple predictions until the user exits.
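The three metrics are simple to compute by hand, which can help when interpreting them. The sketch below uses hypothetical actual and predicted sales values (chosen only for illustration, not taken from a real run) and checks the manual results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted sales, for illustration only
actual = np.array([500.0, 1200.0, 2000.0])
predicted = np.array([520.0, 1150.0, 2100.0])

errors = actual - predicted

mae = np.mean(np.abs(errors))  # average size of the error, in dollars
mse = np.mean(errors ** 2)     # average squared error, in dollars squared
rmse = np.sqrt(mse)            # back in dollars, penalizes large errors more

print(f"MAE:  {mae:.2f}")
print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")

# The manual values should match scikit-learn's implementations
print(np.isclose(mae, mean_absolute_error(actual, predicted)))
print(np.isclose(mse, mean_squared_error(actual, predicted)))
```

MAE and RMSE are in the same unit as the target (dollars here), which usually makes them easier to read than MSE.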
Difference Between Both Approaches
| Feature | First Example (No Train-Test Split) | Second Example (With Train-Test Split) |
|---|---|---|
| Training Method | Trains on the entire dataset | Splits into training (80%) and testing (20%) |
| Testing | No separate testing dataset | Evaluates model using unseen test data |
| Overfitting Risk | Goes undetected (model is never checked on unseen data) | Can be detected (model is tested on unseen data) |
| Error Evaluation | No error metrics provided | Uses MAE, MSE, RMSE for performance check |
| Prediction Method | Predicts directly on input | Predicts after testing the model |
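One way to put the comparison above into practice is to compute the same error metric on both the training and the testing set. The sketch below does this with MAE on the ice cream dataset (built inline); treat it as an illustration and not a full evaluation procedure.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Same values as the dataset table, built inline so no CSV file is needed
df = pd.DataFrame({
    "Temperature": [15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45],
    "Sales": [200, 300, 400, 500, 600, 800, 1000, 1200,
              1400, 1600, 1800, 2000, 2200, 2500],
})

trainX, testX, trainY, testY = train_test_split(
    df[["Temperature"]], df["Sales"], test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(trainX, trainY)

# Same metric on data the model has seen and on data it has not
train_mae = mean_absolute_error(trainY, model.predict(trainX))
test_mae = mean_absolute_error(testY, model.predict(testX))

print(f"Training MAE: {train_mae:.2f}")
print(f"Testing MAE:  {test_mae:.2f}")
```

A testing error far above the training error would suggest overfitting. With only three test rows, as here, the comparison is noisy, so a larger dataset gives a more trustworthy picture.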