
Linear Regression Examples in Python with Dataset

Linear regression is a fundamental machine learning algorithm that predicts a numeric target from one or more input features by fitting a straight line to the data. In this tutorial, we will explore linear regression examples in Python using example datasets. You’ll learn how to train a regression model, split a dataset into training and testing sets, and make predictions for new inputs. Whether you are a beginner or an experienced developer, this guide walks you through linear regression in Python with a practical, worked example.

Here are 10 Linear Regression Examples:

  1. Predicting Ice Cream Sales Based on Temperature
  2. House Price Prediction Using Square Footage
  3. Estimating Student Exam Scores Based on Study Hours
  4. Predicting Car Mileage (MPG) Based on Engine Size
  5. Salary Prediction Based on Years of Experience
  6. Forecasting Electricity Consumption Using Temperature Data
  7. Real Estate Rent Prediction Based on Location and Size
  8. Medical Insurance Cost Prediction Based on Age and BMI
  9. Advertising Spend vs. Product Sales Analysis
  10. Stock Price Prediction Using Historical Trends

Download link for the datasets used in the above linear regression examples: Click Here to Download Dataset

Example 1: Predicting Ice Cream Sales Based on Temperature

 

Brief Explanation 

In this example, we use linear regression to predict ice cream sales based on temperature. The idea is simple:

  • As the temperature increases, more people buy ice cream, leading to higher sales.
  • We assume a linear relationship between temperature (X) and sales (Y).
  • Using linear regression, we train a model to predict sales for any given temperature.

This will be the only example we explain in detail. For the rest, you can apply similar techniques.
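
For a single feature, the line of best fit has a closed-form solution. The following is a minimal sketch, assuming NumPy is installed, that computes the slope and intercept directly from the temperature and sales values of this example's dataset. The variable names are illustrative and not part of the original tutorial.

```python
import numpy as np

# Temperature (°C) and ice cream sales ($) from this example's dataset
temperature = np.array([15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45], dtype=float)
sales = np.array([200, 300, 400, 500, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2500], dtype=float)

# Ordinary least squares for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = temperature.mean(), sales.mean()
slope = np.sum((temperature - x_mean) * (sales - y_mean)) / np.sum((temperature - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(f"sales ≈ {intercept:.2f} + {slope:.2f} * temperature")
```

A positive slope is consistent with the assumption that sales rise with temperature. scikit-learn's LinearRegression fits the same ordinary least squares line.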

Dataset (Temperature vs. Ice Cream Sales)

Temperature (°C) | Ice Cream Sales ($)
15 | 200
18 | 300
20 | 400
22 | 500
24 | 600
26 | 800
28 | 1000
30 | 1200
32 | 1400
35 | 1600
37 | 1800
40 | 2000
42 | 2200
45 | 2500
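
The code in this tutorial reads the data from icecream_sales.csv. If you do not have the downloaded dataset at hand, a sketch like the following can recreate the file from the table above. The column names Temperature and Sales are taken from the tutorial's code; the exact layout of the downloadable file is an assumption.

```python
import pandas as pd

# Values copied from the table above
data = {
    "Temperature": [15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45],
    "Sales": [200, 300, 400, 500, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2500],
}
df = pd.DataFrame(data)

# Write the file to the current directory so pd.read_csv("icecream_sales.csv") can find it
df.to_csv("icecream_sales.csv", index=False)

# Read it back to confirm the round trip
check = pd.read_csv("icecream_sales.csv")
print(check.shape)  # (14, 2)
```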

Python Code

import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the dataset from CSV
df = pd.read_csv("icecream_sales.csv")  # Ensure your CSV file is in the same directory

# Extract features (Temperature) and target (Sales)
X = df[['Temperature']]  # Independent variable (Temperature)
y = df['Sales']          # Dependent variable (Ice Cream Sales)

# Train the Linear Regression model
model = LinearRegression()
model.fit(X, y)

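# Loop for continuous predictions
while True:
    try:
        temperature_input = float(input("Enter temperature (°C) or type 'no' to exit: "))
    except ValueError:
        # float() fails on "no" or any other non-numeric input, so exit the loop
        print("Exiting...")
        break
    # Wrap the input in a DataFrame with the training column name
    # so scikit-learn does not warn about missing feature names
    new_data = pd.DataFrame({'Temperature': [temperature_input]})
    predicted_sales = model.predict(new_data)
    print(f"Predicted Ice Cream Sales: ${predicted_sales[0]:.2f}")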

How It Works

  • The program loads the dataset and trains a linear regression model.
  • It continuously asks for a temperature input and predicts sales.
  • If the user enters "no" (or any other non-numeric input), float() raises a ValueError and the program exits.
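
To see what the trained model has learned, you can inspect its coefficient and intercept. The sketch below builds the DataFrame inline (instead of reading the CSV) so that it runs on its own, and checks that a manual calculation matches model.predict(). The 30 °C query is just an illustrative value.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same values as the dataset table, built inline so the sketch is self-contained
df = pd.DataFrame({
    "Temperature": [15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45],
    "Sales": [200, 300, 400, 500, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2500],
})
X = df[["Temperature"]]
y = df["Sales"]

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # change in predicted sales per additional °C
intercept = model.intercept_  # predicted sales at 0 °C (an extrapolation)
print(f"sales ≈ {intercept:.2f} + {slope:.2f} * temperature")

# A prediction is simply the fitted line evaluated at the input
query = pd.DataFrame({"Temperature": [30.0]})
manual = intercept + slope * 30.0
predicted = model.predict(query)[0]
print(f"manual: {manual:.2f}, model.predict: {predicted:.2f}")
```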

How to Apply Train-Test Split in Linear Regression? 

Why Split Data?

  • Training Set (trainX, trainY): Used to train the model.
  • Testing Set (testX, testY): Used to evaluate the model’s performance.
  • Helps to detect overfitting, because the model is evaluated on data it did not see during training.
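
As a quick illustration of what the split produces, the following sketch applies train_test_split() to the 14-row ice cream dataset (built inline here so it runs on its own) and prints the size of each part. The overlap check is an extra sanity test added for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same values as the dataset table, built inline so the sketch is self-contained
df = pd.DataFrame({
    "Temperature": [15, 18, 20, 22, 24, 26, 28, 30, 32, 35, 37, 40, 42, 45],
    "Sales": [200, 300, 400, 500, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2500],
})
X = df[["Temperature"]]
y = df["Sales"]

# 20% of 14 rows is rounded up to 3 test rows, leaving 11 for training
trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(trainX), len(testX))  # 11 3

# No row should appear in both sets
overlap = set(trainX.index) & set(testX.index)
print(len(overlap))  # 0
```

With so few test rows, the error metrics can vary noticeably depending on which rows land in the test set, so treat them as a rough check on a dataset this small.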

Python Code using Train and Test Data 

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load dataset
df = pd.read_csv("icecream_sales.csv")  

# Features (Temperature) and Target (Sales)
X = df[['Temperature']]  
y = df['Sales']          

# Split into training (80%) and testing (20%) sets
trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Linear Regression model
model = LinearRegression()
model.fit(trainX, trainY)

# Predict on test data
predictions = model.predict(testX)

# Evaluate the model
mae = mean_absolute_error(testY, predictions)
mse = mean_squared_error(testY, predictions)
rmse = np.sqrt(mse)

print(f"Mean Absolute Error: {mae:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")

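# Predict sales for a new temperature input
while True:
    try:
        temperature_input = float(input("Enter temperature (°C) or type 'no' to exit: "))
    except ValueError:
        # float() fails on "no" or any other non-numeric input, so exit the loop
        print("Exiting...")
        break
    # Wrap the input in a DataFrame with the training column name
    # so scikit-learn does not warn about missing feature names
    new_data = pd.DataFrame({'Temperature': [temperature_input]})
    predicted_sales = model.predict(new_data)
    print(f"Predicted Ice Cream Sales: ${predicted_sales[0]:.2f}")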

How it Works

  • Splits the data (80% training, 20% testing) using train_test_split(); with the 14 rows in this dataset, that is 11 training rows and 3 test rows.
  • Trains the model using trainX, trainY.
  • Tests the model using testX, testY.
  • Evaluates performance using MAE, MSE, and RMSE.
  • Allows multiple predictions until the user exits.
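
To make the three metrics concrete, here is a small sketch that computes them by hand with NumPy. The actual and predicted values are hypothetical numbers chosen for illustration, not results from the model above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted sales for three test rows (illustration only)
actual = np.array([300.0, 1000.0, 2000.0])
predicted = np.array([350.0, 900.0, 2100.0])

errors = actual - predicted    # [-50, 100, -100]
mae = np.mean(np.abs(errors))  # average size of the error, in dollars
mse = np.mean(errors ** 2)     # average squared error, penalizes large misses more
rmse = np.sqrt(mse)            # square root of MSE, back in dollars

print(f"MAE: {mae:.2f}, MSE: {mse:.2f}, RMSE: {rmse:.2f}")
# MAE: 83.33, MSE: 7500.00, RMSE: 86.60

# The hand-computed values agree with scikit-learn's helpers
print(np.isclose(mae, mean_absolute_error(actual, predicted)))
print(np.isclose(mse, mean_squared_error(actual, predicted)))
```

MAE and RMSE are in the same unit as the target (dollars here), which usually makes them easier to interpret than MSE.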

Difference Between Both Approaches

Feature | First Example (No Train-Test Split) | Second Example (With Train-Test Split)
Training Method | Trains on the entire dataset | Splits into training (80%) and testing (20%)
Testing | No separate testing dataset | Evaluates model using unseen test data
Overfitting Check | Not possible (model has seen all data) | Possible (model is tested on unseen data)
Error Evaluation | No error metrics provided | Uses MAE, MSE, RMSE for performance check
Prediction Method | Predicts directly on input | Predicts after testing the model

 

