Predicting House Prices with Machine Learning

In this post you will learn how to predict house prices with machine learning and host the model on a Django website.

For the model, we will build a basic linear regression with sklearn. We will not focus on improving its accuracy, since the goal of this post is to show a simple end-to-end project, from building a model to deploying it.

This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass. It was obtained from the StatLib archive (lib.stat.cmu.edu/datasets/boston) and has been used extensively throughout the literature to benchmark algorithms. However, these comparisons were primarily done outside of Delve and are thus somewhat suspect. The dataset is small, with only 506 cases.

The data was originally published by Harrison, D. and Rubinfeld, D.L., "Hedonic prices and the demand for clean air", J. Environ. Economics & Management, vol. 5, 81-102, 1978.

Variables

There are 14 attributes in each case of the dataset. They are:

CRIM - per capita crime rate by town

ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS - proportion of non-retail business acres per town.

CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

NOX - nitric oxides concentration (parts per 10 million)

RM - average number of rooms per dwelling

AGE - proportion of owner-occupied units built prior to 1940

DIS - weighted distances to five Boston employment centres

RAD - index of accessibility to radial highways

TAX - full-value property-tax rate per $10,000

PTRATIO - pupil-teacher ratio by town

B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

LSTAT - % lower status of the population

MEDV - Median value of owner-occupied homes in $1000's (TARGET)

1. Model

Importing Libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
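# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs scikit-learn < 1.2
from sklearn.datasets import load_boston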

Loading and Cleaning data with pandas

data = load_boston()
for keys in data:
    print(keys)

See the full description of the dataset

print(data.DESCR)

And to just see the data

data

Convert the data into a pandas DataFrame, which is easier to read and to manipulate:

df = pd.DataFrame(data.data,columns = data.feature_names)
df.head()

To check the shape of our data

df.shape

Now add the target column for our data

df['MEDV'] = data.target
df.head()

EDA

Pick any feature and explore its relationship with the target variable. Here we plot RM against MEDV.

import seaborn as sns
sns.set(rc={'figure.figsize':(15.7,8.27)})
sns.scatterplot(data = df,x = 'RM',y = 'MEDV')
plt.xlabel("Number of rooms")
plt.ylabel("Median Value in $1000's")
plt.show()
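
To put a number on what the scatter plot shows, one option is the Pearson correlation coefficient. The sketch below computes it in plain Python on a few made-up values that stand in for the RM and MEDV columns (the numbers are hypothetical, not taken from the dataset).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values standing in for df['RM'] and df['MEDV']
rooms = [5.0, 6.0, 6.5, 7.0, 8.0]
prices = [15.0, 21.0, 24.0, 30.0, 40.0]

r = pearson_r(rooms, prices)
print(round(r, 3))
```

A value close to 1 means the two columns rise together almost linearly. On the real DataFrame, df['RM'].corr(df['MEDV']) should give the same kind of measure.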

Create a Model

#create instance of LinearRegression

model = LinearRegression()

model

Training our Model

model.fit(np.array(df.RM).reshape(-1,1),df.MEDV)

model

To see the coefficients of our linear function

model.coef_

To print the intercept of a function

model.intercept_
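
With a single feature, the coefficient and intercept describe a straight line, price = coef * rooms + intercept. The sketch below fits such a line with the closed-form least-squares formulas on made-up data. It is only an illustration of the idea, not how sklearn computes the fit internally, and the function name fit_line and the toy numbers are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data that lies exactly on the line y = 9 * x - 26
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
prices = [10.0, 19.0, 28.0, 37.0, 46.0]

slope, intercept = fit_line(rooms, prices)

# A prediction is just the line evaluated at the new input
predicted = slope * 6.25 + intercept
print(slope, intercept, predicted)
```

Here slope plays the role of model.coef_[0] and intercept the role of model.intercept_.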

Prediction

x = np.array([6.25]).reshape(-1,1)

model.predict(x)

Check MSE

x = np.array(df.RM).reshape(-1,1)

from sklearn.metrics import mean_squared_error

y_pred = model.predict(x)

y_pred

MSE = mean_squared_error(df.MEDV,y_pred)
MSE
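
For intuition, the mean squared error is the average of the squared differences between the true and predicted values. A minimal plain-Python sketch, using made-up numbers and a hypothetical helper name:

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values in $1000s
actual = [24.0, 22.0, 35.0]
predicted = [25.0, 21.0, 33.0]

mse = mean_squared_error_manual(actual, predicted)
rmse = math.sqrt(mse)  # back in the same unit as the target ($1000s)
print(mse, rmse)
```

Because MEDV is in $1000s, the MSE is in squared $1000s. Taking the square root (RMSE) gives an error in the same unit as the target, which is often easier to read.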

Try to plot the fitted line

sns.set(rc={'figure.figsize':(15.7,8.27)})
sns.scatterplot(data = df,x = 'RM',y = 'MEDV')
plt.plot(x, y_pred,color='red')
plt.xlabel("Number of rooms")
plt.ylabel("Median Value in $1000's")
plt.show()

Prediction accuracy

# R^2 score of the fitted line; score() takes the features first, then the true targets
model.score(x, df.MEDV)
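
For a regressor, score() returns the coefficient of determination R^2, which compares the model's squared error with that of always predicting the mean. A small plain-Python sketch of the formula, with made-up numbers and a hypothetical helper name:

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y_true
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical values in $1000s
actual = [20.0, 24.0, 28.0, 32.0]
predicted = [21.0, 23.0, 29.0, 31.0]

r2 = r2_score_manual(actual, predicted)
print(r2)
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than predicting the mean of the target.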

Save the model with joblib as houseprice.pkl

import joblib

# Save the model as a pickle in a file
joblib.dump(model, 'houseprice.pkl')

# Load the model from the file
model_from_joblib = joblib.load('houseprice.pkl')

# Use the loaded model to make predictions
model_from_joblib.predict(x)
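
The save and load round trip can also be sketched with only the standard library. The example below uses pickle as a stand-in for joblib and a plain dict of made-up parameters as a stand-in for the fitted model, so the names and values are assumptions for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical fitted parameters standing in for the trained model
params = {"coef": 9.0, "intercept": -26.0}


def predict_price(p, rooms):
    """Evaluate the fitted line for a given number of rooms."""
    return p["coef"] * rooms + p["intercept"]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "params.pkl"

    # Serialize the object to disk
    with open(path, "wb") as f:
        pickle.dump(params, f)

    # Read it back, as the website will do at startup
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object behaves like the original one
print(predict_price(restored, 6.25))
```

As with any pickle-based format, only load files that you created yourself or otherwise trust.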

2. Website

Now that we have our model saved, we can use it in our website to predict the price of a house. We will create a simple website using Django, plain HTML, CSS and Bootstrap.

We need to create a virtual environment so that the packages we install are stored in this particular environment.

py -3 -m venv env

Activate the environment and install the packages

.\env\Scripts\activate
pip install django
pip install scikit-learn
pip install joblib

We will first create our Django project with the following command. Here the project is named 'housepriceprediction'.

django-admin startproject housepriceprediction

cd into the project directory

cd housepriceprediction

Now, inside the project we will create our app called 'core'

python manage.py startapp core

Add the app in the INSTALLED_APPS list in settings.py file:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'core',  # add the app to installed app list here
]

Create templates

Create a templates folder inside the project directory (root) and create two HTML files in it, index.html and result.html.

'index.html' :

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Home</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">
</head>

<body>
    <div class="card text-center" style="padding: 2rem; width: 60%; margin: 20%;background-color: #D9AFD9;
    background-image: linear-gradient(0deg, #D9AFD9 0%, #97D9E1 100%);
    ">

        <h2 style="font-weight: 700;">Prediction of Boston House Prices</h2>

        <div class="card-body">
            {% if error %}
            <p>{{ error }}</p>
            {% endif %}
            <form action="{% url 'core:predict' %}" class="rmForm" method="POST">
                {% csrf_token %}
                <label for="rminput">Number of Rooms: </label>
                <input type="text" id="rminput" name="rminput">
                <input type="submit" class="btn btn-primary" value="Predict">
            </form>

        </div>


    </div>


</body>

</html>

'result.html':

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Result</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">

</head>

<body>

    <div class="card text-center" style="padding: 22.2%; background-color: #8EC5FC;
    background-image: linear-gradient(62deg, #8EC5FC 0%, #E0C3FC 100%);
    ">

        <h2 style="font-weight: 800;">Result: ${{result}} thousands</h2>

        <span><a href="{% url 'core:index' %}" class="btn">Go back</a></span>



    </div>


</body>

</html>

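Next, define 'TEMPLATE_DIR' in settings.py and add it to 'DIRS' in the TEMPLATES list:

# Assumes BASE_DIR is a pathlib.Path, the default in settings.py since Django 3.1
TEMPLATE_DIR = BASE_DIR / 'templates'

TEMPLATES = [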
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [TEMPLATE_DIR],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

Load the model and use it for prediction

Create a 'model' directory inside the project root folder and put the model that you saved in the steps above inside it.

Create the view inside 'views.py' file. In here we load our model using joblib and then use this model for prediction:

from django.shortcuts import render

import joblib
import numpy as np


# Load the model once, when the module is imported
model_from_joblib = joblib.load('model/houseprice.pkl')


def index(request):
    return render(request, 'index.html')


def predict(request):
    try:
        # Read the number of rooms from the form and convert it to a float
        rooms = float(request.POST.get('rminput'))
        # scikit-learn expects a 2D array of shape (n_samples, n_features)
        x = np.array([[rooms]])

        # Use the loaded model to make predictions
        prediction = model_from_joblib.predict(x)
        prediction = format(prediction[0], '.3f')  # format to 3 decimals

        context = {'result': prediction, 'rooms': rooms}
    except (TypeError, ValueError):
        # Missing or non-numeric input: show the form again with a message
        context = {'error': 'Please enter the number of rooms'}
        return render(request, 'index.html', context)

    return render(request, 'result.html', context)
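
The form sends the number of rooms as text, so it can help to validate it before it reaches the model. The helper below is a sketch of one way to do that. The name parse_rooms and the bounds of 1 to 15 rooms are assumptions for illustration, not part of the project above.

```python
import math


def parse_rooms(raw, minimum=1.0, maximum=15.0):
    """Convert raw form text to a float room count, or raise ValueError."""
    if raw is None or not raw.strip():
        raise ValueError("Please enter the number of rooms")
    try:
        value = float(raw)
    except ValueError:
        raise ValueError("The number of rooms must be a number") from None
    # float() accepts 'nan' and 'inf', so reject those explicitly
    if not math.isfinite(value) or not (minimum <= value <= maximum):
        raise ValueError(
            f"The number of rooms must be between {minimum} and {maximum}"
        )
    return value


print(parse_rooms(" 6.25 "))
```

In the view, a call such as parse_rooms(request.POST.get('rminput')) could replace the bare conversion, with the ValueError message passed to the template as the error.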

Inside the app directory (the 'core' folder in our case), create a file called 'urls.py' and paste the following code to map URLs to the index and predict views:

from django.urls import path
from .views import index, predict

app_name = 'core'

urlpatterns = [
    path('',index,name='index'),
    path('predict/', predict, name='predict'),
]

Go to the 'urls.py' file of the project directory and include the url that we created inside the core:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('core.urls')),
]

Deploy on Heroku

Firstly install the following modules

pip install gunicorn
pip install whitenoise
pip install dj-database-url
pip install psycopg2

Add all dependencies installed in env to requirements.txt:

pip freeze > requirements.txt

Create Procfile

In the Procfile, housepriceprediction is the project name:

web: gunicorn housepriceprediction.wsgi --log-file -

Create runtime.txt. All three files (requirements.txt, Procfile, and runtime.txt) should be in the root directory (inside housepriceprediction).

runtime.txt:

python-3.9.11

Check the supported version in devcenter.heroku.com

In order to host a website on heroku, we need to create an account on heroku and download and set up the heroku CLI

i. Make a Heroku account

ii. Download Heroku CLI

iii. Configure Django Heroku:

We will add the files on heroku using git

git init
git add .
git commit -m "First commit"

Log into heroku by typing into the command line

heroku login

Create the Heroku app with a name for the website

heroku create housepprediction
heroku config:set DISABLE_COLLECTSTATIC=1

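In settings.py, turn off debug mode, list the allowed hosts, and add WhiteNoise to MIDDLEWARE:

DEBUG = False
ALLOWED_HOSTS = ['housepprediction.herokuapp.com', 'localhost', '127.0.0.1']


MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',  # already present by default
    'whitenoise.middleware.WhiteNoiseMiddleware',  # add directly after SecurityMiddleware
    # keep the remaining default middleware entries below
]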
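
Instead of hard-coding these values, a common pattern is to read them from environment variables so that the same settings.py works locally and on Heroku. The sketch below shows one way to do this. The helper names and the variable names DJANGO_DEBUG and DJANGO_ALLOWED_HOSTS are assumptions, not something Django or Heroku defines.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Read a boolean flag such as 'true' or '0' from the environment."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default=(), environ=os.environ):
    """Read a comma-separated list from the environment."""
    value = environ.get(name, "")
    items = [item.strip() for item in value.split(",") if item.strip()]
    return items or list(default)


# An explicit mapping stands in for the real environment in this example
fake_env = {
    "DJANGO_DEBUG": "false",
    "DJANGO_ALLOWED_HOSTS": "housepprediction.herokuapp.com, localhost",
}

DEBUG = env_bool("DJANGO_DEBUG", environ=fake_env)
ALLOWED_HOSTS = env_list("DJANGO_ALLOWED_HOSTS", default=["localhost"], environ=fake_env)
print(DEBUG, ALLOWED_HOSTS)
```

On Heroku, such variables could be set with heroku config:set, the same command used above for DISABLE_COLLECTSTATIC.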

If you want to use the database provided by Heroku, add the following to settings.py so that the project uses the default Heroku database.

Database update for heroku

import dj_database_url

db_from_env = dj_database_url.config(conn_max_age=600)
DATABASES['default'].update(db_from_env)
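
By default, dj_database_url.config() reads the connection string from the DATABASE_URL environment variable, which Heroku sets when a Postgres add-on is attached. To show roughly what such a URL contains, the sketch below splits a made-up URL with the standard library. It is an illustration only, not the library's implementation, and it leaves out details such as the ENGINE key.

```python
from urllib.parse import unquote, urlparse


def parse_database_url(url):
    """Split a database URL into Django-style connection settings."""
    parts = urlparse(url)
    return {
        "NAME": parts.path.lstrip("/"),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


# Hypothetical URL; real credentials come from the hosting platform
example = "postgres://demo_user:s3cret@db.example.com:5432/demo_db"
settings_fragment = parse_database_url(example)
print(settings_fragment)
```

The resulting dictionary has the same kind of keys that DATABASES['default'] expects, which is why the update() call above can merge it in.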

Run the migrations on Heroku with the following commands (this is not necessary if the website does not use or change the database):

heroku run python manage.py makemigrations
heroku run python manage.py migrate

If the commands above don't work, try turning on a VPN.

Now finally push the files to heroku

git push heroku master

Open the website from the terminal as below. You can also open it directly from your Heroku account dashboard.

heroku open

Our machine learning project is now complete.

If you want to update your website or improve your model, make the changes to the local files first and then push them to Heroku again. This updates the files on Heroku, and the updated website will be displayed.

To do this, we again use git to push our edited files:

git add .
git commit -m "edited "
git push heroku master

Check out my final result on heroku here.

My github link