
Build a Python Package from your ML Model

This is the second part of a multi-part series on how to build and deploy a machine learning model. It covers building and installing a Python package from your predictive model.


The first part on building pipelines can be read here

The first part covers how to rewrite your model code as a sklearn pipeline so that it is easier to understand, manage, and edit. A model can be deployed without the pipeline structure, but it is good practice to build pipelines and to separate the different parts of the code (config, preprocessing, feature engineering, data, and tests).

This post builds on the pipeline code from the earlier post. If you had difficulty following the previous article, you can read up on how to build sklearn pipelines, and then look at the GitHub repos for each stage of package building.

Part 1: Organize code in pipelines, Training the model

The directories are restructured as in the image below


This part of the code uses three main files. Apart from these, train.csv and test.csv are stored in the folder /packages/regression_model/datasets.

Every folder must have a file (they are not present in the GitHub repo)

The GitHub repo for Part 1 is here

Details of directories:

Packages: Root folder containing the package.

Regression_model: Name of the package.

Datasets: test.csv and train.csv, the Kaggle datasets on housing price prediction.

Trained_model: The folder where the trained models are saved as .pkl files.

Files: One file builds a pipeline with all the operations, one holds all the fit and transform functions used in the pipeline, and one runs the model and saves the trained models.

Requirements.txt: All the necessary packages, with versions, that need to be installed.

Prerequisites before running the model and training

Create a new environment

pip install virtualenv
virtualenv my_env_name
source my_env_name/bin/activate

Building a new environment is recommended because it keeps this project's dependencies isolated from everything else installed on your machine.
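If you are not sure whether the environment is actually active, a quick check from Python can help. The snippet below is a minimal sketch, not part of the original repo; the function name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases keep the original interpreter
    # location in sys.base_prefix; older virtualenv releases set
    # sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


active = in_virtual_environment()
print("virtual environment active:", active)
print("interpreter prefix:", sys.prefix)
```

Run it with the same python you plan to use for training. If it reports that no environment is active, activate the environment again before installing anything.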

Add your directory to PYTHONPATH

Here is how to do it on a Mac (for other operating systems, a quick search will turn up the equivalent steps):

1. Open Terminal
2. Open the file ~/.bash_profile in your text editor, e.g. atom ~/.bash_profile
3. Add the following line to the end, with the path to your directory between the quotes: export PYTHONPATH=""
4. Close Terminal
5. Open Terminal again and test with $ echo $PYTHONPATH
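The same check can be done from inside Python, which is handy if imports of the package still fail later on. This is a rough sketch under my own naming (pythonpath_entries and is_on_pythonpath are hypothetical helpers, not functions from the repo).

```python
import os
from pathlib import Path


def pythonpath_entries(env=None):
    """Return the directories listed in PYTHONPATH as resolved paths."""
    env = os.environ if env is None else env
    raw = env.get("PYTHONPATH", "")
    # PYTHONPATH entries are separated by os.pathsep (':' on macOS and Linux).
    return [Path(p).expanduser().resolve() for p in raw.split(os.pathsep) if p]


def is_on_pythonpath(directory, env=None):
    """Return True if the given directory is one of the PYTHONPATH entries."""
    return Path(directory).expanduser().resolve() in pythonpath_entries(env)


# Show what the current shell has exported.
for entry in pythonpath_entries():
    print(entry)
print("current directory on PYTHONPATH:", is_on_pythonpath(os.getcwd()))
```

Passing env explicitly is only there to make the helpers easy to try without touching your real environment.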

Installing packages: run the following command with the correct location of the requirements.txt file

$ pip install -r requirements.txt

Running the model (training):

$ python packages/regression_model/


Output: a new file regression_model.pkl is generated in the packages/regression_model/trained_models folder

Part 2: Restructuring the project, making predictions and writing tests

The project needs to be restructured (the reason will be explained when building the package) so that we have a separate package directory with its own requirements.txt file, as well as a separate test module for testing the models before deployment.

GitHub repo for part 2 is here

Folder Structure


Note the new structure - there is a regression_model folder inside regression_model inside packages

The Github repo does not include files, please add them (blank files, no content) before running

Adding the test folder is covered just after this block; you need to install pytest for it.

Major Changes


Config files hold all the fixed variable names: the features, the names of the train and test data files, and the target variable. This cleans up the code and makes it more readable. Also, if something needs to change (say, the name of a file, or a feature that should be removed), it can be changed in one place rather than throughout the code.

Using the config files:

from regression_model.config import config
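To give a feel for what such a config module might hold, here is a small sketch. Everything in it is an assumption for illustration: the variable names, the TARGET and FEATURES values, and the dataset_path helper are placeholders, not the contents of the actual config file in the repo. A real config would usually derive the package root from its own file location; a relative path is used here so the snippet runs anywhere.

```python
from pathlib import Path

# Hypothetical layout: every path is derived from one package root.
PACKAGE_ROOT = Path("packages") / "regression_model" / "regression_model"
DATASET_DIR = PACKAGE_ROOT / "datasets"
TRAINED_MODEL_DIR = PACKAGE_ROOT / "trained_models"

# File names are defined once and reused everywhere else.
TRAINING_DATA_FILE = "train.csv"
TESTING_DATA_FILE = "test.csv"
PIPELINE_SAVE_FILE = "regression_model.pkl"

# Placeholder target and feature names, for illustration only.
TARGET = "SalePrice"
FEATURES = ["LotArea", "OverallQual", "YearBuilt"]


def dataset_path(file_name):
    """Build the full path of a dataset file from its name."""
    return DATASET_DIR / file_name


print(dataset_path(TRAINING_DATA_FILE))
print(TRAINED_MODEL_DIR / PIPELINE_SAVE_FILE)
```

With this in place, dropping a feature or renaming the data file is a one-line change in the config rather than a search through the training and prediction code.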

Packages/regression_model/regression_model/processing/ contains the functions load_dataset, save_pipeline and load_pipeline, which keeps the rest of the code clean. Using them:

from regression_model.processing.data_management import load_dataset, save_pipeline
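The save and load helpers are essentially a thin wrapper around serialization. The sketch below shows the idea with the standard library's pickle module and a plain dictionary standing in for a fitted pipeline. The signatures are my own guess and may differ from the ones in the repo, which may also use a different serializer.

```python
import pickle
import tempfile
from pathlib import Path


def save_pipeline(pipeline_to_persist, save_dir, file_name="regression_model.pkl"):
    """Pickle the given object into save_dir and return the file path."""
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    save_path = save_dir / file_name
    with open(save_path, "wb") as f:
        pickle.dump(pipeline_to_persist, f)
    return save_path


def load_pipeline(save_dir, file_name="regression_model.pkl"):
    """Load a previously pickled object from save_dir."""
    with open(Path(save_dir) / file_name, "rb") as f:
        return pickle.load(f)


# Round trip with a stand-in object instead of a fitted sklearn pipeline.
stand_in = {"steps": ["imputer", "scaler", "model"]}
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_pipeline(stand_in, tmp)
    restored = load_pipeline(tmp)

print("saved as:", saved_path.name)
print("round trip ok:", restored == stand_in)
```

Keeping these functions in one module means the training script and the prediction script agree on where the model file lives and how it is read.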

Training the model (ensure you have added your directory to the PYTHONPATH environment variable as explained earlier)

$ python packages/regression_model/regression_model/

Make Predictions

$ python packages/regression_model/regression_model/

This will not print anything. To check that the modules are working, test modules have to be added.


New Directory for Test at packages/regression_model/tests 

Image1 contains the code for testing the model

Requirements.txt: Add

# testing

Writing tests is optional but always recommended. Tests ensure that your model does not break after you make any change, major or minor.

Read more about tests here

Contents of the test file: just check that the first prediction is correct

import math

from regression_model.predict import make_prediction
from regression_model.processing.data_management import load_dataset

def test_make_single_prediction():
    # Given
    test_data = load_dataset(file_name='test.csv')
    single_test_json = test_data[0:1].to_json(orient='records')

    # When
    subject = make_prediction(input_data=single_test_json)

    # Then
    assert subject is not None
    assert isinstance(subject.get('predictions')[0], float)
    assert math.ceil(subject.get('predictions')[0]) == 112476

Running Tests:

$ pytest packages/regression_model/tests -W ignore::DeprecationWarning
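The test above compares the rounded prediction with an exact number, which can fail after a harmless change such as a library upgrade that shifts the output slightly. One possible alternative is to compare within a tolerance. The sketch below uses a stand-in prediction function with a made-up return value so that it runs without the package; fake_make_prediction and check_first_prediction are hypothetical names.

```python
import math


def fake_make_prediction(input_data):
    """Stand-in for make_prediction; the returned number is made up."""
    return {"predictions": [112475.62 for _ in input_data]}


def check_first_prediction(result, expected, rel_tol=1e-3):
    """Assert that the first prediction is a float close to the expected value."""
    assert result is not None
    first = result.get("predictions")[0]
    assert isinstance(first, float)
    # Relative tolerance: 1e-3 allows a difference of about 0.1 percent.
    assert math.isclose(first, expected, rel_tol=rel_tol)
    return first


result = fake_make_prediction([{"row": 1}])
first_prediction = check_first_prediction(result, expected=112476.0)
print("first prediction within tolerance:", first_prediction)
```

How tight the tolerance should be depends on how much drift you are willing to accept between model versions, so treat the value above as a placeholder.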


Part 3: Building the package

At this stage, your code is complete and has passed all the tests. The next step is building a package.

GitHub repo for Part 3 is here

These things need to be added to the current directory. A manifest file details which files to keep in the package:

include *.txt
include *.md
include *.cfg
include *.pkl
recursive-include ./regression_model/*

include regression_model/datasets/train.csv
include regression_model/datasets/test.csv
include regression_model/trained_models/*.pkl
include regression_model/VERSION

include ./requirements.txt
exclude *.log

recursive-exclude * __pycache__
recursive-exclude * *.py[co]

The setup script holds the other details on the model: metadata, requirements, license information, and so on:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import io
import os
from pathlib import Path

from setuptools import find_packages, setup

# Package meta-data.
NAME = 'regression_model'
DESCRIPTION = 'Train and deploy regression model.'
URL = 'your github project'
EMAIL = ''
AUTHOR = 'Your name'


# What packages are required for this module to be executed?
def list_reqs(fname='requirements.txt'):
    # One requirement specifier per line of the requirements file.
    with open(fname) as fd:
        return fd.read().splitlines()


# The rest you shouldn't have to touch too much :)
# ------------------------------------------------
# Except, perhaps the License and Trove Classifiers!
# If you do change the License, remember to change the
# Trove Classifier for that!

here = os.path.abspath(os.path.dirname(__file__))

# Import the README and use it as the long-description.
# Note: this will only work if the README is listed in your manifest file!
# The file name 'README.md' is an assumption; use the name of your own README.
try:
    with io.open(os.path.join(here, 'README.md'), encoding='utf-8') as f:
        long_description = '\n' + f.read()
except FileNotFoundError:
    long_description = DESCRIPTION

# Load the package's version from its VERSION file.
ROOT_DIR = Path(__file__).resolve().parent
PACKAGE_DIR = ROOT_DIR / NAME
about = {}
with open(PACKAGE_DIR / 'VERSION') as f:
    _version = f.read().strip()
    about['__version__'] = _version

# Where the magic happens:
# Apart from package_data and classifiers, the keyword arguments below are
# a reconstruction of a typical setup() call; adjust them to your project.
setup(
    name=NAME,
    version=about['__version__'],
    description=DESCRIPTION,
    long_description=long_description,
    author=AUTHOR,
    author_email=EMAIL,
    url=URL,
    packages=find_packages(exclude=('tests',)),
    package_data={'regression_model': ['VERSION']},
    install_requires=list_reqs(),
    include_package_data=True,
    license='MIT',
    classifiers=[
        # Trove classifiers
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: Implementation :: CPython',
        'Programming Language :: Python :: Implementation :: PyPy',
    ],
)
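Two small steps in the setup script are easy to try in isolation: reading the requirements and reading the version. The sketch below is a variation rather than the script's own code. This list_reqs skips blank lines and whole-line comments, read_version is a hypothetical helper, and the requirement lines written to the temporary file are placeholders, not real pins from the project.

```python
import tempfile
from pathlib import Path


def list_reqs(fname="requirements.txt"):
    """Return requirement specifiers, skipping blank lines and comment lines."""
    reqs = []
    for line in Path(fname).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            reqs.append(line)
    return reqs


def read_version(package_dir):
    """Read the version string from the VERSION file inside package_dir."""
    return (Path(package_dir) / "VERSION").read_text().strip()


# Try both helpers on throwaway files with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    req_file = Path(tmp) / "requirements.txt"
    req_file.write_text(
        "# production requirements\n"
        "example-package==1.0.0\n"
        "\n"
        "# testing requirements\n"
        "another-example>=2.0\n"
    )
    (Path(tmp) / "VERSION").write_text("0.1.0\n")
    demo_reqs = list_reqs(req_file)
    demo_version = read_version(tmp)

print(demo_reqs)
print(demo_version)
```

Keeping the version in a plain VERSION file means the setup script and the package itself can both read it from one place.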


This is another requirements.txt file, inside the package, and it must be provided. Two additional packages need to be installed for packaging, so make sure you run

$ pip install -r packages/regression_model/regression_model/requirements.txt

The file is organized into these sections:

# production requirements

# packaging

# testing requirements

Run the command for building the source distribution (sdist) and the wheel distribution (bdist_wheel):

$ python packages/regression_model/ sdist bdist_wheel


If all goes well, you'll have new files in your directory.

The exact output depends on your OS; this package was built on macOS 10.15.
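A quick way to confirm that the manifest did its job is to look inside the source distribution, which is a gzipped tar archive. The sketch below lists the files in such an archive and reports expected files that are missing. The helper names are hypothetical, and the demo builds a tiny throwaway archive instead of assuming where your build output lives.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_sdist_files(sdist_path):
    """Return the sorted names of all regular files in a .tar.gz archive."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def missing_from_sdist(sdist_path, expected_suffixes):
    """Return the expected suffixes that no file in the archive ends with."""
    names = list_sdist_files(sdist_path)
    return [s for s in expected_suffixes if not any(n.endswith(s) for n in names)]


# Demo on a throwaway archive; point the helpers at your own sdist instead.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "demo-0.1.0.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in ("demo-0.1.0/setup.py", "demo-0.1.0/demo/VERSION"):
            data = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    demo_files = list_sdist_files(archive_path)
    demo_missing = missing_from_sdist(archive_path, ["VERSION", "train.csv"])

print(demo_files)
print("missing:", demo_missing)
```

For the package in this post, you would check for entries such as the VERSION file, the datasets, and the trained .pkl model, since those are the files the manifest is meant to include.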

Your package is now ready to be installed and used - just like a normal Python package

Install Package

$ pip install -e packages/regression_model/

Use Package
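Once installed, the package can be imported from any script in the same environment. The sketch below reuses the calls from the test shown earlier (load_dataset and make_prediction) and guards them with a check, so it runs without error even where the package is not installed. package_available is a hypothetical helper.

```python
import importlib.util


def package_available(name):
    """Return True if the import system can find a top-level package."""
    return importlib.util.find_spec(name) is not None


if package_available("regression_model"):
    # Same calls as in the test module earlier in this post.
    from regression_model.predict import make_prediction
    from regression_model.processing.data_management import load_dataset

    test_data = load_dataset(file_name="test.csv")
    single_row_json = test_data[0:1].to_json(orient="records")
    result = make_prediction(input_data=single_row_json)
    print("first prediction:", result.get("predictions")[0])
else:
    print("regression_model is not installed in this environment")
```

If the second message appears, check that the environment you installed the package into is the one that is currently active.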


The next post will cover some of the best practices (I know, there are a lot of them), such as versioning and logging, and how to host this package on the web so that anyone can install it. Future posts will cover deployment as an API on Heroku and AWS.


  • Udemy course on Deployment of ML Models by Soledad Galli & Christopher Samiullah. If you really want to go deep into proper software writing, logging, CI/CD, Flask, and deployment on multiple platforms, you should take this course.
Written on January 29, 2020