Deploying Multiple Machine Learning Models with Flask and Heroku

Last month, I gave a talk on Recommendation Systems at the Milwaukee Machine Learning Meetup. Using TensorFlow, Keras, and Theano, I was able to train and run my models on my local machine, but I wanted to deploy them publicly. I decided to build a web interface for them, but I had never run a model anywhere but my development machine. I happen to use Heroku at work, so I started there. After some trial and error, a workable solution now exists at movies.mitchellhenke.com.

The models were about 25MB in size and somewhat complex. I was concerned about the resources required to deploy the models and whether they would run at a reasonable speed. Those concerns were unfounded: Heroku handled the models without trouble. Starting with trained models and bringing them to the web was a learning experience, so I wanted to write it up!

Web Stuff

Flask is a straightforward Python web framework, which made it a great complement to the Python models. The server itself is fewer than 100 lines of code (github). It doesn’t store anything or maintain any state; it is simply an interface to the machine learning models. It has a single route, which transforms the input as necessary, generates predictions from both models, and returns them in the same HTTP call. Despite having to generate two predictions, it returns results within a second.

The single route takes these few lines:

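@app.route('/predict', methods=['POST'])
def predict():
    # Parsed JSON body of the request; named `payload` so it doesn't
    # shadow the standard-library json module.
    payload = request.json
    # Run both models on the same input.
    prediction = ml_predict(payload)
    next_recs = rnn_predict(payload)
    return jsonify({'predictions': prediction, 'next_recs': next_recs}), 200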
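
To see the route from the caller's side, here is a minimal client sketch using only the standard library. The payload shape (a `ratings` mapping of movie ID to rating), the helper names, and the base URL are all assumptions for illustration; this post only shows that the route accepts a JSON POST and answers with `predictions` and `next_recs`.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build (but do not send) a JSON POST request for the /predict route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        # Flask's request.json expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predict_response(raw_body):
    """Split the JSON response body into the two models' outputs."""
    decoded = json.loads(raw_body.decode("utf-8"))
    return decoded["predictions"], decoded["next_recs"]
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` with a generous timeout and handing the response bytes to `parse_predict_response`.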

There isn’t much to say about the frontend in relation to the deployment other than that it works. The frontend is a simple Bootstrap and React interface that shows the recommendations of the two models side by side. The code for it is here.

Heroku

Heroku’s Python environment understands pip and requirements.txt for dependency management. In total, the dependencies for my server were:

# requirements.txt
gunicorn
Flask
Flask-Compress
scikit-learn
keras
theano==0.7.0
numpy
tensorflow
lasagne
h5py
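
One thing that can trip up a first deploy is that a package's pip name is not always its import name. The sketch below is a small sanity check I could run locally before pushing. The `IMPORT_NAMES` mapping and the `missing_modules` helper are hypothetical names for this illustration, not part of the server.

```python
import importlib.util

# pip distribution name -> importable module name. Most match, but a few
# differ (for example, scikit-learn is imported as sklearn).
IMPORT_NAMES = {
    "gunicorn": "gunicorn",
    "Flask": "flask",
    "Flask-Compress": "flask_compress",
    "scikit-learn": "sklearn",
    "keras": "keras",
    "theano": "theano",
    "numpy": "numpy",
    "tensorflow": "tensorflow",
    "lasagne": "lasagne",
    "h5py": "h5py",
}


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(IMPORT_NAMES.values())` in the project's virtualenv should return an empty list when everything in requirements.txt is installed.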

A Procfile defines which processes should run. This server only needs one process for the web server:

web: gunicorn app:app

This will run the Flask server in app.py behind Gunicorn.
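
The `app:app` argument reads as "module name, colon, attribute name": the first `app` is the module (app.py) and the second is the Flask object defined inside it. As a rough illustration of that convention, and not Gunicorn's actual loader, a resolver for such strings could look like this (the function name is made up for the sketch):

```python
import importlib


def resolve_app_target(target):
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, _, attribute = target.partition(":")
    if not module_name or not attribute:
        raise ValueError("expected 'module:attribute', got %r" % target)
    # Import the module part, then look up the attribute on it.
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

With this sketch, `resolve_app_target("app:app")` run from the project directory would import app.py and return the Flask application object.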

Assuming the Heroku CLI is installed and authenticated, I can create the Heroku application and push the app to it:

heroku create mh-movie-recommender
git push heroku master

Creating the application adds a git remote named heroku to the current repository. On push, Heroku installs the dependencies and runs the Procfile command to start the server. The server is now running and able to receive requests! It currently runs on Heroku’s free tier, which means it will idle after 30 minutes of inactivity and can run for up to 550 hours per month.
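
To put the free-tier numbers above in perspective, here is a quick back-of-the-envelope calculation. The constant and function names are just labels for this sketch. The point is that 550 hours is less than a full month, so the idling behavior is what keeps a low-traffic app within the allowance.

```python
FREE_HOURS_PER_MONTH = 550  # free-tier allowance quoted above


def always_on_hours(days_in_month):
    """Hours a server would use if it never idled during the month."""
    return 24 * days_in_month


def allowance_fraction(days_in_month):
    """Share of an always-on month that the free allowance covers."""
    return FREE_HOURS_PER_MONTH / always_on_hours(days_in_month)


print(always_on_hours(31))                  # 744
print(round(allowance_fraction(31), 3))     # 0.739
print(round(FREE_HOURS_PER_MONTH / 24, 1))  # 22.9 (days of continuous uptime)
```

In other words, an app that never slept would use up the allowance after roughly 23 days of a 31-day month.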

Wrapping Up

The website itself is live at movies.mitchellhenke.com, though it may take a while to boot if the server has idled. I was concerned that Heroku wouldn’t be able to execute the models, but it turns out that it can! Heroku is limited to CPU-only instances, but for this case even a free-tier server was fine!

If this is neat and you’re in the Milwaukee area, you should come check out the Milwaukee Machine Learning Meetup! If you are looking for help applying machine learning to your business, feel free to contact me.