Building and Running a Custom Python Container
Introduction
In this tutorial, we will walk you through building your own Python container and running it on Bacalhau.
Prerequisites
To get started, you need to install the Bacalhau client. See the Bacalhau installation instructions for more information.
1. Sample Recommendation Dataset
We will be using a simple recommendation script that, when given a movie ID, recommends other movies based on user ratings. For example, if you ask for recommendations for the movie 'Toy Story' (1995), it suggests movies from similar categories.
Downloading the dataset
Download the MovieLens 1M dataset from https://files.grouplens.org/datasets/movielens/ml-1m.zip
In this example, we'll be using two files from the MovieLens 1M dataset: ratings.dat and movies.dat. After the dataset is downloaded, extract the zip and place ratings.dat and movies.dat into a folder called input.
The input directory should then contain exactly these two files: movies.dat and ratings.dat.
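Both files use `::` as the field separator. As a rough sketch of how such files can be read with pandas, the example below parses a few illustrative rows written in that layout. The column names, the `read_dat` helper, and the sample rows are assumptions made for this example and are not taken from similar-movies.py.

```python
import io

import pandas as pd

# Hypothetical column names; the field order follows the MovieLens 1M layout.
RATINGS_COLUMNS = ["user_id", "movie_id", "rating", "timestamp"]
MOVIES_COLUMNS = ["movie_id", "title", "genres"]

# Illustrative rows in the same layout as ratings.dat and movies.dat.
RATINGS_SAMPLE = "1::1193::5::978300760\n1::661::3::978302109\n"
MOVIES_SAMPLE = (
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)


def read_dat(source, columns, encoding=None):
    """Read a '::'-separated file (a path or a file-like object) into a DataFrame."""
    # A multi-character separator needs pandas' "python" parsing engine.
    return pd.read_csv(source, sep="::", engine="python", names=columns, encoding=encoding)


ratings = read_dat(io.StringIO(RATINGS_SAMPLE), RATINGS_COLUMNS)
movies = read_dat(io.StringIO(MOVIES_SAMPLE), MOVIES_COLUMNS)
print(ratings)
print(movies)
```

For the real files you would pass a path instead, for example `read_dat("input/movies.dat", MOVIES_COLUMNS, encoding="latin-1")`, since the titles in movies.dat are not plain ASCII.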
Installing Dependencies
Create a requirements.txt file that lists the Python libraries we'll be using:
To install the dependencies, run pip install -r requirements.txt from the same directory.
Writing the Script
Create a new file called similar-movies.py and paste the following script into it:
What the similar-movies.py script does
Read the files with pandas. The code uses pandas to read data from the files ratings.dat and movies.dat.
Create the ratings matrix of shape (m×u), with rows as movies and columns as users.
Normalise the matrix. The ratings matrix is normalised by subtracting the mean.
Compute SVD. A singular value decomposition (SVD) of the normalised ratings matrix is performed.
Calculate cosine similarity, sort by most similar, and return the top N.
Select k principal components to represent the movies and a movie_id to find recommendations for, and print the top_n results.
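The steps above can be sketched on a tiny hand-made ratings matrix. This is a minimal illustration and not the code of similar-movies.py: the function name `top_similar`, the toy ratings, the choice to subtract each movie's own mean, and the use of scaled left singular vectors as movie features are all assumptions made for the example.

```python
import numpy as np


def top_similar(ratings, movie_index, k=2, top_n=3):
    """Return the row indices of the top_n movies most similar to movie_index.

    ratings has shape (movies, users), matching the (m x u) matrix described above.
    """
    ratings = np.asarray(ratings, dtype=float)
    # Normalise: subtract each movie's mean rating (assumed to be a per-row mean).
    normalised = ratings - ratings.mean(axis=1, keepdims=True)
    # Singular value decomposition of the normalised matrix.
    u, s, _ = np.linalg.svd(normalised, full_matrices=False)
    # Represent every movie by its coordinates along the first k components.
    features = u[:, :k] * s[:k]
    target = features[movie_index]
    # Cosine similarity between the target movie and every movie.
    denominator = np.linalg.norm(features, axis=1) * np.linalg.norm(target)
    similarity = np.divide(
        features @ target,
        denominator,
        out=np.zeros_like(denominator),
        where=denominator > 0,
    )
    # Sort by most similar and drop the movie itself.
    order = np.argsort(-similarity)
    order = order[order != movie_index]
    return order[:top_n].tolist()


# Toy data: movies 0 and 1 are liked by the same users, as are movies 2 and 3.
toy_ratings = [
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
]
print(top_similar(toy_ratings, movie_index=0, k=2, top_n=1))
```

Here `movie_index` is a row position in the matrix, so a real script would still need to map MovieLens movie IDs to rows and back to titles.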
For further reading on how the script works, see the article "Simple Movie Recommender Using SVD" on Alyssa's blog.
Running the Script
To run the script with the default values, run python similar-movies.py.
You can also pass the flags --k, --id, and --n to set your own values.
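As a hedged illustration of how a script can accept such flags, the sketch below uses argparse from the standard library. The flag names --k, --id, and --n are the ones used later in this tutorial; the default values and help texts are placeholders chosen for the example and may differ from those in similar-movies.py.

```python
import argparse


def build_parser():
    """Build a parser for the three flags used in this tutorial."""
    parser = argparse.ArgumentParser(description="Find movies similar to a given movie.")
    # The defaults below are placeholders, not necessarily the script's own defaults.
    parser.add_argument("--k", type=int, default=10, help="number of principal components")
    parser.add_argument("--id", type=int, default=1, help="ID of the movie to find recommendations for")
    parser.add_argument("--n", type=int, default=10, help="number of similar movies to print")
    return parser


# Equivalent to running: python similar-movies.py --k 50 --id 10 --n 10
args = build_parser().parse_args(["--k", "50", "--id", "10", "--n", "10"])
print(args.k, args.id, args.n)
```

Passing an explicit list to `parse_args` keeps the example self-contained; a real script would call `parse_args()` without arguments so that the values come from the command line.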
2. Setting Up Docker
We will create a Dockerfile and add the desired configuration to it. Its instructions specify how the image will be built and what extra requirements will be included.
We will use the python:3.8 Docker image as the base and add our script similar-movies.py to copy it into the image. In the same way, we add the dataset directory and requirements.txt, and then run the command that installs the dependencies in the image.
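The final folder structure contains the Dockerfile, requirements.txt, similar-movies.py, and the input directory with movies.dat and ratings.dat.
For more information on how to containerize your script or app, see the Docker documentation.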
Build the container
We will run the docker build command to build the container. It takes the form docker build -t hub-user/repo-name:tag . where the final dot is the build context (the current directory).
Before running the command, replace:
hub-user with your Docker Hub username. If you don't have a Docker Hub account, create one first and use the username of the account you created.
repo-name with the name of the container. You can name it anything you want.
tag with a tag of your choice. This is not required, but you can use the latest tag.
In our case, the image is named jsace/python-similar-movies.
Push the container
Next, upload the image to the registry with docker push, referring to it by the same Docker Hub username, repo name, and tag.
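In our case, we push jsace/python-similar-movies, the image name used in the Bacalhau commands later in this tutorial.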
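The naming scheme shared by the build and push steps can be summarised in a few lines of Python. This is only a sketch to make the hub-user/repo-name:tag pattern concrete; the helper names `image_reference`, `build_command`, and `push_command` are hypothetical, and the snippet prints the commands rather than running them.

```python
def image_reference(hub_user, repo_name, tag=None):
    """Compose an image reference of the form hub-user/repo-name[:tag]."""
    reference = f"{hub_user}/{repo_name}"
    # The tag is optional; Docker assumes "latest" when it is omitted.
    return f"{reference}:{tag}" if tag else reference


def build_command(reference, context="."):
    """Argument list for building the image from the Dockerfile in the context directory."""
    return ["docker", "build", "-t", reference, context]


def push_command(reference):
    """Argument list for uploading the image to the registry."""
    return ["docker", "push", reference]


reference = image_reference("jsace", "python-similar-movies")
print(" ".join(build_command(reference)))
print(" ".join(push_command(reference)))
```

To execute the commands from Python instead of printing them, each list could be passed to `subprocess.run(..., check=True)`, assuming Docker is installed and you are logged in to Docker Hub.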
3. Running a Bacalhau Job
Once the image has been pushed to Docker Hub, we can run the container on Bacalhau. You can submit the job with either default or custom parameters.
Running the Container with Default Parameters
To run your container on Bacalhau with default parameters, submit the job with the following command:
Structure of the command
bacalhau docker run: call to Bacalhau
jsace/python-similar-movies: the name of the Docker image we are using
-- python similar-movies.py: execute the Python script
When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.
Running the Container with Custom Parameters
To run your container on Bacalhau with custom parameters, submit the job with the following command:
Structure of the command
bacalhau docker run: call to Bacalhau
jsace/python-similar-movies: the name of the Docker image we are using
-- python similar-movies.py --k 50 --id 10 --n 10: execute the Python script. The script will use singular value decomposition (SVD) and cosine similarity to find the 10 movies most similar to the one with identifier 10, using 50 principal components.
4. Checking the State of your Jobs
Job status: You can check the status of the job using bacalhau list.
When it says Published or Completed, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using bacalhau describe.
Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results) and downloaded our job output to be stored in that directory.
5. Viewing your Job Output
To view the file, run the following command:
Support
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).