Building and Running your Custom R Containers on Bacalhau
Last updated
Was this helpful?
Last updated
Was this helpful?
This example will walk you through building Time Series Forecasting using Prophet. Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.
To get started, you need to install the Bacalhau client, see more information here
Open R studio or R-supported IDE. If you want to run this on a notebook server, then make sure you use an R kernel. Prophet is a CRAN package, so you can use install.packages
to install the prophet
package:
After installation is finished, you can download the example data that is stored in IPFS:
The code below instantiates the library and fits a model to the data.
Create a new file called Saturating-Forecasts.R
and in it paste the following script:
This script performs time series forecasting using the Prophet library in R, taking input data from a CSV file, applying the forecasting model, and generating plots for analysis.
Let's have a look at the command below:
This command uses Rscript to execute the script that was created and written to the Saturating-Forecasts.R
file.
The input parameters provided in this case are the names of input and output files:
example_wp_log_R.csv
- the example data that was previously downloaded.
outputs/output0.pdf
- the name of the file to save the first forecast plot.
outputs/output1.pdf
- the name of the file to save the second forecast plot.
To use Bacalhau, you need to package your code in an appropriate format. The developers have already pushed a container for you to use, but if you want to build your own, you can follow the steps below. You can view a dedicated container example in the documentation.
To build your own docker container, create a Dockerfile
, which contains instructions to build your image.
These commands specify how the image will be built, and what extra requirements will be included. We use r-base
as the base image and then install the prophet
package. We then copy the Saturating-Forecasts.R
script into the container and set the working directory to the R
folder.
We will run docker build
command to build the container:
Before running the command replace:
hub-user
with your docker hub username. If you don’t have a docker hub account follow these instructions to create docker account, and use the username of the account you created
repo-name
with the name of the container, you can name it anything you want
tag
this is not required but you can use the latest
tag
In our case:
Next, upload the image to the registry. This can be done by using the Docker hub username, repo name, or tag.
In our case:
The following command passes a prompt to the model and generates the results in the outputs directory. It takes approximately 2 minutes to run.
bacalhau docker run
: call to Bacalhau
-i ipfs://QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt:/example_wp_log_R.csv
: Mounting the uploaded dataset at /inputs
in the execution. It takes two arguments, the first is the IPFS CID (QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFtz
) and the second is file path within IPFS (/example_wp_log_R.csv
)
ghcr.io/bacalhau-project/examples/r-prophet:0.0.2
: the name and the tag of the docker image we are using
/example_wp_log_R.csv
: path to the input dataset
/outputs/output0.pdf
, /outputs/output1.pdf
: paths to the output
Rscript Saturating-Forecasts.R
: execute the R script
When a job is submitted, Bacalhau prints out the related job_id
. We store that in an environment variable so that we can reuse it later on:
Job status: You can check the status of the job using bacalhau job list
.
When it says Published
or Completed
, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using bacalhau job describe
.
Job download: You can download your job results directly by using bacalhau job get
. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results
) and downloaded our job output to be stored in that directory.
To view the file, run the following command:
You can't natively display PDFs in notebooks, so here are some static images of the PDFs:
output0.pdf
output1.pdf
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).