Building and Running your Custom R Containers on Bacalhau

Introduction

This example walks you through building a time series forecasting model with Prophet and running it in a custom R container on Bacalhau.

Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.

TL;DR

bacalhau docker run -v QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt:/example_wp_log_R.csv ghcr.io/bacalhau-project/examples/r-prophet:0.0.2 -- Rscript Saturating-Forecasts.R "/example_wp_log_R.csv" "/outputs/output0.pdf" "/outputs/output1.pdf"

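Prerequisites

To get started, you need the Bacalhau client (the bacalhau CLI used in the commands below) installed.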

1. Running Prophet in R Locally

Open RStudio or another IDE that supports R. If you want to run this on a notebook server, make sure you use an R kernel.

Prophet is a CRAN package, so you can install it with install.packages.

%%bash
R -e "install.packages('prophet',dependencies=TRUE, repos='http://cran.rstudio.com/')"

After installation is finished, you can download the example data that is stored in IPFS.

%%bash
wget https://w3s.link/ipfs/QmZiwZz7fXAvQANKYnt7ya838VPpj4agJt5EDvRYp3Deeo/example_wp_log_R.csv

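The script below loads the library, reads the input and output paths from the command line, fits two logistic-growth models to the data, and writes each forecast plot to a PDF.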

%%bash
mkdir -p outputs
mkdir -p R

%%writefile Saturating-Forecasts.R
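library(prophet)

# Expect three positional arguments: input CSV, first PDF, second PDF.
args <- commandArgs(trailingOnly = TRUE)
print(args)
if (length(args) != 3) {
  stop("Usage: Rscript Saturating-Forecasts.R <input.csv> <output0.pdf> <output1.pdf>")
}

input_csv <- args[1]
output_pdf0 <- args[2]
output_pdf1 <- args[3]

# Prophet expects a data frame with the columns ds (date) and y (value).
df <- read.csv(input_csv)

# Forecast 1: logistic growth that saturates at a maximum (cap) of 8.5.
df$cap <- 8.5
m <- prophet(df, growth = 'logistic')

# Extend the time axis by 1826 days (about five years) and predict.
future <- make_future_dataframe(m, periods = 1826)
future$cap <- 8.5
fcst <- predict(m, future)
pdf(output_pdf0)
plot(m, fcst)
dev.off()

# Forecast 2: flip the series so that it decreases, then set both a
# cap and a floor (saturating minimum) before refitting.
df$y <- 10 - df$y
df$cap <- 6
df$floor <- 1.5
future$cap <- 6
future$floor <- 1.5
m <- prophet(df, growth = 'logistic')
fcst <- predict(m, future)
pdf(output_pdf1)
plot(m, fcst)
dev.off()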
Writing Saturating-Forecasts.R

%%bash
Rscript Saturating-Forecasts.R "example_wp_log_R.csv" "outputs/output0.pdf" "outputs/output1.pdf"
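
For context, the growth = 'logistic' option used in the script asks Prophet for a saturating trend. In its simplest form, ignoring the changepoints that Prophet adds so the growth rate can vary over time, the trend can be written as below. The symbols C, k, and m follow the usual presentation of the logistic curve and do not appear in this example's code; treat this as a simplified sketch rather than the exact model Prophet fits.

```latex
g(t) = \frac{C}{1 + \exp\bigl(-k\,(t - m)\bigr)}
```

Here C is the carrying capacity supplied through the cap column (8.5 in the first fit), k is the growth rate, and m is an offset in time. As t grows, g(t) approaches C, which is why the forecast flattens out near the cap instead of growing without bound.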

2. Running R Prophet on Bacalhau

To use Bacalhau, you need to package your code in an appropriate format. The developers have already pushed a container for you to use, but if you want to build your own, you can follow the steps below. You can view a dedicated container example in the documentation.

Dockerfile

In this step, you will create a Dockerfile to create an image. The Dockerfile is a text document that contains the commands used to assemble the image. First, create the Dockerfile.

FROM r-base
RUN R -e "install.packages('prophet',dependencies=TRUE, repos='http://cran.rstudio.com/')"
RUN mkdir /R
RUN mkdir /outputs
COPY Saturating-Forecasts.R /R/
WORKDIR /R

Next, add your desired configuration to the Dockerfile. These commands specify how the image will be built, and what extra requirements will be included. We use r-base as the base image, and then install the prophet package. We then copy the R script into the container and set the working directory to the R folder.

We've already pushed this image to GHCR, but for reference, you would use a command like the following to build and push an updated version (adjust the tag as needed; the job below uses 0.0.2):

docker buildx build --platform linux/amd64 --push -t ghcr.io/bacalhau-project/examples/r-prophet:0.0.1 .

Before pushing a new image, it is worth testing it locally; once it works, push it to a container registry such as GHCR or Docker Hub.
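
One way to test the image locally is sketched below. It assumes Docker is installed, that example_wp_log_R.csv is in the current directory, and that the image was built with the tag r-prophet-local. Both that tag and the function name test_image_locally are hypothetical; substitute your own.

```shell
#!/usr/bin/env bash
# Sketch: run the Prophet image locally with the same arguments the Bacalhau job uses.
# Assumptions: Docker is installed and example_wp_log_R.csv is in the current directory.
# The default tag "r-prophet-local" and the function name are hypothetical.
test_image_locally() {
  local image="${1:-r-prophet-local}"
  mkdir -p outputs
  # Mount the CSV read-only and the outputs directory read-write,
  # mirroring the /example_wp_log_R.csv and /outputs paths used on Bacalhau.
  docker run --rm \
    -v "$PWD/example_wp_log_R.csv:/example_wp_log_R.csv:ro" \
    -v "$PWD/outputs:/outputs" \
    "$image" \
    Rscript Saturating-Forecasts.R "/example_wp_log_R.csv" "/outputs/output0.pdf" "/outputs/output1.pdf"
}
```

After building with docker build -t r-prophet-local . you could call test_image_locally and then look for output0.pdf and output1.pdf in the local outputs directory.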

Fitting a Prophet Model on Bacalhau

The following command mounts the example CSV from IPFS into the container, runs the R script against it, and writes the two forecast PDFs to the /outputs directory. It takes approximately 2 minutes to run.

%%bash --out job_id
bacalhau docker run \
--wait \
--id-only \
-v QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt:/example_wp_log_R.csv \
ghcr.io/bacalhau-project/examples/r-prophet:0.0.2 \
-- Rscript Saturating-Forecasts.R "/example_wp_log_R.csv" "/outputs/output0.pdf" "/outputs/output1.pdf"
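
Outside a notebook, the same submission can be wrapped in a small shell function so that the job ID is easy to capture. This is a sketch that assumes the bacalhau CLI is installed; the function name submit_prophet_job is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: submit the job from a plain shell and print only the job ID.
# Assumption: the bacalhau CLI is installed. submit_prophet_job is a hypothetical name.
submit_prophet_job() {
  bacalhau docker run \
    --wait \
    --id-only \
    -v QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt:/example_wp_log_R.csv \
    ghcr.io/bacalhau-project/examples/r-prophet:0.0.2 \
    -- Rscript Saturating-Forecasts.R "/example_wp_log_R.csv" "/outputs/output0.pdf" "/outputs/output1.pdf"
}

# Usage (not executed here):
#   JOB_ID="$(submit_prophet_job)" && export JOB_ID
```

Because --id-only limits the output to the job ID, command substitution is enough to store it in JOB_ID for the later commands.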

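Running the command prints a UUID that identifies the newly created job. In a notebook, the --out job_id option stores that ID in the Python variable job_id; the cells below assume it has also been exported as the JOB_ID environment variable (for example with %env JOB_ID={job_id}). You can check the status of the job with the following command: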

%%bash
bacalhau list --id-filter ${JOB_ID}
 CREATED   ID        JOB                      STATE      VERIFIED  PUBLISHED               
 15:10:22  0316d0c2  Docker jsace/r-proph...  Completed   /ipfs/QmYwR3uaSnhLpE... 

When the STATE column shows Completed, the job is done and you can fetch the results.
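
If a job is submitted without --wait, the status check can be repeated until the job finishes. The sketch below assumes the table layout shown above, where the word Completed appears in the job's row; the function names job_completed and wait_for_job, the retry count, and the delay are all hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: poll `bacalhau list` until the job's row shows Completed.
# Assumptions: the bacalhau CLI is installed and prints the table format shown above.
# Function names, retry count, and delay are hypothetical.
job_completed() {
  bacalhau list --id-filter "$1" | grep -q "Completed"
}

wait_for_job() {
  local job_id="$1"
  local attempts="${2:-30}"   # how many times to check
  local delay="${3:-10}"      # seconds to sleep between checks
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if job_completed "$job_id"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                    # gave up without seeing Completed
}
```

For example, wait_for_job "$JOB_ID" 30 10 would check up to 30 times, 10 seconds apart, and return a non-zero status if the job never reports Completed.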

To find out more information about your job, run the following command:

%%bash
bacalhau describe ${JOB_ID}
APIVersion: V1alpha1
ClientID: 77cf46c04f88ffb1c3e0e4b6e443724e8d2d87074d088ef1a6294a448fa85d2e
CreatedAt: "2022-11-11T15:10:22.177011613Z"
Deal:
  Concurrency: 1
ExecutionPlan:
  ShardsTotal: 1
ID: 0316d0c2-162d-4c57-9c10-391c908f981d
JobState:
  Nodes:
    QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3:
      Shards:
        "0":
          NodeId: QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3
          PublishedResults: {}
          State: Cancelled
          VerificationResult: {}
    QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL:
      Shards:
        "0":
          NodeId: QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL
          PublishedResults:
            CID: QmYwR3uaSnhLpEZYDdUGXQMVCuCmsd8Rc4LHsuHL6pSUz3
            Name: job-0316d0c2-162d-4c57-9c10-391c908f981d-shard-0-host-QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL
            StorageSource: IPFS
          RunOutput:
            exitCode: 0
            runnerError: ""
            stderr: |-
              Loading required package: Rcpp
              Loading required package: rlang
              Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
              Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
            stderrtruncated: false
            stdout: "[1] \"example_wp_log_R.csv\" \"outputs/output0.pdf\" \"outputs/output1.pdf\"
              \nnull device \n 1 \nnull device \n 1"
            stdouttruncated: false
          State: Completed
          Status: 'Got results proposal of length: 0'
          VerificationResult:
            Complete: true
            Result: true
RequesterNodeID: QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL
RequesterPublicKey: CAASpgIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDVRKPgCfY2fgfrkHkFjeWcqno+MDpmp8DgVaY672BqJl/dZFNU9lBg2P8Znh8OTtHPPBUBk566vU3KchjW7m3uK4OudXrYEfSfEPnCGmL6GuLiZjLf+eXGEez7qPaoYqo06gD8ROdD8VVse27E96LlrpD1xKshHhqQTxKoq1y6Rx4DpbkSt966BumovWJ70w+Nt9ZkPPydRCxVnyWS1khECFQxp5Ep3NbbKtxHNX5HeULzXN5q0EQO39UN6iBhiI34eZkH7PoAm3Vk5xns//FjTAvQw6wZUu8LwvZTaihs+upx2zZysq6CEBKoeNZqed9+Tf+qHow0P5pxmiu+or+DAgMBAAE=
Spec:
  Docker:
    Entrypoint:
    - Rscript
    - Saturating-Forecasts.R
    - example_wp_log_R.csv
    - outputs/output0.pdf
    - outputs/output1.pdf
    Image: jsace/r-prophet
  Engine: Docker
  Language:
    JobContext: {}
  Publisher: Estuary
  Resources:
    GPU: ""
  Sharding:
    BatchSize: 1
    GlobPatternBasePath: /inputs
  Verifier: Noop
  Wasm: {}
  inputs:
  - CID: QmY8BAftd48wWRYDf5XnZGkhwqgjpzjyUG3hN1se6SYaFt
    StorageSource: IPFS
    path: /example_wp_log_R.csv
  outputs:
  - Name: outputs
    StorageSource: IPFS
    path: /outputs

If you see that the job has completed and there are no errors, then you can download the results with the following command:

%%bash
rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results
Fetching results of job '0316d0c2-162d-4c57-9c10-391c908f981d'...
Results for job '0316d0c2-162d-4c57-9c10-391c908f981d' have been written to...
results

After the download has finished, you should see the following contents in the results directory:

%%bash
ls results/combined_results/outputs
output0.pdf
output1.pdf
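
As a quick sanity check, you can confirm that both downloaded files really are PDFs by looking at their first four bytes, which should read %PDF. The sketch below assumes the directory layout shown above; the function names is_pdf and check_outputs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that the two forecast files exist and start with the "%PDF" signature.
# Assumption: results were downloaded to results/combined_results/outputs as shown above.
# is_pdf and check_outputs are hypothetical helper names.
is_pdf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "%PDF" ]
}

check_outputs() {
  local dir="${1:-results/combined_results/outputs}"
  local f
  for f in output0.pdf output1.pdf; do
    if ! is_pdf "$dir/$f"; then
      echo "missing or not a PDF: $dir/$f" >&2
      return 1
    fi
  done
  echo "both forecasts look like PDFs"
}
```

Calling check_outputs with no arguments checks the default download location; pass a different directory as the first argument if you used another --output-dir.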

You can't natively display PDFs in notebooks, so here are some static images of the PDFs:

  • output0.pdf

  • output1.pdf