
Workload Onboarding

This directory contains examples relating to performing common tasks with Bacalhau.

Container

Docker Workload Onboarding

How to use docker containers with Bacalhau

Docker Workloads

Bacalhau executes jobs by running them within containers. Bacalhau employs a syntax closely resembling Docker, allowing you to utilize the same containers. The key distinction lies in how input and output data are transmitted to the container via IPFS, enabling scalability on a global level.

This section describes how to migrate a workload based on a Docker container into a format that will work with the Bacalhau client.

You can check out this example tutorial to see how all these steps are used together.

Requirements

Here are a few things to note before getting started:

Container Registry: Ensure that the container is published to a public container registry that is accessible from the Bacalhau network.
Architecture Compatibility: Bacalhau supports only images that match the host node's architecture. Typically, most nodes run on linux/amd64, so containers in arm64 format are not able to run.
Input Flags: The --input ipfs://... flag supports only directories and does not support CID subpaths. The --input https://... flag supports only single files and does not support URL directories. The --input s3://... flag supports S3 keys and prefixes. For example, s3://bucket/logs-2023-04* includes all logs for April 2023.
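
The input rules above can be captured in a small pre-flight check before a job is submitted. This is only a sketch: `describe_input` is a hypothetical helper, not part of the Bacalhau CLI, and it looks at the URI scheme only, not at whether the target exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Bacalhau): summarize what an --input URI
# may point to, based only on its scheme and the rules listed above.
describe_input() {
  case "$1" in
    ipfs://*)           echo "ipfs: directories only, no CID subpaths" ;;
    http://*|https://*) echo "http(s): single files only, no URL directories" ;;
    s3://*)             echo "s3: keys and prefixes" ;;
    *)                  echo "unsupported input: $1" >&2; return 1 ;;
  esac
}

# Quote the URI so the shell does not try to expand the trailing "*".
describe_input "s3://bucket/logs-2023-04*"
```

Quoting matters here because an unquoted `*` could be expanded by the local shell before the value ever reaches the command.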

You can also look at the list of containers used by the Bacalhau team.

Note: Only about a third of the examples have their containers there. The rest are hosted in various Docker Hub registries.

Runtime Restrictions

To help provide a safe, secure network for all users, we add the following runtime restrictions:

Limited Ingress/Egress Networking:

All ingress/egress networking is limited as described in the documentation. You won't be able to pull data, code, weights, etc. from an external source.

Data Passing with Docker Volumes:

A job includes the concept of input and output volumes, and the Docker executor implements support for these. This means you can specify your CIDs, URLs, and/or S3 objects as input paths and also write results to an output volume. This can be seen in the following example:

bacalhau docker run \
  -i s3://mybucket/logs-2023-04*:/input \
  -o apples:/output_folder \
  ubuntu \
  bash -c 'ls /input > /output_folder/file.txt'

The example above uses the input volume flag -i s3://mybucket/logs-2023-04*:/input, which mounts all S3 objects in the bucket mybucket that have the prefix logs-2023-04 inside the Docker container at /input (directly under the filesystem root).

Output volumes are mounted to the Docker container at the location specified. In the example above, any content written to /output_folder will be made available within the apples folder in the job results CID.

Once the job has run on the executor, the contents of stdout and stderr will be added to any named output volumes the job has used (in this case apples), and all those entities will be packaged into the results folder which is then published to a remote location by the publisher.

Onboarding Your Workload

Step 1 - Read Data From Your Directory

If you need to pass data into your container you will do this through a Docker volume. You'll need to modify your code to read from a local directory.

We make the assumption that you are reading from a directory called /inputs, which is set as the default.

Step 2 - Write Data to Your Directory

If you need to return data from your container you will do this through a Docker volume. You'll need to modify your code to write to a local directory.

We make the assumption that you are writing to a directory called /outputs, which is set as the default.
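
Steps 1 and 2 can be sketched as a small entry-point script that only touches local directories. The variable names `INPUT_DIR` and `OUTPUT_DIR` and the file name `line_counts.txt` are illustrative assumptions. The directories default to the /inputs and /outputs paths mentioned above, so the same script can be tested locally against other directories.

```bash
#!/usr/bin/env bash
# Sketch of a workload entry point: read from an input directory and write
# results to an output directory. INPUT_DIR and OUTPUT_DIR are illustrative
# names; they default to the /inputs and /outputs paths described above.
process_inputs() {
  local in_dir="${INPUT_DIR:-/inputs}"
  local out_dir="${OUTPUT_DIR:-/outputs}"
  local f

  mkdir -p "$out_dir" || return 1
  : > "$out_dir/line_counts.txt"

  # Record the line count of every regular file found in the input directory.
  for f in "$in_dir"/*; do
    [ -f "$f" ] || continue
    printf '%s %s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')" \
      >> "$out_dir/line_counts.txt"
  done
}
```

For a local dry run, something like `INPUT_DIR=./data OUTPUT_DIR=./out process_inputs` exercises the same code path without a container.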

Step 3 - Build and Push Your Image To a Registry

For example:

$ export IMAGE=myuser/myimage:latest
$ docker build -t ${IMAGE} .
$ docker image push ${IMAGE}

Step 4 - Test Your Container

To test your Docker image locally, run the following commands, changing the environment variables as necessary:

$ export LOCAL_INPUT_DIR=$PWD
$ export LOCAL_OUTPUT_DIR=$PWD
$ CMD=(sh -c 'ls /inputs; echo do something useful > /outputs/stdout')
$ docker run --rm \
  -v "${LOCAL_INPUT_DIR}:/inputs" \
  -v "${LOCAL_OUTPUT_DIR}:/outputs" \
  ${IMAGE} \
  "${CMD[@]}"

Let's see what each command is used for:

$ export LOCAL_INPUT_DIR=$PWD
Exports the current working directory of the host system to the LOCAL_INPUT_DIR variable. This variable is used to bind a volume and transfer data into the container.

$ export LOCAL_OUTPUT_DIR=$PWD
Exports the current working directory of the host system to the LOCAL_OUTPUT_DIR variable. Similarly, this variable is used to bind a volume and transfer data out of the container.

$ CMD=(sh -c 'ls /inputs; echo do something useful > /outputs/stdout')
Creates a shell array CMD holding the command that will be executed inside the container. In this case, it is a simple command that runs ls on the /inputs directory and writes text to the /outputs/stdout file.

$ docker run ... ${IMAGE} "${CMD[@]}"
Launches a Docker container using the specified variables and command. Quoting the array as "${CMD[@]}" passes each element as a separate argument (in bash, a plain ${CMD} expands to the first element only). The volumes are bound to facilitate data exchange between the host and the container.

For example:

$ export LOCAL_INPUT_DIR=$PWD
$ export LOCAL_OUTPUT_DIR=$PWD
$ CMD=(sh -c 'ls /inputs; echo "do something useful" > /outputs/stdout')
$ export IMAGE=ubuntu
$ docker run --rm \
  -v "${LOCAL_INPUT_DIR}:/inputs" \
  -v "${LOCAL_OUTPUT_DIR}:/outputs" \
  ${IMAGE} \
  "${CMD[@]}"
$ cat stdout

The result of the commands' execution is shown below:

do something useful
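
One shell detail is worth checking when a command is stored in an array, as in the local test above. In bash, `${CMD}` expands to the first element only, while `"${CMD[@]}"` passes every element as its own argument. The sketch below shows the difference in bash; `count_args` is a hypothetical helper used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

CMD=(sh -c 'ls /inputs; echo do something useful > /outputs/stdout')

count_args ${CMD}        # bash passes only the first element ("sh"): prints 1
count_args "${CMD[@]}"   # all three elements, quoting preserved: prints 3
```

With the first form, the container would start a bare `sh` and never see the `-c` script, so the quoted array form is the safer choice.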

Step 5 - Run the Workload on Bacalhau

To launch your workload in a Docker container, using the specified image and working with input data specified via IPFS CID, run the following command:

$ bacalhau docker run --input ipfs://${CID} ${IMAGE} -- "${CMD[@]}"

To check the status of your job, run the following command:

$ bacalhau job list --id-filter JOB_ID

To get more information on your job, run:

$ bacalhau job describe JOB_ID

To download your job results, run:

$ bacalhau job get JOB_ID

For example, running:

JOB_ID=$(bacalhau docker run ubuntu echo hello | grep 'Job ID:' | sed 's/.*Job ID: \([^ ]*\).*/\1/')
echo "The job ID is: $JOB_ID"
bacalhau job list --id-filter $JOB_ID
sleep 5

bacalhau job list --id-filter $JOB_ID
bacalhau get $JOB_ID

ls shards

outputs:

CREATED   ID        JOB                      STATE      VERIFIED  PUBLISHED
 10:26:00  24440f0d  Docker ubuntu echo h...  Verifying
 CREATED   ID        JOB                      STATE      VERIFIED  PUBLISHED
 10:26:00  24440f0d  Docker ubuntu echo h...  Published            /ipfs/bafybeiflj3kha...
11:26:09.107 | INF bacalhau/get.go:67 > Fetching results of job '24440f0d-3c06-46af-9adf-cb524aa43961'...
11:26:10.528 | INF ipfs/downloader.go:115 > Found 1 result shards, downloading to temporary folder.
11:26:13.144 | INF ipfs/downloader.go:195 > Combining shard from output volume 'outputs' to final location: '/Users/phil/source/filecoin-project/docs.bacalhau.org'
job-24440f0d-3c06-46af-9adf-cb524aa43961-shard-0-host-QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3
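
The grep and sed pipeline in the example pulls the job ID out of the human-readable output of bacalhau docker run. As a sketch, the same extraction can be wrapped in a function and tried on sample text. The function name `extract_job_id` and the sample line are illustrative assumptions, and the exact output format may differ between Bacalhau versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read command output on stdin and print the token that
# follows "Job ID: " (same grep/sed logic as in the example above).
extract_job_id() {
  grep 'Job ID:' | sed 's/.*Job ID: \([^ ]*\).*/\1/'
}

# Illustrative sample line, not captured from a real run.
printf 'Job successfully submitted. Job ID: 24440f0d-3c06-46af-9adf-cb524aa43961\n' \
  | extract_job_id
```

Where the CLI supports it, the --id-only flag used in the later sections of this document avoids parsing the output altogether.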

The --input flag does not support CID subpaths for ipfs:// content.

Alternatively, you can run your workload with a publicly accessible http(s) URL, which will download the data temporarily into your public storage:

$ export URL=https://download.geofabrik.de/antarctica-latest.osm.pbf
$ bacalhau docker run --input ${URL} ${IMAGE} -- "${CMD[@]}"

$ bacalhau job list

$ bacalhau job get JOB_ID

The --input flag does not support URL directories.

Troubleshooting

If you run into this compute error while running your Docker image:

Creating job for submission ... done ✅
Finding node(s) for the job ... done ✅
Node accepted the job ... done ✅
Error while executing the job.

This can often be resolved by re-tagging your Docker image.

Support

Bacalhau Docker Image

How to use Bacalhau Docker Image for task management

This documentation explains how to use the Bacalhau Docker image for task management with Bacalhau client.

Prerequisites

To get started, you need to install the Bacalhau client (see the installation instructions for more information) and Docker.

1. Pull the Bacalhau Docker image

The first step is to pull the Bacalhau Docker image from the GitHub Container Registry.

docker pull ghcr.io/bacalhau-project/bacalhau:latest

Expected output:

latest: Pulling from bacalhau-project/bacalhau
d14ccdd25413: Pull complete
621f190d05c8: Pull complete
Digest: sha256:3cda5619984de9b56c738c50f94188684170f54f7e417f8dcbe74ff8ec8eb434
Status: Downloaded newer image for ghcr.io/bacalhau-project/bacalhau:latest
ghcr.io/bacalhau-project/bacalhau:latest

You can also pull a specific version of the image, e.g.:

docker pull ghcr.io/bacalhau-project/bacalhau:v1.6.0

1. Check the version of Bacalhau client

docker run -t ghcr.io/bacalhau-project/bacalhau:latest version

The output is similar to:

12:00:32.427 | INF pkg/repo/fs.go:93 > Initializing repo at '/root/.bacalhau' for environment 'production'
CLIENT  SERVER  UPDATE MESSAGE 
v1.3.0  v1.4.0

2. Run a Bacalhau Job

For example, to run an Ubuntu-based job that prints the message 'Hello from Docker Bacalhau!':

bacalhau docker run \
        --id-only \
        --wait \
        ubuntu:latest \
        -- sh -c 'uname -a && echo "Hello from Docker Bacalhau!"'

Structure of the command

--id-only: Output only the job id
--wait: Wait for the job to finish
ubuntu:latest: Ubuntu container
--: Separate Bacalhau parameters from the command to be executed inside the container
sh -c 'uname -a && echo "Hello from Docker Bacalhau!"': The command executed inside the container

The command execution in the terminal is similar to:

j-6ffd54b8-e992-498f-9ee9-766ab09d5daa

j-6ffd54b8-e992-498f-9ee9-766ab09d5daa is the job ID, which identifies the job that ran the command inside a Docker container. It can be used to obtain additional information about the job or to access its results. We store it in an environment variable so that we can reuse it later on (env: JOB_ID=j-6ffd54b8-e992-498f-9ee9-766ab09d5daa).

To print the details of the job, execute the following command:

bacalhau job describe j-6ffd54b8-e992-498f-9ee9-766ab09d5daa

The output is similar to:

ID            = j-6ffd54b8-e992-498f-9ee9-766ab09d5daa
Name          = j-6ffd54b8-e992-498f-9ee9-766ab09d5daa
Namespace     = default
Type          = batch
State         = Completed
Count         = 1
Created Time  = 2024-09-08 14:33:19
Modified Time = 2024-09-08 14:33:20
Version       = 0

Summary
Completed = 1

Job History
 TIME                 REV.  STATE      TOPIC       EVENT         
 2024-09-08 14:33:19  1     Pending    Submission  Job submitted 
 2024-09-08 14:33:19  2     Running                              
 2024-09-08 14:33:20  3     Completed                            

Executions
 ID          NODE ID     STATE      DESIRED  REV.  CREATED     MODIFIED    COMMENT      
 e-bd5746b8  n-e002001e  Completed  Stopped  6     27m21s ago  27m21s ago  Accepted job 

Execution e-bd5746b8 History
 TIME                 REV.  STATE              TOPIC            EVENT        
 2024-09-08 14:33:19  1     New                                              
 2024-09-08 14:33:19  2     AskForBid                                        
 2024-09-08 14:33:19  3     AskForBidAccepted  Requesting Node  Accepted job 
 2024-09-08 14:33:19  4     AskForBidAccepted                                
 2024-09-08 14:33:19  5     BidAccepted                                      
 2024-09-08 14:33:20  6     Completed                                        

Standard Output
Linux 7d5c3dcc7fc2 6.5.0-1024-gcp #26~22.04.1-Ubuntu SMP Fri Jun 14 18:48:45 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Hello from Docker Bacalhau!
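
For scripting, the State field can be picked out of this output. The sketch below assumes the "Key = Value" layout shown above, which may change between Bacalhau versions. `job_state` is a hypothetical helper, not a CLI command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `bacalhau job describe` output on stdin and print
# the value of the top-level "State" line, assuming the "Key = Value" layout.
job_state() {
  awk -F'=' '$1 ~ /^State[ \t]*$/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the value
    print $2
    exit
  }'
}

# Sample lines taken from the output shown above.
job_state <<'EOF'
ID            = j-6ffd54b8-e992-498f-9ee9-766ab09d5daa
Type          = batch
State         = Completed
Count         = 1
EOF
```

In practice the function would be fed from the CLI, for example `bacalhau job describe "$JOB_ID" | job_state`.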

3. Submit a Job With Output Files

You always need to mount directories into the container to access files. This is because the container is running in a separate environment from your host machine.

The first part of this example should look familiar, except for the Docker commands.

bacalhau docker run \
        --id-only \
        --wait \
        --gpu 1 \
        ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 -- \
            python main.py --o ./outputs --p "A Docker whale and a cod having a conversation about the state of the ocean"

When a job is submitted, Bacalhau prints the related job_id (j-da29a804-3960-4667-b6e5-73f05e120117):

j-da29a804-3960-4667-b6e5-73f05e120117

4. Check the State of your Jobs

Job status: You can check the status of the job using bacalhau job list.

bacalhau job list

When it reads Completed, that means the job is done, and you can get the results.

Job information: You can find out more information about your job by using bacalhau job describe.

bacalhau job describe j-da29a804-3960-4667-b6e5-73f05e120117

Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in the result directory.

bacalhau job get ${JOB_ID} --output-dir result

After the download is complete, the job output is available in the result directory.

Support

How To Work With Custom Containers in Bacalhau

Bacalhau operates by executing jobs within containers. This example shows you how to build and use a custom docker container.

Prerequisite

To get started, you need to install the Bacalhau client; see the installation instructions for more information.
This example requires Docker. If you don't have Docker installed, follow the official Docker installation instructions. Docker commands will not work on hosted notebooks like Google Colab, but the Bacalhau commands will.

1. Running Containers

Docker Command

You're likely familiar with executing Docker commands to start a container:

docker run docker/whalesay cowsay sup old fashioned container run

This command runs a container from the docker/whalesay image. The container executes the cowsay sup old fashioned container run command:

_________________________________
< sup old fashioned container run >
 ---------------------------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

Bacalhau Command

export JOB_ID=$(bacalhau docker run \
    --wait \
    --id-only \
    docker/whalesay -- bash -c 'cowsay hello web3 uber-run')

This command also runs a container from the docker/whalesay image, using Bacalhau. We use the bacalhau docker run command to start a job in a Docker container. It contains additional flags such as --wait to wait for job completion and --id-only to return only the job identifier. Inside the container, the bash -c 'cowsay hello web3 uber-run' command is executed.

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

j-7e41b9b9-a9e2-4866-9fce-17020d8ec9e0

You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results) and downloaded our job output to be stored in that directory.

rm -rf results && mkdir -p results
bacalhau job get \
--output-dir results \
${JOB_ID}

Viewing your job output

cat ./results/stdout

 _____________________
< hello web3 uber-run >
 ---------------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

Both commands execute cowsay in the docker/whalesay container, but Bacalhau provides additional features for working with jobs at scale.

Bacalhau Syntax

Bacalhau uses a syntax that is similar to Docker, and you can use the same containers. The main difference is that input and output data are passed to the container via IPFS, to enable planetary scale. In the example above, this makes little difference, except that we need to download the stdout.

The --wait flag tells Bacalhau to wait for the job to finish before returning. This is useful in interactive sessions like this, but you would normally allow jobs to complete in the background and use the bacalhau job list command to check on their status.

Another difference is that by default Bacalhau overwrites the default entry point for the container, so you have to pass all shell commands as arguments to the run command after the -- flag.

2. Building Your Own Custom Container For Bacalhau

To use your own custom container, you must publish the container to a container registry that is accessible from the Bacalhau network. At this time, only public container registries are supported.

To demonstrate this, you will develop and build a simple custom container that comes from an old Docker example. I remember seeing cowsay at a Docker conference about a decade ago. I think it's about time we brought it back to life and distributed it across the Bacalhau network.

# write to the cod.cow
$the_cow = <<"EOC";
   $thoughts
    $thoughts
                               ,,,,_
                            ┌Φ▓╬▓╬▓▓▓W      @▓▓▒,
                           ╠▓╬▓╬╣╬╬▓╬▓▓   ╔╣╬╬▓╬╣▓,
                    __,┌╓═╠╬╠╬╬╬Ñ╬╬╬Ñ╬╬¼,╣╬╬▓╬╬▓╬▓▓▓┐        ╔W_             ,φ▓▓
               ,«@▒╠╠╠╠╩╚╙╙╩Ü╚╚╚╚╩╙╙╚╠╩╚╚╟▓▒╠╠╫╣╬╬╫╬╣▓,   _φ╬▓╬╬▓,        ,φ╣▓▓╬╬
          _,φÆ╩╬╩╙╚╩░╙╙░░╩`=░╙╚»»╦░=╓╙Ü1R░│░╚Ü░╙╙╚╠╠╠╣╣╬≡Φ╬▀╬╣╬╬▓▓▓_   ╓▄▓▓▓▓▓▓╬▌
      _,φ╬Ñ╩▌▐█[▒░░░░R░░▀░`,_`!R`````╙`-'╚Ü░░Ü░░░░░░░│││░╚╚╙╚╩╩╩╣Ñ╩╠▒▒╩╩▀▓▓╣▓▓╬╠▌
     '╚╩Ü╙│░░╙Ö▒Ü░░░H░░R ▒¥╣╣@@@▓▓▓  := '`   `░``````````````````````````]▓▓▓╬╬╠H
       '¬═▄ `\░╙Ü░╠DjK` Å»»╙╣▓▓▓▓╬Ñ     -»`       -`      `  ,;╓▄╔╗∞  ~▓▓▓▀▓▓╬╬╬▌
             '^^^`   _╒Γ   `╙▀▓▓╨                     _, ⁿD╣▓╬╣▓╬▓╜      ╙╬▓▓╬╬▓▓
                 ```└                           _╓▄@▓▓▓╜   `╝╬▓▓╙           ²╣╬▓▓
                        %φ▄╓_             ~#▓╠▓▒╬▓╬▓▓^        `                ╙╙
                         `╣▓▓▓              ╠╬▓╬▓╬▀`
                           ╚▓▌               '╨▀╜
EOC

Next, the Dockerfile adds the script and sets the entry point.

# write the Dockerfile
FROM debian:stretch
RUN apt-get update && apt-get install -y cowsay
# "cowsay" installs to /usr/games
ENV PATH $PATH:/usr/games
RUN echo '#!/bin/bash\ncowsay "${@:1}"' > /usr/bin/codsay && \
    chmod +x /usr/bin/codsay
COPY cod.cow /usr/share/cowsay/cows/default.cow

Now let's build and test the container locally.

docker build -t ghcr.io/bacalhau-project/examples/codsay:latest . 2> /dev/null

docker run --rm ghcr.io/bacalhau-project/examples/codsay:latest codsay I like swimming in data

Once your container is working as expected, you should push it to a public container registry. In this example, I'm pushing to GitHub's container registry, but we'll skip the step below because you probably don't have permission. Remember that the Bacalhau nodes expect your container to have a linux/amd64 architecture.

docker buildx build --platform linux/amd64,linux/arm64 --push -t ghcr.io/bacalhau-project/examples/codsay:latest .

3. Running Your Custom Container on Bacalhau

Now we're ready to submit a Bacalhau job using your custom container. This code runs a job, downloads the results, and prints the stdout.

The bacalhau docker run command strips the default entry point, so don't forget to run your entry point in the command line arguments.

export JOB_ID=$(bacalhau docker run \
    --wait \
    --id-only \
    ghcr.io/bacalhau-project/examples/codsay:v1.0.0 \
    -- bash -c 'codsay Look at all this data')

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

Download your job results directly by using the bacalhau job get command.

rm -rf results && mkdir -p results
bacalhau job get ${JOB_ID}  --output-dir results

View your job output

cat ./results/stdout

_______________________
< Look at all this data >
 -----------------------
   \
    \
                               ,,,,_
                            ┌Φ▓╬▓╬▓▓▓W      @▓▓▒,
                           ╠▓╬▓╬╣╬╬▓╬▓▓   ╔╣╬╬▓╬╣▓,
                    __,┌╓═╠╬╠╬╬╬Ñ╬╬╬Ñ╬╬¼,╣╬╬▓╬╬▓╬▓▓▓┐        ╔W_             ,φ▓▓
               ,«@▒╠╠╠╠╩╚╙╙╩Ü╚╚╚╚╩╙╙╚╠╩╚╚╟▓▒╠╠╫╣╬╬╫╬╣▓,   _φ╬▓╬╬▓,        ,φ╣▓▓╬╬
          _,φÆ╩╬╩╙╚╩░╙╙░░╩`=░╙╚»»╦░=╓╙Ü1R░│░╚Ü░╙╙╚╠╠╠╣╣╬≡Φ╬▀╬╣╬╬▓▓▓_   ╓▄▓▓▓▓▓▓╬▌
      _,φ╬Ñ╩▌▐█[▒░░░░R░░▀░`,_`!R`````╙`-'╚Ü░░Ü░░░░░░░│││░╚╚╙╚╩╩╩╣Ñ╩╠▒▒╩╩▀▓▓╣▓▓╬╠▌
     '╚╩Ü╙│░░╙Ö▒Ü░░░H░░R ▒¥╣╣@@@▓▓▓  := '`   `░``````````````````````````]▓▓▓╬╬╠H
       '¬═▄ `░╙Ü░╠DjK` Å»»╙╣▓▓▓▓╬Ñ     -»`       -`      `  ,;╓▄╔╗∞  ~▓▓▓▀▓▓╬╬╬▌
             '^^^`   _╒Γ   `╙▀▓▓╨                     _, ⁿD╣▓╬╣▓╬▓╜      ╙╬▓▓╬╬▓▓
                 ```└                           _╓▄@▓▓▓╜   `╝╬▓▓╙           ²╣╬▓▓
                        %φ▄╓_             ~#▓╠▓▒╬▓╬▓▓^        `                ╙╙
                         `╣▓▓▓              ╠╬▓╬▓╬▀`
                           ╚▓▌               '╨▀╜

Support

Run CUDA programs on Bacalhau

What is CUDA

In this tutorial, we will look at how to run CUDA programs on Bacalhau. CUDA (Compute Unified Device Architecture) is an extension of C/C++ programming. It is a parallel computing platform and programming model created by NVIDIA. It helps developers speed up their applications by harnessing the power of GPU accelerators.

In addition to accelerating high-performance computing (HPC) and research applications, CUDA has also been widely adopted across consumer and industrial ecosystems. CUDA also makes it easy for developers to take advantage of the latest GPU architecture innovations.

Advantage of GPU over CPU

Architecturally, the CPU is composed of just a few cores with lots of cache memory that can handle a few software threads at a time. In contrast, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously.

Computations like matrix multiplication can be done much faster on a GPU than on a CPU.

Prerequisite

To get started, you need to install the Bacalhau client; see the installation instructions for more information.

1. Running CUDA locally

You'll need to have the following installed:

NVIDIA GPU
CUDA drivers installed
nvcc installed

Checking if nvcc is installed:

nvcc --version

Downloading the programs:

mkdir inputs outputs
wget -P inputs https://raw.githubusercontent.com/tristanpenman/cuda-examples/master/00-hello-world.cu
wget -P inputs https://raw.githubusercontent.com/tristanpenman/cuda-examples/master/02-cuda-hello-world-faster.cu

Viewing the programs

00-hello-world.cu:

# View the contents of the standard C++ program
cat inputs/00-hello-world.cu

# Compile and run the program
nvcc -o ./outputs/hello ./inputs/00-hello-world.cu; ./outputs/hello

This example represents a standard C++ program that inefficiently utilizes GPU resources due to the use of non-parallel loops.

02-cuda-hello-world-faster.cu:

# View the contents of the CUDA program with vector addition
cat inputs/02-cuda-hello-world-faster.cu

# Remove any previous output
rm -rf outputs/hello

# Compile and run the program
nvcc --expt-relaxed-constexpr -o ./outputs/hello ./inputs/02-cuda-hello-world-faster.cu; ./outputs/hello

This example performs vector addition with CUDA. It allocates memory in advance and copies it to the GPU using cudaMemcpy so that the computation can use the GPU's high-bandwidth memory (HBM). Compilation and execution are faster (1.39 seconds) than in the previous example (8.67 seconds).

2. Running a Bacalhau Job

To submit a job, run the following Bacalhau command:

export JOB_ID=$(bacalhau docker run \
    --gpu 1 \
    --timeout 3600 \
    --wait-timeout-secs 3600 \
    -i https://raw.githubusercontent.com/tristanpenman/cuda-examples/master/02-cuda-hello-world-faster.cu \
    --id-only \
    --wait \
    nvidia/cuda:11.2.2-cudnn8-devel-ubuntu18.04 \
    -- /bin/bash -c 'nvcc --expt-relaxed-constexpr  -o ./outputs/hello ./inputs/02-cuda-hello-world-faster.cu; ./outputs/hello ')

Structure of the Commands

bacalhau docker run: call to Bacalhau
-i https://raw.githubusercontent.com/tristanpenman/cuda-examples/master/02-cuda-hello-world-faster.cu: URL of the input data, which is downloaded and mounted into the container as an input volume.
nvidia/cuda:11.2.2-cudnn8-devel-ubuntu18.04: Docker container for executing CUDA programs (you need to choose the right CUDA Docker container). The image tag should include "devel", since those images ship the nvcc compiler.
nvcc --expt-relaxed-constexpr -o ./outputs/hello ./inputs/02-cuda-hello-world-faster.cu: Compiles the program with the nvcc compiler and saves the binary to the outputs directory as hello.
Note that there is a ; between the commands: -- /bin/bash -c 'nvcc --expt-relaxed-constexpr -o ./outputs/hello ./inputs/02-cuda-hello-world-faster.cu; ./outputs/hello'. The ";" symbol allows executing multiple commands sequentially in a single line.
./outputs/hello: Executes the hello binary. This is how you can combine compilation and execution in one command.

Note that the CUDA version needs to be compatible with the graphics card on the host machine.

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

3. Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau job list.

bacalhau job list --id-filter ${JOB_ID} --wide

When it says Published or Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau job describe.

bacalhau job describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results) and downloaded our job output to be stored in that directory.

rm -rf results && mkdir -p results
bacalhau job get $JOB_ID --output-dir results

4. Viewing your Job Output

To view the file, run the following command:

cat results/stdout

Support

WebAssembly (Wasm) Workloads

Bacalhau supports running programs that are compiled to WebAssembly (Wasm). With the Bacalhau client, you can upload Wasm programs, retrieve data from public storage, read and write data, receive program arguments, and access environment variables.

Prerequisites and Limitations

Supported WebAssembly System Interface (WASI): Bacalhau can run compiled Wasm programs that expect the WebAssembly System Interface (WASI) Snapshot 1. Through this interface, WebAssembly programs can access data, environment variables, and program arguments.
Networking Restrictions: All ingress/egress networking is disabled; you won't be able to pull data, code, weights, etc. from an external source. Wasm jobs can say what data they need using URLs or CIDs (Content IDentifiers) and can then access the data by reading from the filesystem.
Single-Threading: There is no multi-threading, as WASI does not expose any interface for it.

Onboarding Your Workload

Step 1: Replace network operations with filesystem reads and writes

If your program typically involves reading from and writing to network endpoints, follow these steps to adapt it for Bacalhau:

Replace Network Operations: Instead of making HTTP requests to external servers (e.g., example.com), modify your program to read data from the local filesystem.
Input Data Handling: Specify the input data location in Bacalhau using the --input flag when running the job. For instance, if your program used to fetch data from example.com, read from the /inputs folder locally, and provide the URL as input when executing the Bacalhau job. For example, --input http://example.com.
Output Handling: Adjust your program to output results to standard output (stdout) or standard error (stderr) pipes. Alternatively, you can write results to the filesystem, typically into an output mount. In the case of Wasm jobs, a default folder at /outputs is available, ensuring that data written there will persist after the job concludes.

By making these adjustments, you can effectively transition your program to operate within the Bacalhau environment, utilizing filesystem operations instead of traditional network interactions.

You can specify additional or different output mounts using the -o flag.

Step 2: Configure your compiler to output WASI-compliant WebAssembly

You will need to compile your program to WebAssembly that expects WASI. Check the instructions for your compiler to see how to do this.

For example, Rust users can specify the wasm32-wasi target to rustup and cargo to get programs compiled for WASI WebAssembly. See the Rust example for more information on this.

Step 3: Run your program

You can run a WebAssembly program on Bacalhau using the bacalhau wasm run command.

bacalhau wasm run

Run Locally Compiled Program:

If your program is locally compiled, specify it as an argument. For instance, running the following command will upload and execute the main.wasm program:

bacalhau wasm run main.wasm

The program you specify will be uploaded to a Bacalhau storage node and will be publicly available if you are using the public demo network.

Consider creating your own private network.

Alternative Program Specification:

You can use a Content IDentifier (CID) for a specific WebAssembly program.

bacalhau wasm run Qmajb9T3jBdMSp7xh2JruNrqg3hniCnM6EUVsBocARPJRQ

Input Data Specification:

Make sure to specify any input data using --input flag.

bacalhau wasm run --input http://example.com

This ensures the necessary data is available for the program's execution.

Program arguments

You can give the Wasm program arguments by specifying them after the program path or CID. If the Wasm program is already compiled and located in the current directory, you can run it by adding arguments after the file name:

bacalhau wasm run echo.wasm hello world

For a specific WebAssembly program, run:

bacalhau wasm run Qmajb9T3jBdMSp7xh2JruNrqg3hniCnM6EUVsBocARPJRQ hello world

Write your program to use program arguments to specify input and output paths. This makes your program more flexible in handling different configurations of input and output volumes.

For example, instead of hard-coding your program to read from /inputs/data.txt, accept a program argument that should contain the path and then specify the path as an argument to bacalhau wasm run:

bacalhau wasm run prog.wasm /inputs/data.txt

Your language of choice should contain a standard way of reading program arguments that will work with WASI.

Environment variables

You can also specify environment variables using the -e flag.

bacalhau wasm run prog.wasm -e HELLO=world

Examples

See the Rust example for a workload that leverages WebAssembly support.

Support

If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).

Running Rust programs as WebAssembly (WASM)

Bacalhau supports running jobs as a WebAssembly (WASM) program. This example demonstrates how to compile a Rust project into WebAssembly and run the program on Bacalhau.

Prerequisites

To get started, you need to install the Bacalhau client; see the installation instructions for more information.
A working Rust installation with the wasm32-wasi target. For example, you can use rustup to install Rust and configure it to build WASM targets. For those using the notebook, these are installed in hidden cells below.

1. Develop a Rust Program Locally

We can use cargo (which will have been installed by rustup) to start a new project (my-program) and compile it:

cargo init my-program

We can then write a Rust program. Rust programs that run on Bacalhau can read and write files, access a simple clock, and make use of pseudo-random numbers. They cannot memory-map files or run code on multiple threads.

The program below uses the Rust imageproc crate to resize an image through seam carving.

// ./my-program/src/main.rs
use image::{open, GrayImage, Luma, Pixel};
use imageproc::definitions::Clamp;
use imageproc::gradients::sobel_gradient_map;
use imageproc::map::map_colors;
use imageproc::seam_carving::*;
use std::path::Path;

fn main() {
    let input_path = "inputs/image0.JPG";
    let output_dir = "outputs/";

    let input_path = Path::new(&input_path);
    let output_dir = Path::new(&output_dir);

    // Load the image and convert it to 8-bit RGB
    let input_image = open(input_path)
        .expect(&format!("Could not load image at {:?}", input_path))
        .to_rgb8();

    // Save original image in output directory
    let original_path = output_dir.join("original.png");
    input_image.save(&original_path).unwrap();

    // We will reduce the image width by this amount, removing one seam at a time.
    let seams_to_remove: u32 = input_image.width() / 6;

    let mut shrunk = input_image.clone();
    let mut seams = Vec::new();

    // Record each removed seam so that we can draw them on the original image later.
    for i in 0..seams_to_remove {
        if i % 100 == 0 {
            println!("Removing seam {}", i);
        }
        let vertical_seam = find_vertical_seam(&shrunk);
        shrunk = remove_vertical_seam(&mut shrunk, &vertical_seam);
        seams.push(vertical_seam);
    }

    // Draw the seams on the original image.
    let gray_image = map_colors(&input_image, |p| p.to_luma());
    let annotated = draw_vertical_seams(&gray_image, &seams);
    let annotated_path = output_dir.join("annotated.png");
    annotated.save(&annotated_path).unwrap();

    // Draw the seams on the gradient magnitude image.
    let gradients = sobel_gradient_map(&input_image, |p| {
        let mean = (p[0] + p[1] + p[2]) / 3;
        Luma([mean as u32])
    });
    let clamped_gradients: GrayImage = map_colors(&gradients, |p| Luma([Clamp::clamp(p[0])]));
    let annotated_gradients = draw_vertical_seams(&clamped_gradients, &seams);
    let gradients_path = output_dir.join("gradients.png");
    clamped_gradients.save(&gradients_path).unwrap();
    let annotated_gradients_path = output_dir.join("annotated_gradients.png");
    annotated_gradients.save(&annotated_gradients_path).unwrap();

    // Save the shrunk image.
    let shrunk_path = output_dir.join("shrunk.png");
    shrunk.save(&shrunk_path).unwrap();
}

In the main function main() an image is loaded, the original is saved, and then a loop is performed to reduce the width of the image by removing "seams." The results of the process are saved, including the original image with drawn seams and a gradient image with highlighted seams.
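The loop described above repeatedly finds a minimum-energy vertical seam and removes it. As a rough illustration of that idea (not the imageproc implementation), the following Python sketch uses dynamic programming on a small, made-up energy grid. The function names mirror the Rust calls only for readability, and the grid values are hypothetical.

```python
def find_vertical_seam(energy):
    """Return one column index per row forming a minimum-energy vertical seam."""
    height, width = len(energy), len(energy[0])
    # cost[y][x] is the cheapest total energy of any seam ending at (y, x).
    cost = [list(energy[0])]
    for y in range(1, height):
        prev = cost[-1]
        row = []
        for x in range(width):
            lo, hi = max(0, x - 1), min(width, x + 2)
            row.append(energy[y][x] + min(prev[lo:hi]))
        cost.append(row)
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(width), key=lambda i: cost[-1][i])
    seam = [x]
    for y in range(height - 1, 0, -1):
        lo, hi = max(0, x - 1), min(width, x + 2)
        x = min(range(lo, hi), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(grid, seam):
    """Return a copy of grid with one cell per row removed, as given by seam."""
    return [row[:x] + row[x + 1:] for row, x in zip(grid, seam)]


energy = [
    [1, 9, 9],
    [9, 1, 9],
    [9, 9, 1],
]
seam = find_vertical_seam(energy)
print(seam)                                # [0, 1, 2]
print(remove_vertical_seam(energy, seam))  # [[9, 9], [9, 9], [9, 9]]
```

Each removed seam narrows the grid by exactly one column, which is why the Rust program calls the two functions once per pixel of width to remove.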

We also need to install the imageproc and image libraries and switch off the default features to make sure that multi-threading is disabled (default-features = false). After disabling the default features, you need to explicitly specify only the features that you need:

# ./my-program/Cargo.toml
[package]
name = "my-program"
version = "0.1.0"
edition = "2021"

[dependencies.image]
version = "0.24.4"
default-features = false
features = ["png", "jpeg", "bmp"]

[dependencies.imageproc]
version = "0.23.0"
default-features = false

We can now build the Rust program into a WASM blob using cargo:

cd my-program && cargo build --target wasm32-wasi --release

This command navigates to the my-program directory and builds the project using Cargo with the target set to wasm32-wasi in release mode.

This will generate a WASM file at ./my-program/target/wasm32-wasi/release/my-program.wasm which can now be run on Bacalhau.

2. Running WASM on Bacalhau

Now that we have a WASM binary, we can upload it to IPFS and use it as input to a Bacalhau job.

The -i flag allows specifying a URI to be mounted as a named volume in the job, which can be an IPFS CID, HTTP URL, or S3 object.

For this example, we are using an image of the Statue of Liberty that has been pinned to a storage facility.

export JOB_ID=$(bacalhau wasm run \
    ./my-program/target/wasm32-wasi/release/my-program.wasm _start \
    --id-only \
    -i ipfs://bafybeifdpl6dw7atz6uealwjdklolvxrocavceorhb3eoq6y53cbtitbeu:/inputs)

Structure of the Commands

bacalhau wasm run: call to Bacalhau
./my-program/target/wasm32-wasi/release/my-program.wasm: the path to the WASM file that will be executed
_start: the entry point of the WASM program, where its execution begins
--id-only: this flag indicates that only the identifier of the executed job should be returned
-i ipfs://bafybeifdpl6dw7atz6uealwjdklolvxrocavceorhb3eoq6y53cbtitbeu:/inputs: input data volume that will be accessible within the job at the specified destination path

When a job is submitted, Bacalhau prints out the related job_id. The command above stores it in the JOB_ID environment variable so that we can reuse it later on.

You can download your job results directly by using bacalhau job get. Alternatively, you can first create a directory to store your results. The commands below create a directory (wasm_results) and download the job output into it:

rm -rf wasm_results && mkdir -p wasm_results
bacalhau job get ${JOB_ID} --output-dir wasm_results

Viewing Job Output

When we view the files, we can see the original image, the resulting shrunk image, and the seams that were removed.

./wasm_results/outputs/original.png

./wasm_results/outputs/annotated_gradients.png

./wasm_results/outputs/shrunk.png

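Support

If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel)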

Python

Running a Python Script

This tutorial serves as an introduction to Bacalhau. In this example, you'll use Bacalhau to run a simple "Hello, World!" Python script that is hosted on a website.

Prerequisites

To get started, you need to install the Bacalhau client; see the installation instructions for more information.

1. Running Python Locally

We'll be using a very simple Python script that prints "Hello, world!". Create a file called hello-world.py:

# hello-world.py
print("Hello, world!")

Run the script to print the output:

python3 hello-world.py

After the script has run successfully locally we can now run it on Bacalhau.

2. Running a Bacalhau Job

To submit a workload to Bacalhau, you can use the bacalhau docker run command. This command allows passing input data into the container using volumes. For simplicity, we will be using the --input URL:path argument, which results in Bacalhau mounting a data volume inside the container. By default, Bacalhau mounts the input volume at the path /inputs inside the container.

The command to execute inside the container must be given in full after the -- argument.

export JOB_ID=$(bacalhau docker run \
    --id-only \
    --input https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py \
    python:3.10-slim \
    -- python3 /inputs/hello-world.py)

Structure of the command

bacalhau docker run: call to Bacalhau
--id-only: specifies that only the job identifier (job_id) will be returned after executing the container, not the entire output
--input https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py: indicates where to get the input data for the container. In this case, the input data is downloaded from the specified URL, which represents the Python script "hello-world.py".
python:3.10-slim: the Docker image that will be used to run the container. In this case, it uses the Python 3.10 image with a minimal set of components (slim).
--: This double dash is used to separate the Bacalhau command options from the command that will be executed inside the Docker container.
python3 /inputs/hello-world.py: running the hello-world.py Python script stored in /inputs.
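When jobs are submitted from a script, the same structure can be assembled programmatically. The sketch below only builds and prints the argument list and does not invoke Bacalhau; the helper name build_docker_run is hypothetical, and it uses only the flags described above.

```python
import shlex


def build_docker_run(image, container_cmd, input_url=None, id_only=True):
    """Assemble the argument list for a `bacalhau docker run` invocation."""
    args = ["bacalhau", "docker", "run"]
    if id_only:
        args.append("--id-only")
    if input_url is not None:
        args += ["--input", input_url]
    args.append(image)
    # Everything after the double dash is the command run inside the container.
    args.append("--")
    args += list(container_cmd)
    return args


cmd = build_docker_run(
    image="python:3.10-slim",
    container_cmd=["python3", "/inputs/hello-world.py"],
    input_url=(
        "https://raw.githubusercontent.com/bacalhau-project/examples/"
        "151eebe895151edd83468e3d8b546612bf96cd05/"
        "workload-onboarding/trivial-python/hello-world.py"
    ),
)
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting problems if the result is later passed to a process launcher.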

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

Declarative job description

The same job can be presented in the declarative format. In this case, the description will look like this:

name: Running Trivial Python
type: batch
count: 1
tasks:
  - name: My main task
    Engine:
      type: docker
      params:
        Image: python:3.10-slim
        Entrypoint:
          - /bin/bash
        Parameters:
          - -c
          - python3 /inputs/hello-world.py
    InputSources:
      - Target: /inputs
        Source:
          Type: urlDownload
          Params:
            URL: https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py
            Path: /inputs/hello-world.py

The job description should be saved in .yaml format, e.g. helloworld.yaml, and then run with the command:

bacalhau job run helloworld.yaml

3. Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau job list.

bacalhau job list --id-filter ${JOB_ID} --no-style

When it says Published or Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau job describe.

bacalhau job describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results) and downloaded our job output to be stored in that directory.

rm -rf results && mkdir results
bacalhau job get ${JOB_ID} --output-dir results

4. Viewing your Job Output

To view the file, run the following command:

cat results/stdout

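Support

If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel)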

Docker Workload Onboarding

How to use docker containers with Bacalhau

Docker Workloads

This section describes how to migrate a workload based on a Docker container into a format that will work with the Bacalhau client.

You can check out the example tutorial "How To Work With Custom Containers in Bacalhau" to see how all these steps are used together.

Requirements

Here are a few things to note before getting started:

Container Registry: Ensure that the container is published to a public container registry that is accessible from the Bacalhau network.
Architecture Compatibility: Bacalhau supports only images that match the host node's architecture. Typically, most nodes run on linux/amd64, so containers in arm64 format are not able to run.
Input Flags: The --input ipfs://... flag supports only directories and does not support CID subpaths. The --input https://... flag supports only single files and does not support URL directories. The --input s3://... flag supports S3 keys and prefixes. For example, s3://bucket/logs-2023-04* includes all logs for April 2023.
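To make the S3 prefix behaviour concrete, the sketch below selects keys the way the logs-2023-04* example suggests, assuming a trailing * acts as a simple prefix wildcard. This is an illustration only, not Bacalhau's implementation; the function name and the key names are hypothetical.

```python
def select_keys(pattern: str, keys: list[str]) -> list[str]:
    """Return the keys selected by an exact S3 key or a trailing-* prefix pattern."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [key for key in keys if key.startswith(prefix)]
    return [key for key in keys if key == pattern]


keys = [
    "logs-2023-03-31.txt",
    "logs-2023-04-01.txt",
    "logs-2023-04-30.txt",
    "logs-2023-05-01.txt",
]
print(select_keys("logs-2023-04*", keys))
# ['logs-2023-04-01.txt', 'logs-2023-04-30.txt']
```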

You can also check the list of public containers used by the Bacalhau team.

Note: Only about a third of the examples have their containers listed there. The rest are hosted under various Docker Hub registries.

Runtime Restrictions

To help provide a safe, secure network for all users, we add the following runtime restrictions:

Limited Ingress/Egress Networking:

All ingress/egress networking is limited as described in the documentation. You won't be able to pull data, code, weights, etc. from an external source.

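Data Passing with Docker Volumes:

A job includes the concept of input and output volumes, and the Docker executor implements support for these. This means you can specify your CIDs, URLs, and/or S3 objects as input paths and also write results to an output volume. This can be seen in the following example: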

bacalhau docker run \
  -i s3://mybucket/logs-2023-04*:/input \
  -o apples:/output_folder \
  ubuntu \
  bash -c 'ls /input > /output_folder/file.txt'

Onboarding Your Workload

Step 1 - Read Data From Your Directory

If you need to pass data into your container, you will do this through a Docker volume. You'll need to modify your code to read from a local directory.

We make the assumption that you are reading from a directory called /inputs, which is set as the default.

Step 2 - Write Data to Your Directory

If you need to return data from your container, you will do this through a Docker volume. You'll need to modify your code to write to a local directory.

We make the assumption that you are writing to a directory called /outputs, which is set as the default.

Step 3 - Build and Push Your Image To a Registry

At this step, you create (or update) a Docker image that Bacalhau will use to perform your task. You build the image from your code and dependencies, then push it to a public registry so that Bacalhau can access it. This is necessary for other Bacalhau nodes to run your container and execute the given task.

Most Bacalhau nodes are of an x86_64 architecture, therefore containers should be built for linux/amd64.

For example:

$ export IMAGE=myuser/myimage:latest
$ docker build -t ${IMAGE} .
$ docker image push ${IMAGE}

Step 4 - Test Your Container

To test your Docker image locally, execute the following commands, changing the environment variables as necessary:

$ export LOCAL_INPUT_DIR=$PWD
$ export LOCAL_OUTPUT_DIR=$PWD
$ CMD=(sh -c 'ls /inputs; echo do something useful > /outputs/stdout')
$ docker run --rm \
  -v ${LOCAL_INPUT_DIR}:/inputs  \
  -v ${LOCAL_OUTPUT_DIR}:/outputs \
  ${IMAGE} \
  "${CMD[@]}"

Let's see what each command is used for:

$ export LOCAL_INPUT_DIR=$PWD
Exports the current working directory of the host system to the LOCAL_INPUT_DIR variable. This variable will be used for binding a volume and transferring data into the container.

$ export LOCAL_OUTPUT_DIR=$PWD
Exports the current working directory of the host system to the LOCAL_OUTPUT_DIR variable. Similarly, this variable will be used for binding a volume and transferring data from the container.

$ CMD=(sh -c 'ls /inputs; echo do something useful > /outputs/stdout')
Creates a shell array CMD holding the command that will be executed inside the container. Shell arrays cannot be exported, so CMD is a plain shell variable, and it is expanded as "${CMD[@]}" so that every element is passed to docker run as a separate argument (a bare ${CMD} expands to only the first element in bash). In this case, it is a simple command executing 'ls' in the /inputs directory and writing text to the /outputs/stdout file.

$ docker run ... ${IMAGE} "${CMD[@]}"
Launches a Docker container using the specified variables and commands. It binds volumes to facilitate data exchange between the host and the container.

Bacalhau will use the default ENTRYPOINT if your image contains one. If you need to specify another entrypoint, use the --entrypoint flag to bacalhau docker run.

For example:

$ export LOCAL_INPUT_DIR=$PWD
$ export LOCAL_OUTPUT_DIR=$PWD
$ CMD=(sh -c 'ls /inputs; echo "do something useful" > /outputs/stdout')
$ export IMAGE=ubuntu
$ docker run --rm \
  -v ${LOCAL_INPUT_DIR}:/inputs  \
  -v ${LOCAL_OUTPUT_DIR}:/outputs \
  ${IMAGE} \
  "${CMD[@]}"
$ cat stdout

The result of the commands' execution is shown below:

do something useful

Step 5 - Run the Workload on Bacalhau

To launch your workload in a Docker container, using the specified image and working with input data specified via IPFS CID, run the following command:

$ bacalhau docker run --input ipfs://${CID} ${IMAGE} ${CMD}

To check the status of your job, run the following command:

$ bacalhau job list --id-filter JOB_ID

To get more information on your job, run:

$ bacalhau job describe JOB_ID

To download your job results, run:

$ bacalhau job get JOB_ID

For example, running:

JOB_ID=$(bacalhau docker run ubuntu echo hello | grep 'Job ID:' | sed 's/.*Job ID: \([^ ]*\).*/\1/')
echo "The job ID is: $JOB_ID"
bacalhau job list --id-filter $JOB_ID
sleep 5

bacalhau job list --id-filter $JOB_ID
bacalhau job get $JOB_ID

ls shards

This produces output similar to the following:

CREATED   ID        JOB                      STATE      VERIFIED  PUBLISHED
10:26:00  24440f0d  Docker ubuntu echo h...  Verifying
CREATED   ID        JOB                      STATE      VERIFIED  PUBLISHED
10:26:00  24440f0d  Docker ubuntu echo h...  Published            /ipfs/bafybeiflj3kha...
11:26:09.107 | INF bacalhau/get.go:67 > Fetching results of job '24440f0d-3c06-46af-9adf-cb524aa43961'...
11:26:10.528 | INF ipfs/downloader.go:115 > Found 1 result shards, downloading to temporary folder.
11:26:13.144 | INF ipfs/downloader.go:195 > Combining shard from output volume 'outputs' to final location: '/Users/phil/source/filecoin-project/docs.bacalhau.org'
job-24440f0d-3c06-46af-9adf-cb524aa43961-shard-0-host-QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3
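The shell pipeline in the example above extracts the job ID from the CLI output with grep and sed. For scripts written in Python, the same parsing can be sketched with a regular expression. The sample output line below is hypothetical and only assumes that the output contains "Job ID: " followed by the identifier, as the grep pattern does; the --id-only flag shown elsewhere in this guide avoids parsing altogether.

```python
import re

# Matches the identifier that follows "Job ID: " up to the next whitespace.
JOB_ID_PATTERN = re.compile(r"Job ID: (\S+)")


def extract_job_id(cli_output: str):
    """Return the job ID found in cli_output, or None if there is none."""
    match = JOB_ID_PATTERN.search(cli_output)
    return match.group(1) if match else None


# Hypothetical sample of CLI output, used only to exercise the parser.
sample = "Job successfully submitted. Job ID: 24440f0d-3c06-46af-9adf-cb524aa43961\n"
print(extract_job_id(sample))
```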

The --input flag does not support CID subpaths for ipfs:// content.

Alternatively, you can run your workload with a publicly accessible http(s) URL, which will download the data temporarily into your public storage:

$ export URL=https://download.geofabrik.de/antarctica-latest.osm.pbf
$ bacalhau docker run --input ${URL} ${IMAGE} ${CMD}

$ bacalhau job list

$ bacalhau job get JOB_ID

The --input flag does not support URL directories.

Troubleshooting

If you run into this compute error while running your Docker image:

Creating job for submission ... done ✅
Finding node(s) for the job ... done ✅
Node accepted the job ... done ✅
Error while executing the job.

it can often be resolved by re-tagging your Docker image.

Support

If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel)
