How To Work With Custom Containers in Bacalhau

Bacalhau operates by executing jobs within containers. This example shows you how to build and use a custom Docker container.

Prerequisite

  1. To get started, you need to install the Bacalhau client; see the installation guide for more information (a minimal install sketch follows this list).

  2. This example requires Docker. If you don't have Docker installed, you can install it from the Docker website. Docker commands will not work on hosted notebooks like Google Colab, but the Bacalhau commands will.
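
If you haven't installed the CLI yet, a minimal sketch is shown below. It assumes a Linux or macOS shell and that the get.bacalhau.org install script is still the documented method; check the installation guide for the current command.

# Install the Bacalhau CLI (installer script from the Bacalhau project)
curl -sL https://get.bacalhau.org/install.sh | bash

# Confirm the client is on your PATH and note the version
bacalhau version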

1. Running Containers

Docker Command

You're likely familiar with executing Docker commands to start a container:

docker run docker/whalesay cowsay sup old fashioned container run

This command runs a container from the docker/whalesay image. The container executes the cowsay sup old fashioned container run command:

_________________________________
< sup old fashioned container run >
 ---------------------------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

Bacalhau Command

export JOB_ID=$(bacalhau docker run \
    --wait \
    --id-only \
    docker/whalesay -- bash -c 'cowsay hello web3 uber-run')

This command also runs a container from the docker/whalesay image, using Bacalhau. We use the bacalhau docker run command to start a job in a Docker container. It contains additional flags such as --wait to wait for job completion and --id-only to return only the job identifier. Inside the container, the bash -c 'cowsay hello web3 uber-run' command is executed.

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

j-7e41b9b9-a9e2-4866-9fce-17020d8ec9e0
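
With the job ID stored, you can also inspect the job before (or instead of) downloading its results. A quick sketch using the job commands covered in the CLI guide follows; the exact output varies between versions.

# Show the job's state, executions, and recent events
bacalhau job describe ${JOB_ID}

# Stream the job's stdout/stderr
bacalhau job logs ${JOB_ID}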

You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we create a directory (results) and download our job output into that directory.

rm -rf results && mkdir -p results
bacalhau job get \
    --output-dir results \
    ${JOB_ID}

Viewing your job output

cat ./results/stdout

 _____________________
< hello web3 uber-run >
 ---------------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

Both commands execute cowsay in the docker/whalesay container, but Bacalhau provides additional features for working with jobs at scale.

Bacalhau Syntax

Bacalhau uses a syntax that is similar to Docker, and you can use the same containers. The main difference is that input and output data is passed to the container via IPFS, to enable planetary scale. In the example above, it doesn't make too much difference except that we need to download the stdout.

The --wait flag tells Bacalhau to wait for the job to finish before returning. This is useful in interactive sessions like this, but you would normally allow jobs to complete in the background and use the bacalhau job list command to check on their status.

Another difference is that by default Bacalhau overwrites the default entry point for the container, so you have to pass all shell commands as arguments to the run command after the -- flag.
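
A minimal sketch of that background workflow, reusing the whalesay image from above (the job and message here are only illustrative):

# Submit the job without --wait so the command returns immediately
export JOB_ID=$(bacalhau docker run \
    --id-only \
    docker/whalesay -- bash -c 'cowsay running in the background')

# Later: list recent jobs, check this one, and fetch results once it has completed
bacalhau job list
bacalhau job describe ${JOB_ID}
bacalhau job get ${JOB_ID} --output-dir results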

2. Building Your Own Custom Container For Bacalhau

To use your own custom container, you must publish the container to a container registry that is accessible from the Bacalhau network. At this time, only public container registries are supported.

To demonstrate this, you will develop and build a simple custom container that comes from an old Docker example. I remember seeing cowsay at a Docker conference about a decade ago. I think it's about time we brought it back to life and distributed it across the Bacalhau network.

# write to the cod.cow
$the_cow = <<"EOC";
   $thoughts
    $thoughts
                               ,,,,_
                            β”ŒΞ¦β–“β•¬β–“β•¬β–“β–“β–“W      @β–“β–“β–’,
                           ╠▓╬▓╬╣╬╬▓╬▓▓   ╔╣╬╬▓╬╣▓,
                    __,β”Œβ•“β•β• β•¬β• β•¬β•¬β•¬Γ‘β•¬β•¬β•¬Γ‘β•¬β•¬ΒΌ,╣╬╬▓╬╬▓╬▓▓▓┐        β•”W_             ,Ο†β–“β–“
               ,Β«@β–’β• β• β• β• β•©β•šβ•™β•™β•©Γœβ•šβ•šβ•šβ•šβ•©β•™β•™β•šβ• β•©β•šβ•šβ•Ÿβ–“β–’β• β• β•«β•£β•¬β•¬β•«β•¬β•£β–“,   _φ╬▓╬╬▓,        ,φ╣▓▓╬╬
          _,Ο†Γ†β•©β•¬β•©β•™β•šβ•©β–‘β•™β•™β–‘β–‘β•©`=β–‘β•™β•šΒ»Β»β•¦β–‘=β•“β•™Γœ1Rβ–‘β”‚β–‘β•šΓœβ–‘β•™β•™β•šβ• β• β• β•£β•£β•¬β‰‘Ξ¦β•¬β–€β•¬β•£β•¬β•¬β–“β–“β–“_   β•“β–„β–“β–“β–“β–“β–“β–“β•¬β–Œ
      _,Ο†β•¬Γ‘β•©β–Œβ–β–ˆ[β–’β–‘β–‘β–‘β–‘Rβ–‘β–‘β–€β–‘`,_`!R`````β•™`-'β•šΓœβ–‘β–‘Γœβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚β”‚β”‚β–‘β•šβ•šβ•™β•šβ•©β•©β•©β•£Γ‘β•©β• β–’β–’β•©β•©β–€β–“β–“β•£β–“β–“β•¬β• β–Œ
     'β•šβ•©Γœβ•™β”‚β–‘β–‘β•™Γ–β–’Γœβ–‘β–‘β–‘Hβ–‘β–‘R β–’Β₯β•£β•£@@@β–“β–“β–“  := '`   `β–‘``````````````````````````]▓▓▓╬╬╠H
       '¬═▄ `\β–‘β•™Γœβ–‘β• DjK` Å»»╙╣▓▓▓▓╬Ñ     -Β»`       -`      `  ,;β•“β–„β•”β•—βˆž  ~β–“β–“β–“β–€β–“β–“β•¬β•¬β•¬β–Œ
             '^^^`   _β•’Ξ“   `╙▀▓▓╨                     _, ⁿDβ•£β–“β•¬β•£β–“β•¬β–“β•œ      ╙╬▓▓╬╬▓▓
                 ```β””                           _β•“β–„@β–“β–“β–“β•œ   `╝╬▓▓╙           ²╣╬▓▓
                        %Ο†β–„β•“_             ~#▓╠▓▒╬▓╬▓▓^        `                β•™β•™
                         `β•£β–“β–“β–“              ╠╬▓╬▓╬▀`
                           β•šβ–“β–Œ               'β•¨β–€β•œ
EOC

Next, the Dockerfile installs cowsay, adds a small codsay wrapper script, and sets the custom cow file as the default.

# write the Dockerfile
FROM debian:stretch
RUN apt-get update && apt-get install -y cowsay
# "cowsay" installs to /usr/games
ENV PATH $PATH:/usr/games
RUN echo '#!/bin/bash\ncowsay "${@:1}"' > /usr/bin/codsay && \
    chmod +x /usr/bin/codsay
COPY cod.cow /usr/share/cowsay/cows/default.cow

Now let's build and test the container locally.

docker build -t ghcr.io/bacalhau-project/examples/codsay:latest . 2> /dev/null
docker run --rm ghcr.io/bacalhau-project/examples/codsay:latest codsay I like swimming in data

Once your container is working as expected, push it to a public container registry. In this example, I'm pushing to GitHub's container registry, but we'll skip the step below because you probably don't have permission. Remember that Bacalhau nodes expect your container to have a linux/amd64 architecture.

docker buildx build --platform linux/amd64,linux/arm64 --push -t ghcr.io/bacalhau-project/examples/codsay:latest .
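
To double-check that the pushed image actually advertises the platforms Bacalhau nodes expect, one option (a sketch, assuming the image is public and you have a recent Docker CLI) is to inspect its manifest:

# List the platforms published for the multi-arch image
docker manifest inspect ghcr.io/bacalhau-project/examples/codsay:latest | grep -A 2 '"platform"'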

3. Running Your Custom Container on Bacalhau

Now we're ready to submit a Bacalhau job using your custom container. This code runs a job, downloads the results, and prints the stdout.

The bacalhau docker run command strips the default entry point, so don't forget to invoke your entry point explicitly in the command-line arguments after the -- flag.

export JOB_ID=$(bacalhau docker run \
    --wait \
    --id-only \
    ghcr.io/bacalhau-project/examples/codsay:v1.0.0 \
    -- bash -c 'codsay Look at all this data')

As before, Bacalhau prints out the related job_id when the job is submitted, and we store it in an environment variable so that we can reuse it later on.

Download your job results directly by using the bacalhau job get command.

rm -rf results && mkdir -p results
bacalhau job get ${JOB_ID}  --output-dir results

View your job output

cat ./results/stdout

_______________________
< Look at all this data >
 -----------------------
   \
    \
                               ,,,,_
                            β”ŒΞ¦β–“β•¬β–“β•¬β–“β–“β–“W      @β–“β–“β–’,
                           ╠▓╬▓╬╣╬╬▓╬▓▓   ╔╣╬╬▓╬╣▓,
                    __,β”Œβ•“β•β• β•¬β• β•¬β•¬β•¬Γ‘β•¬β•¬β•¬Γ‘β•¬β•¬ΒΌ,╣╬╬▓╬╬▓╬▓▓▓┐        β•”W_             ,Ο†β–“β–“
               ,Β«@β–’β• β• β• β• β•©β•šβ•™β•™β•©Γœβ•šβ•šβ•šβ•šβ•©β•™β•™β•šβ• β•©β•šβ•šβ•Ÿβ–“β–’β• β• β•«β•£β•¬β•¬β•«β•¬β•£β–“,   _φ╬▓╬╬▓,        ,φ╣▓▓╬╬
          _,Ο†Γ†β•©β•¬β•©β•™β•šβ•©β–‘β•™β•™β–‘β–‘β•©`=β–‘β•™β•šΒ»Β»β•¦β–‘=β•“β•™Γœ1Rβ–‘β”‚β–‘β•šΓœβ–‘β•™β•™β•šβ• β• β• β•£β•£β•¬β‰‘Ξ¦β•¬β–€β•¬β•£β•¬β•¬β–“β–“β–“_   β•“β–„β–“β–“β–“β–“β–“β–“β•¬β–Œ
      _,Ο†β•¬Γ‘β•©β–Œβ–β–ˆ[β–’β–‘β–‘β–‘β–‘Rβ–‘β–‘β–€β–‘`,_`!R`````β•™`-'β•šΓœβ–‘β–‘Γœβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚β”‚β”‚β–‘β•šβ•šβ•™β•šβ•©β•©β•©β•£Γ‘β•©β• β–’β–’β•©β•©β–€β–“β–“β•£β–“β–“β•¬β• β–Œ
     'β•šβ•©Γœβ•™β”‚β–‘β–‘β•™Γ–β–’Γœβ–‘β–‘β–‘Hβ–‘β–‘R β–’Β₯β•£β•£@@@β–“β–“β–“  := '`   `β–‘``````````````````````````]▓▓▓╬╬╠H
       '¬═▄ `β–‘β•™Γœβ–‘β• DjK` Å»»╙╣▓▓▓▓╬Ñ     -Β»`       -`      `  ,;β•“β–„β•”β•—βˆž  ~β–“β–“β–“β–€β–“β–“β•¬β•¬β•¬β–Œ
             '^^^`   _β•’Ξ“   `╙▀▓▓╨                     _, ⁿDβ•£β–“β•¬β•£β–“β•¬β–“β•œ      ╙╬▓▓╬╬▓▓
                 ```β””                           _β•“β–„@β–“β–“β–“β•œ   `╝╬▓▓╙           ²╣╬▓▓
                        %Ο†β–„β•“_             ~#▓╠▓▒╬▓╬▓▓^        `                β•™β•™
                         `β•£β–“β–“β–“              ╠╬▓╬▓╬▀`
                           β•šβ–“β–Œ               'β•¨β–€β•œ

Support

If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).