How to use docker containers with Bacalhau
Bacalhau executes jobs by running them within containers. Bacalhau employs a syntax closely resembling Docker, allowing you to utilize the same containers. The key distinction lies in how input and output data are transmitted to the container via IPFS, enabling scalability on a global level.
This section describes how to migrate a workload based on a Docker container into a format that will work with the Bacalhau client.
You can check out this example tutorial on how to work with custom containers in Bacalhau to see how we used all these steps together.
Here are few things to note before getting started:
Container Registry: Ensure that the container is published to a public container registry that is accessible from the Bacalhau network.
Architecture Compatibility: Bacalhau supports only images that match the host node's architecture. Typically, most nodes run on linux/amd64
, so containers in arm64
format are not able to run.
Input Flags: The --input ipfs://...
flag supports only directories and does not support CID subpaths. The --input https://...
flag supports only single files and does not support URL directories. The --input s3://...
flag supports S3 keys and prefixes. For example, s3://bucket/logs-2023-04*
includes all logs for April 2023.
You can check to see a list of example public containers used by the Bacalhau team
Note: Only about a third of examples have their containers here. If you can't find one, feel free to contact the team.
To help provide a safe, secure network for all users, we add the following runtime restrictions:
Limited Ingress/Egress Networking:
All ingress/egress networking is limited as described in the networking documentation. You won't be able to pull data/code/weights/
etc. from an external source.
Data Passing with Docker Volumes:
A job includes the concept of input and output volumes, and the Docker executor implements support for these. This means you can specify your CIDs, URLs, and/or S3 objects as input
paths and also write results to an output
volume. This can be seen in the following example:
The above example demonstrates an input volume flag -i s3://mybucket/logs-2023-04*
, which mounts all S3 objects in bucket mybucket
with logs-2023-04
prefix within the docker container at location /input
(root).
Output volumes are mounted to the Docker container at the location specified. In the example above, any content written to /output_folder
will be made available within the apples
folder in the job results CID.
Once the job has run on the executor, the contents of stdout
and stderr
will be added to any named output volumes the job has used (in this case apples
), and all those entities will be packaged into the results folder which is then published to a remote location by the publisher.
If you need to pass data into your container you will do this through a Docker volume. You'll need to modify your code to read from a local directory.
We make the assumption that you are reading from a directory called /inputs
, which is set as the default.
You can specify which directory the data is written to with the --input
CLI flag.
If you need to return data from your container you will do this through a Docker volume. You'll need to modify your code to write to a local directory.
We make the assumption that you are writing to a directory called /outputs
, which is set as the default.
You can specify which directory the data is written to with the --output-volumes
CLI flag.
At this step, you create (or update) a Docker image that Bacalhau will use to perform your task. You build your image from your code and dependencies, then push it to a public registry so that Bacalhau can access it. This is necessary for other Bacalhau nodes to run your container and execute the task.
Most Bacalhau nodes are of an x86_64
architecture, therefore containers should be built for x86_64
systems.
For example:
To test your docker image locally, you'll need to execute the following command, changing the environment variables as necessary:
Let's see what each command will be used for:
Exports the current working directory of the host system to the LOCAL_INPUT_DIR
variable. This variable will be used for binding a volume and transferring data into the container.
Exports the current working directory of the host system to the LOCAL_OUTPUT_DIR variable. Similarly, this variable will be used for binding a volume and transferring data from the container.
Creates an array of commands CMD that will be executed inside the container. In this case, it is a simple command executing 'ls' in the /inputs directory and writing text to the /outputs/stdout file.
Launches a Docker container using the specified variables and commands. It binds volumes to facilitate data exchange between the host and the container.
Bacalhau will use the default ENTRYPOINT if your image contains one. If you need to specify another entrypoint, use the --entrypoint
flag to bacalhau docker run
.
For example:
The result of the commands' execution is shown below:
Data is identified by its content identifier (CID) and can be accessed by anyone who knows the CID. You can use either of these methods to upload your data:
You can choose to
You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data
To launch your workload in a Docker container, using the specified image and working with input
data specified via IPFS CID, run the following command.
To check the status of your job, run the following command.
To get more information on your job, you can run the following command.
To download your job, run.
To put this all together into one would look like the following.
This outputs the following.
The --input
flag does not support CID subpaths for ipfs://
content.
Alternatively, you can run your workload with a publicly accessible http(s) URL, which will download the data temporarily into your public storage:
The --input
flag does not support URL directories.
If you run into this compute error while running your docker image
This can often be resolved by re-tagging your docker image
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
Bacalhau supports running programs that are compiled to WebAssembly (Wasm). With the Bacalhau client, you can upload Wasm programs, retrieve data from public storage, read and write data, receive program arguments, and access environment variables.
Supported WebAssembly System Interface (WASI) Bacalhau can run compiled Wasm programs that expect the WebAssembly System Interface (WASI) Snapshot. WebAssembly programs can access data, environment variables, and program arguments through this interface.
Networking Restrictions All ingress/egress networking is disabled; you won't be able to pull data/code/weights
etc. from an external source. Wasm jobs say what data they need using URLs or CIDs (Content IDentifier) and then access the data by reading from the filesystem.
Single-Threading There is no multi-threading as WASI does not expose any interface.
If your program typically involves reading from and writing to network endpoints, follow these steps to adapt it for Bacalhau:
Replace Network Operations: Instead of making HTTP requests to external servers (e.g., example.com), modify your program to read data from the local filesystem.
Input Data Handling: Specify the input data location in Bacalhau using the --input
flag when running the job. For instance, if your program used to fetch data from example.com
, read from the /inputs
folder locally, and provide the URL as input when executing the Bacalhau job. For example, --input http://example.com
.
Output Handling: Adjust your program to output results to standard output (stdout
) or standard error (stderr
) pipes. Alternatively, you can write results to the filesystem, typically into an output mount. In the case of Wasm jobs, a default folder at /outputs
is available, ensuring that data written there will persist after the job concludes.
By making these adjustments, you can effectively transition your program to operate within the Bacalhau environment, utilizing filesystem operations instead of traditional network interactions.
You can specify additional or different output mounts using the -o
flag.
You will need to compile your program to WebAssembly that expects WASI. Check the instructions for your compiler to see how to do this.
For example, Rust users can specify the wasm32-wasi
target rustup
and cargo
to get programs compiled for WASI WebAssembly. See the Rust example for more information on this.
Data is identified by its content identifier (CID) and can be accessed by anyone who knows the CID. You can use either of these methods to upload your data:
You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data
You can run a WebAssembly program on Bacalhau using the bacalhau wasm run
command.
Run Locally Compiled Program:
If your program is locally compiled, specify it as an argument. For instance, running the following command will upload and execute the main.wasm
program:
The program you specify will be uploaded to a Bacalhau storage node and will be publicly available.
Alternative Program Specification:
You can use a Content IDentifier (CID) for a specific WebAssembly program.
Input Data Specification:
Make sure to specify any input data using --input
flag.
This ensures the necessary data is available for the program's execution.
You can give the Wasm program arguments by specifying them after the program path or CID. If the Wasm program is already compiled and located in the current directory, you can run it by adding arguments after the file name:
For a specific WebAssembly program, run:
Write your program to use program arguments to specify input and output paths. This makes your program more flexible in handling different configurations of input and output volumes.
For example, instead of hard-coding your program to read from /inputs/data.txt
, accept a program argument that should contain the path and then specify the path as an argument to bacalhau wasm run
:
Your language of choice should contain a standard way of reading program arguments that will work with WASI.
You can also specify environment variables using the -e
flag.
See the Rust example for a workload that leverages WebAssembly support.
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).