This directory contains examples relating to performing common tasks with Bacalhau.
Well done on deploying your Bacalhau cluster! Now that the deployment is finished, this document will help with the next steps. It provides important information on how to interact with and manage the cluster. You'll find details on the outputs from the deployment, including how to set up and connect a Bacalhau client, and how to authorize and connect a Bacalhau compute node to the cluster. This guide gives you everything you need to start using your Bacalhau setup.
After completing the deployment, several outputs will be presented. Below is a description of each output and instructions on how to configure your Bacalhau node using them.
Description: The IP address of the Requester node for the deployment and the endpoint where the Bacalhau API is served.
Usage: Configure the Bacalhau Client to connect to this IP address in the following ways:
Setting the --api-host CLI flag
Setting the BACALHAU_API_HOST environment variable
Modifying the Bacalhau configuration file
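A minimal sketch of all three approaches; the requester IP is a placeholder, and the API.Host configuration key name is an assumption:

```bash
# 1. Per-command CLI flag
bacalhau --api-host <requester-ip> job list

# 2. Environment variable
export BACALHAU_API_HOST=<requester-ip>

# 3. Configuration file (YAML) -- the API.Host key name is assumed
# API:
#   Host: <requester-ip>
```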
Description: The token used to authorize a client when accessing the Bacalhau API.
Usage: The Bacalhau client prompts for this token the first time a command is issued to the Bacalhau API.
Description: The token used to authorize a Bacalhau Compute node to connect to the Requester Node.
Usage: A Bacalhau Compute node can be connected to the Requester Node using the following command:
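A hedged sketch of what that command might look like, assuming the -c flag is used to set the Compute.Orchestrators and Compute.Auth.Token configuration keys; the address and token are placeholders:

```bash
bacalhau serve --compute \
  -c Compute.Orchestrators=nats://<requester-ip>:4222 \
  -c Compute.Auth.Token=<compute-auth-token>
```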
Welcome to the guide for setting up your own Bacalhau cluster across multiple Azure regions! This guide will walk you through creating a robust, distributed compute cluster that's perfect for running your Bacalhau workloads.
Think of this as building your own distributed supercomputer! Your cluster will provision compute nodes spread across different Azure regions for global coverage.
You'll need a few things ready:
Terraform (version 1.0.0 or newer)
A running Bacalhau orchestrator node
Azure CLI installed and set up
An active Azure subscription
Your subscription ID handy
An SSH key pair for securely accessing your nodes
First, create a terraform.tfvars.json file and fill in your Azure details:
Update your config/config.yaml with your orchestrator information. Specifically, these lines:
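A sketch of the relevant lines; the orchestrator address and token are placeholders for the values from your own deployment:

```yaml
Compute:
  Orchestrators:
    - nats://<your-orchestrator-ip>:4222
  Auth:
    Token: "<your-network-auth-token>"
```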
Let Terraform get everything ready:
Launch your cluster:
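These two steps follow the standard Terraform workflow:

```bash
terraform init    # download providers and set up the working directory
terraform plan    # optional: preview the resources that will be created
terraform apply   # provision the cluster
```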
The infrastructure is organized into modules:
Network: Creates VNets and subnets in each region
Security Group: Sets up NSGs with rules for SSH, HTTP, and NATS
Instance: Provisions VMs with cloud-init configuration
Once everything's up and running, let's make sure it works!
First, make sure you have the Bacalhau CLI installed. You can read more about installing the CLI here.
Setup your configuration to point at your orchestrator node:
Check on the health of your nodes:
Run a simple test job:
Check on your jobs:
Get your results:
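A sketch of those verification steps using the Bacalhau CLI; the orchestrator address and job ID are placeholders:

```bash
# Check on the health of your nodes
bacalhau --api-host <orchestrator-ip> node list

# Run a simple test job
bacalhau --api-host <orchestrator-ip> docker run ubuntu -- echo "Hello from Bacalhau"

# Check on your jobs
bacalhau --api-host <orchestrator-ip> job list

# Get your results
bacalhau --api-host <orchestrator-ip> job get <job-id>
```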
Having issues? Here are some common solutions:
Double-check your Azure permissions
Make sure your subscription is active
Verify that all needed resource providers are registered
Look at the logs on a node: journalctl -u bacalhau-startup.service
Check Docker logs on a node: docker logs <container-id>
Make sure that port 4222 isn't blocked
Verify your NATS connection settings
Check if nodes are properly registered
Make sure compute is enabled in your config
When you're done, clean everything up with:
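With Terraform-managed infrastructure, teardown is the usual:

```bash
terraform destroy
```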
If you need to peek under the hood, here's how:
Find your node IPs:
SSH into a node:
Check on Docker:
Go into the container on the node:
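A sketch of those debugging steps; output names, usernames, key paths and container IDs are placeholders:

```bash
# Find your node IPs from the Terraform outputs
terraform output

# SSH into a node
ssh -i ~/.ssh/<your-key> <username>@<node-ip>

# Check on Docker
sudo docker ps

# Go into the Bacalhau container on the node
sudo docker exec -it <container-id> /bin/bash
```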
Here's what each important file does in your setup:
main.tf: Your main Terraform configuration
variables.tf: Where input variables are defined
outputs.tf: What information Terraform will show you
modules/network: Handles VNet and subnet creation
modules/securityGroup: Manages network security groups
modules/instance: Provisions VMs with cloud-init
cloud-init/init-vm.yml: Sets up your VM environment, installs packages, and gets services running
config/docker-compose.yml: Runs Bacalhau in a privileged container with all the right volumes and health checks
For ensuring that you have configured your Azure CLI correctly, here are some commands you can use:
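For example, to confirm you are logged in and pointed at the right subscription:

```bash
az account show                                   # current login state and subscription
az account list --output table                    # all subscriptions you can access
az account set --subscription <subscription-id>   # switch to the subscription you want to use
```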
If you get stuck or have questions:
Open an issue in our GitHub repository
Join our Slack
We're here to help you get your cluster running smoothly! 🌟
Welcome to the guide for setting up your own Bacalhau cluster across multiple Google Cloud Platform (GCP) regions! This guide will walk you through creating a robust, distributed compute cluster that's perfect for running your Bacalhau workloads.
Think of this as building your own distributed supercomputer! Your cluster will provision compute nodes spread across different GCP regions for global coverage.
You'll need a few things ready:
Terraform (version 1.0.0 or newer)
A running Bacalhau orchestrator node
Google Cloud SDK installed and set up
An active GCP billing account
Your organization ID handy
An SSH key pair for securely accessing your nodes
Make sure you are logged in with GCP. This could involve both of the following commands:
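Typically that means:

```bash
gcloud auth login                        # log in to GCP
gcloud auth application-default login    # provide credentials that Terraform can use
```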
Clone the examples repo to your machine and go into the GCP directory.
Now, make a copy of the example environment file:
Open up env.json and fill in your GCP details (more on this below!)
Update your config/config.yaml with your orchestrator information. Specifically, these lines:
Let Terraform get everything ready:
Launch your cluster:
The entire process takes about 8 minutes, but should end with something like the below:
You're good to go!
The env.json file is where all the magic happens. Here's what you'll need to fill in:
bootstrap_project_id: Your existing GCP project (just used for setup)
base_project_name: What you want to call your new project
gcp_billing_account_id: Where the charges should go
gcp_user_email: Your GCP email address
org_id: Your organization's ID
app_tag: A friendly name for your resources (like "bacalhau-demo")
bacalhau_data_dir: Where job data should be stored
bacalhau_node_dir: Where node configs should live
username: Your SSH username
public_key: Path to your SSH public key
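A sketch of a filled-in env.json; the field names are the ones listed above, and all values are placeholders:

```json
{
  "bootstrap_project_id": "my-existing-project",
  "base_project_name": "bacalhau-cluster",
  "gcp_billing_account_id": "000000-AAAAAA-BBBBBB",
  "gcp_user_email": "you@example.com",
  "org_id": "123456789012",
  "app_tag": "bacalhau-demo",
  "bacalhau_data_dir": "/bacalhau_data",
  "bacalhau_node_dir": "/bacalhau_node",
  "username": "my-ssh-user",
  "public_key": "~/.ssh/id_rsa.pub"
}
```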
You can set up nodes in different regions with custom configurations:
Once everything's up and running, let's make sure it works!
First configure the CLI to use your cluster:
Check on the health of your nodes:
Run a simple test job:
Check on your jobs:
Get your results:
Having issues? Here are some common solutions:
Double-check your GCP permissions
Make sure your billing account is active
Verify that all needed APIs are turned on in GCP
Look at the logs on a node: journalctl -u bacalhau-startup.service
Check Docker logs on a node: docker logs <container-id>
Make sure that port 4222 isn't blocked
Verify your NATS connection settings
Check if nodes are properly registered
Make sure compute is enabled in your config
When you're done, clean everything up with:
If you need to peek under the hood, here's how:
Find your node IPs:
SSH into a node:
Check on Docker:
Go into the container on the node:
Here's what each important file does in your setup:
main.tf: Your main Terraform configuration
variables.tf: Where input variables are defined
outputs.tf: What information Terraform will show you
config/config.yaml: How your Bacalhau nodes are configured
scripts/startup.sh: Gets your nodes ready to run
scripts/bacalhau-startup.service: Manages the Bacalhau service
cloud-init/init-vm.yml: Sets up your VM environment, installs packages, and gets services running
config/docker-compose.yml: Runs Bacalhau in a privileged container with all the right volumes and health checks
The neat thing is that most of your configuration happens in just one file: env.json. Though if you want to get fancy, there's lots more you can customize!
If you get stuck or have questions:
We're here to help you get your cluster running smoothly! 🌟
This tutorial describes how to add new nodes to an existing private network. Two basic scenarios will be covered:
Adding a machine as a new node.
Adding a as a new node.
You should have an established private network consisting of at least one requester node.
You should have a new host (physical/virtual machine, cloud instance or docker container) with Bacalhau installed.
Let's assume that you already have a private network with at least one requester node. You will need to:
Set the token in the Compute.Auth.Token configuration key
Set the orchestrator's IP address in the Compute.Orchestrators configuration key
Execute bacalhau serve, specifying the node type via the --compute flag
To automate the process using Terraform, follow these steps:
Determine the IP address of your requester node
Write a terraform script, which does the following:
Adds a new instance
Installs bacalhau on it
Launches a compute node
Execute the script
When running a node, you can choose which jobs you want to run by using configuration options, environment variables or flags to specify a job selection policy.
If you want more control over making the decision to take on jobs, you can use the JobAdmissionControl.ProbeExec and JobAdmissionControl.ProbeHTTP configuration keys.
These are external programs that are passed the following data structure so that they can make a decision about whether to take on a job:
The exec probe is a script to run that will be given the job data on stdin, and must exit with status code 0 if the job should be run.
The http probe is a URL to POST the job data to. The job will be rejected if the HTTP request returns an error status code (e.g. >= 400).
For example, the following response will reject the job:
If the HTTP response is not a JSON blob, the content is ignored and any non-error status code will accept the job.
First, make sure you have the Bacalhau CLI installed. You can read more about installing the CLI here.
If you're using the Expanso Cloud hosted orchestrator (Recommended!), you can look at your nodes on the dashboard in real-time.
Open an issue in our GitHub repository
Join our Slack
Let's assume you already have all the necessary cloud infrastructure set up with a private network with at least one requester node. In this case, you can add new nodes manually or use a tool like Terraform to automatically create and add any number of nodes to your network.
Configure Terraform for your cloud provider
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
If the HTTP response is a JSON blob, it should match the and will be used to respond to the bid directly:
How to enable GPU support on your Bacalhau node
Bacalhau supports GPUs out of the box and defaults to allowing execution on all GPUs installed on the node.
Bacalhau makes the assumption that you have installed all the necessary drivers and tools on your node host and have appropriately configured them for use by Docker.
In general for GPUs from any vendor, the Bacalhau client requires:
For NVIDIA GPUs: nvidia-smi installed and functional. You can verify the installation by running a sample workload.
For AMD GPUs: the rocm-smi tool installed and functional. See Running ROCm Docker containers for guidance on how to run Docker workloads on AMD GPUs.
For Intel GPUs: the xpu-smi tool installed and functional. See Running on GPU under docker for guidance on how to run Docker workloads on Intel GPUs.
Access to GPUs can be controlled using resource limits. To limit the number of GPUs that can be used per job, set a job resource limit. To limit access to GPUs from all jobs, set a total resource limit.
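For example, a per-job limit can be requested with the --gpu flag (a sketch; the image is the CUDA container used later in this documentation), while a node-wide cap can be set with the Compute.AllocatedCapacity.GPU configuration key described in the capacity section below:

```bash
# Ask for a single GPU for this job
bacalhau docker run --gpu 1 nvidia/cuda:11.2.0-cudnn8-devel-ubuntu18.04 -- nvidia-smi
```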
How to configure authentication and authorization on your Bacalhau node.
Bacalhau includes a flexible auth system that supports multiple methods of auth that are appropriate for different deployment environments.
With no specific authentication configuration supplied, Bacalhau runs in "anonymous mode" – which allows unidentified users limited control over the system. "Anonymous mode" is only appropriate for testing or evaluation setups.
In anonymous mode, Bacalhau will allow:
Users identified by a self-generated private key to submit any job and cancel their own jobs.
Users not identified by any key to access other read-only endpoints, such as to read job lists, describe jobs, and query node or agent information.
Bacalhau auth is controlled by policies. Configuring the auth system is done by supplying a different policy file.
Restricting API access to only users that have authenticated requires specifying a new authorization policy. You can download a policy that restricts anonymous access and install it by using:
Once the node is restarted, accessing the node APIs will require the user to be authenticated, but by default will still allow users with a self-generated key to authenticate themselves.
Restricting the list of keys that can authenticate to only a known set requires specifying a new authentication policy. You can download a policy that restricts key-based access and install it by using:
Then, modify the allowed_clients variable in challenge_ns_no_anon.rego to include acceptable client IDs, found by running bacalhau agent node.
Once the node is restarted, only keys in the allowed list will be able to access any API.
Users can authenticate using a username and password instead of specifying a private key for access. Again, this requires installation of an appropriate policy on the server.
Passwords are not stored in plaintext and are salted. The downloaded policy expects password hashes and salts generated by scrypt. To generate a salted password, the helper script in pkg/authn/ask/gen_password can be used:
This will ask for a password and generate a salt and hash to authenticate with it. Add the encoded username, salt and hash into ask_ns_password.rego.
In principle, Bacalhau can implement any auth scheme that can be described in a structured way by a policy file.
Policies are written in a language called Rego, also used by Kubernetes. Users who want to write their own policies should get familiar with the Rego language.
Bacalhau will pass information pertinent to the current request into every authentication policy query as a field on the input variable. The exact information depends on the type of authentication used.
challenge authentication
challenge authentication identifies the user by the presence of a private key. The user is asked to sign an input phrase to prove they have the key they are identifying with.
Policies used for challenge authentication do not need to actually implement the challenge verification logic as this is handled by the core code. Instead, they will only be invoked if this verification passes.
Policies for this type will need to implement these rules:
bacalhau.authn.token: if the user should be authenticated, an access token they should use in subsequent requests. If the user should not be authenticated, this should be undefined.
They should expect as fields on the input variable:
clientId: an ID derived from the user's private key that identifies them uniquely
nodeId: the ID of the requester node that this user is authenticating with
signingKey: the private key (as a JWK) that should be used to sign any access tokens to be returned
The simplest possible policy might therefore be this policy that returns the same opaque token for all users:
A more realistic example that returns a signed JWT is in challenge_ns_anon.rego.
ask authentication
ask authentication uses credentials supplied manually by the user as identification. For example, an ask policy could require a username and password as input and check these against a known list. ask policies do all the verification of the supplied credentials.
Policies for this type will need to implement these rules:
bacalhau.authn.token: if the user should be authenticated, an access token they should use in subsequent requests. If the user should not be authenticated, this should be undefined.
bacalhau.authn.schema: a static JSON schema that should be used to collect information about the user. The type of declared fields may be used to pick the input method, and if a field is marked as writeOnly then it will be collected in a secure way (e.g. not shown on screen). The schema rule does not receive any input data.
They should expect as fields on the input variable:
ask: a map of field names from the JSON schema to strings supplied by the user. The policy should validate these credentials.
nodeId: the ID of the requester node that this user is authenticating with
signingKey: the private key (as a JWK) that should be used to sign any access tokens to be returned
The simplest possible policy might therefore be one that asks for no data and returns the same opaque token for every user:
A more realistic example that returns a signed JWT is in ask_ns_example.rego.
Authorization policies do not vary depending on the type of authentication used – Bacalhau uses one authz policy for all API requests.
Authz policies are invoked for every API request. Authz policies should check the validity of any supplied access tokens and issue an authz decision for the requested API endpoint. It is not required that authz policies enforce that an access token is present – they may choose to grant access to unauthenticated users.
Policies will need to implement these rules:
bacalhau.authz.token_valid: true if the access token in the request is "valid" (but does not necessarily grant access for this request), or false if it is invalid for every request (e.g. because it has expired) and should be discarded.
bacalhau.authz.allow: true if the user should be permitted to carry out the input request, false otherwise.
They should expect as fields on the input variable for both rules:
http: details of the user's HTTP request:
host: the hostname used in the HTTP request
method: the HTTP method (e.g. GET, POST)
path: the path requested, as an array of path components without slashes
query: a map of URL query parameters to their values
headers: a map of HTTP header names to arrays representing their values
body: a blob of any content submitted as the body
constraints: details about the receiving node that should be used to validate any supplied tokens:
cert: keys that the input token should have been signed with
iss: the name of a node that this node will recognize as the issuer of any signed tokens
aud: the name of this node that is receiving the request
Notably, the constraints data is appropriate to be passed directly to the Rego io.jwt.decode_verify method, which will validate the access token as a JWT against the given constraints.
The simplest possible authz policy might be this one that allows all users to access all endpoints:
A more realistic example (which is the Bacalhau "anonymous mode" default) is in policy_ns_anon.rego.
Welcome to the guide for setting up your own Bacalhau cluster across multiple AWS regions! This guide will walk you through creating a robust, distributed compute cluster that's perfect for running your Bacalhau workloads.
Think of this as building your own distributed supercomputer! Your cluster will provision compute nodes spread across different AWS regions for global coverage.
You'll need a few things ready:
Terraform (version 1.0.0 or newer)
AWS CLI installed and configured
An active AWS account with appropriate permissions
Your AWS credentials configured
An SSH key pair for securely accessing your nodes
A Bacalhau network
First, set up an orchestrator node. We recommend using Expanso Cloud for this! But you can always set up your own.
Create your environment configuration file:
Fill in your AWS details in env.tfvars.json:
Configure your desired regions in locations.yaml. Here's an example (we have a full list of these in all_locations.yaml):
Make sure the AMI exists in the region you need it to! You can confirm this by executing the following command:
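A sketch using the AWS CLI; the AMI ID and region are placeholders:

```bash
aws ec2 describe-images --image-ids <ami-id> --region <region>
```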
Update your Bacalhau config/config.yaml (the defaults are mostly fine, just update the Orchestrator, and Token lines):
Deploy your cluster using the Python deployment script:
Terraform on AWS requires switching to different workspaces when deploying to different availability zones. As a result, we had to set up a separate deploy.py script which switches to each workspace for you under the hood, to make it easier.
env.tfvars.json: Your main configuration file containing AWS-specific settings
locations.yaml: Defines which regions to deploy to and instance configurations
config/config.yaml: Bacalhau node configuration
app_name: Name for your cluster resources
app_tag: Tag for resource management
bacalhau_installation_id: Unique identifier for your cluster
username: SSH username for instances
public_key_path: Path to your SSH public key
private_key_path: Path to your SSH private key
bacalhau_config_file_path: Path to the config file for this compute node (should point at the orchestrator and have the right token)
Each region entry requires:
region: AWS region (e.g., us-west-2)
zone: Availability zone (e.g., us-west-2a)
instance_type: EC2 instance type (e.g., t3.medium)
instance_ami: AMI ID for the region
node_count: Number of instances to deploy
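A sketch of one region entry using those fields; the exact top-level layout of locations.yaml may differ, and all values are placeholders:

```yaml
- region: us-west-2
  zone: us-west-2a
  instance_type: t3.medium
  instance_ami: ami-0123456789abcdef0
  node_count: 2
```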
Once everything's up and running, let's make sure it works!
First, make sure you have the Bacalhau CLI installed. You can read more about installing the CLI here.
Configure your Bacalhau client:
List your compute nodes:
Run a test job:
Check job status:
Verify AWS credentials are properly configured:
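For example:

```bash
aws sts get-caller-identity   # confirms which account and principal your credentials resolve to
```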
Check IAM permissions
Ensure you have quota available in target regions
SSH into a node:
Check Bacalhau service logs:
Check Docker container status:
Verify security group rules (ports 22, 80, and 4222 should be open)
Check VPC and subnet configurations
Ensure internet gateway is properly attached
If nodes aren't joining the network:
Check NATS connection string in config.yaml
Verify security group allows port 4222
Ensure nodes can reach the orchestrator
If jobs aren't running:
Check compute is enabled in node config
Verify Docker is running properly
Check available disk space
If deployment fails:
Look for errors in Terraform output
Check AWS service quotas
Verify AMI availability in chosen regions
Remove all resources:
Check node health:
If you get stuck or have questions:
Open an issue in our GitHub repository
Join our Slack
We're here to help you get your cluster running smoothly! 🌟
How to use docker containers with Bacalhau
Bacalhau executes jobs by running them within containers. Bacalhau employs a syntax closely resembling Docker, allowing you to utilize the same containers. The key distinction lies in how input and output data are transmitted to the container via IPFS, enabling scalability on a global level.
This section describes how to migrate a workload based on a Docker container into a format that will work with the Bacalhau client.
You can check out this example tutorial on how to work with custom containers in Bacalhau to see how we used all these steps together.
Here are few things to note before getting started:
Container Registry: Ensure that the container is published to a public container registry that is accessible from the Bacalhau network.
Architecture Compatibility: Bacalhau supports only images that match the host node's architecture. Typically, most nodes run on linux/amd64, so containers in arm64 format are not able to run.
Input Flags: The --input ipfs://... flag supports only directories and does not support CID subpaths. The --input https://... flag supports only single files and does not support URL directories. The --input s3://... flag supports S3 keys and prefixes. For example, s3://bucket/logs-2023-04* includes all logs for April 2023.
You can check to see a list of example public containers used by the Bacalhau team
Note: Only about a third of examples have their containers here. The rest are under random docker hub registries.
To help provide a safe, secure network for all users, we add the following runtime restrictions:
Limited Ingress/Egress Networking: All ingress/egress networking is limited as described in the networking documentation. You won't be able to pull data/code/weights, etc. from an external source.
Data Passing with Docker Volumes: A job includes the concept of input and output volumes, and the Docker executor implements support for these. This means you can specify your CIDs, URLs, and/or S3 objects as input paths and also write results to an output volume. This can be seen in the following example:
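A sketch of such a job; the image and the command inside the container are hypothetical, and the exact -i/-o mount syntax should be checked against the --input and --output flag documentation:

```bash
bacalhau docker run \
  -i s3://mybucket/logs-2023-04*:/input \
  -o apples:/output_folder \
  ubuntu -- bash -c 'ls /input > /output_folder/files.txt'
```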
The above example demonstrates an input volume flag -i s3://mybucket/logs-2023-04*, which mounts all S3 objects in bucket mybucket with the logs-2023-04 prefix within the docker container at location /input (root).
Output volumes are mounted to the Docker container at the location specified. In the example above, any content written to /output_folder will be made available within the apples folder in the job results CID.
Once the job has run on the executor, the contents of stdout and stderr will be added to any named output volumes the job has used (in this case apples), and all those entities will be packaged into the results folder, which is then published to a remote location by the publisher.
If you need to pass data into your container you will do this through a Docker volume. You'll need to modify your code to read from a local directory.
We make the assumption that you are reading from a directory called /inputs, which is set as the default.
If you need to return data from your container you will do this through a Docker volume. You'll need to modify your code to write to a local directory.
We make the assumption that you are writing to a directory called /outputs, which is set as the default.
At this step, you create (or update) a Docker image that Bacalhau will use to perform your task. You build your image from your code and dependencies, then push it to a public registry so that Bacalhau can access it. This is necessary for other Bacalhau nodes to run your container and execute the given task.
Most Bacalhau nodes are of an x86_64 architecture, therefore containers should be built for x86_64 systems.
For example:
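A sketch of building and publishing an x86_64 image; the registry, image name and tag are placeholders:

```bash
docker build --platform linux/amd64 -t <registry>/<image>:<tag> .
docker push <registry>/<image>:<tag>
```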
To test your docker image locally, you'll need to execute the following command, changing the environment variables as necessary:
Let's see what each command will be used for:
Bacalhau will use the default ENTRYPOINT if your image contains one. If you need to specify another entrypoint, use the --entrypoint flag to bacalhau docker run.
For example:
The result of the commands' execution is shown below:
To launch your workload in a Docker container, using the specified image and working with input data specified via IPFS CID, run the following command:
To check the status of your job, run the following command:
To get more information on your job, run:
To download your job, run:
For example, running:
outputs:
The --input flag does not support CID subpaths for ipfs:// content.
Alternatively, you can run your workload with a publicly accessible http(s) URL, which will download the data temporarily into your public storage:
The --input flag does not support URL directories.
If you run into this compute error while running your docker image
This can often be resolved by re-tagging your docker image
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel)
Requester nodes store job state and history in a boltdb-backed store (pkg/jobstore/boltdb).
The location of the database file can be specified using the BACALHAU_JOB_STORE_PATH environment variable, which will specify which file to use to store the database. When not specified, the file will be {$BACALHAU_DIR}/{NODE_ID}-requester.db.
By default, compute nodes store their execution information in a boltdb-backed store (pkg/compute/store/boltdb).
The location of the database file (for a single node) can be specified using the BACALHAU_COMPUTE_STORE_PATH environment variable, which will specify which file to use to store the database. When not specified, the file will be {$BACALHAU_DIR}/{NODE_ID}-compute.db.
As compute nodes restart, they will find they have existing state in the boltdb database. At startup the database currently iterates the executions to calculate the counters for each state. This will be a good opportunity to do some compaction of the records in the database, and cleanup items no longer in use.
Currently only batch jobs are possible, and so for each of the listed states below, no action is taken at restart. In future it would make sense to remove records older than a certain age, or move them to failed, depending on their current state. For other job types (to be implemented) this may require restarting or resetting jobs.
ExecutionStateCreated: No action
ExecutionStateBidAccepted: No action
ExecutionStateRunning: No action
ExecutionStateWaitingVerification: No action
ExecutionStateResultAccepted: No action
ExecutionStatePublishing: No action
ExecutionStateCompleted: No action
ExecutionStateFailed: No action
ExecutionStateCancelled: No action
The databases can be inspected using the bbolt tool. The bbolt tool can be installed to $GOBIN with:
Once installed, and assuming the database file is stored in $FILE you can use bbolt to:
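A sketch of the install command and a few typical inspections:

```bash
# Install bbolt into $GOBIN
go install go.etcd.io/bbolt/cmd/bbolt@latest

# Inspect the database file
bbolt buckets "$FILE"          # list the top-level buckets
bbolt keys "$FILE" <bucket>    # list the keys in a bucket
bbolt stats "$FILE"            # print database statistics
```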
JobAdmissionControl.Locality (default: Anywhere): Only accept jobs that reference data we have locally ("local") or anywhere ("anywhere").
JobAdmissionControl.ProbeExec (default: unused): Use the result of an external program to decide if we should take on the job.
JobAdmissionControl.ProbeHTTP (default: unused): Use the result of a HTTP POST to decide if we should take on the job.
JobAdmissionControl.RejectStatelessJobs (default: False): Reject jobs that don't specify any input data.
JobAdmissionControl.AcceptNetworkedJobs (default: False): Accept jobs that require network connections.
These are the configuration keys that control the capacity of the Bacalhau node, and the limits for jobs that might be run.
Compute.AllocatedCapacity.CPU: Specifies the amount of CPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string.
Compute.AllocatedCapacity.Disk: Specifies the amount of disk space a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 10Gi).
Compute.AllocatedCapacity.GPU: Specifies the amount of GPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 1). Note: when using percentages, the result is always rounded up to the nearest whole GPU.
Compute.AllocatedCapacity.Memory: Specifies the amount of memory a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 1Gi).
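A sketch of these keys in a node configuration file; the values are illustrative:

```yaml
Compute:
  AllocatedCapacity:
    CPU: "85%"
    Memory: "85%"
    Disk: "85%"
    GPU: "100%"
```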
It is also possible to additionally specify the number of resources allocated to each job by default, if the required number of resources is not specified in the job itself. The JobDefaults.<Job Type>.Task.Resources.<Resource Type> configuration keys are used for this purpose. For example, to provide each ops job with 2Gb of RAM, the JobDefaults.Ops.Task.Resources.Memory key is used:
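A sketch of that key in YAML form, with the 2Gb from the example expressed as the Kubernetes-style resource string 2Gi:

```yaml
JobDefaults:
  Ops:
    Task:
      Resources:
        Memory: 2Gi
```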
See the complete configuration keys list for more details.
Resource limits are not supported for Docker jobs running on Windows. Resource limits will be applied at the job bid stage based on reported job requirements but will be silently unenforced. Jobs will be able to access as many resources as requested at runtime.
Running a Windows-based node is not officially supported, so your mileage may vary. Some features (like resource limits) are not present in Windows-based nodes.
Bacalhau currently makes the assumption that all containers are Linux-based. Users of the Docker executor will need to manually ensure that their Docker engine is running and configured appropriately to support Linux containers, e.g. using the WSL-based backend.
Bacalhau can limit the total time a job spends executing. A job that spends too long executing will be cancelled, and no results will be published.
By default, a Bacalhau node does not enforce any limit on job execution time. Both node operators and job submitters can supply a maximum execution time limit. If a job submitter asks for a longer execution time than permitted by a node operator, their job will be rejected.
Applying job timeouts allows node operators to more fairly distribute the work submitted to their nodes. It also protects users from transient errors that result in their jobs waiting indefinitely.
Job submitters can pass the --timeout flag to any Bacalhau job submission CLI to set a maximum job execution time. The supplied value should be a whole number of seconds with no unit.
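For example, to allow a job at most ten minutes of execution time (a sketch):

```bash
bacalhau docker run --timeout 600 ubuntu -- sleep 300
```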
The timeout can also be added to an existing job spec by adding the Timeout property to the Spec.
Node operators can use configuration keys to specify default and maximum job execution time limits. The supplied values should be a numeric value followed by a time unit (one of s for seconds, m for minutes or h for hours).
Here is a list of the relevant properties:
JobDefaults.Batch.Task.Timeouts.ExecutionTimeout: Default value for batch job execution timeouts on your current compute node. It will be assigned to batch jobs with no timeout requirement defined.
JobDefaults.Ops.Task.Timeouts.ExecutionTimeout: Default value for ops job execution timeouts on your current compute node. It will be assigned to ops jobs with no timeout requirement defined.
JobDefaults.Batch.Task.Timeouts.TotalTimeout: Default value for the maximum execution timeout this compute node supports for batch jobs. Jobs with higher timeout requirements will not be bid on.
JobDefaults.Ops.Task.Timeouts.TotalTimeout: Default value for the maximum execution timeout this compute node supports for ops jobs. Jobs with higher timeout requirements will not be bid on.
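A sketch of these keys in a node configuration file; the durations are illustrative:

```yaml
JobDefaults:
  Batch:
    Task:
      Timeouts:
        ExecutionTimeout: 30m
        TotalTimeout: 2h
  Ops:
    Task:
      Timeouts:
        ExecutionTimeout: 30m
        TotalTimeout: 2h
```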
Note that timeouts cannot be configured for Daemon and Service jobs.
How to run the WebUI.
The Bacalhau WebUI offers an intuitive interface for interacting with the Bacalhau network. This guide provides comprehensive instructions for setting up and utilizing the WebUI.
For contributing to the WebUI's development, please refer to the Bacalhau WebUI GitHub Repository.
Ensure you have Bacalhau v1.5.0 or later installed.
To enable the WebUI, use the WebUI.Enabled configuration key:
By default, the WebUI uses host=0.0.0.0 and port=8438. This can be configured via the WebUI.Listen configuration key:
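A sketch of both keys in a node configuration file; the listen address shown is the default:

```yaml
WebUI:
  Enabled: true
  Listen: 0.0.0.0:8438
```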
Once started, the WebUI is accessible at the specified address, localhost:8438 by default.
The updated WebUI allows you to view a list of jobs, including job status, run time, type, and a message in case the job failed.
Clicking on the id of a job in the list opens the job details page, where you can see the history of events related to the job, the list of nodes on which the job was executed and the real-time logs of the job.
On the Nodes page you can see a list of nodes connected to your network, including node type, membership and connection statuses, amount of resources - total and currently available, and a list of labels of the node.
Clicking on the node id opens the node details page, where you can see the status and settings of the node, the number of running and scheduled jobs.
Bacalhau has two ways to make use of external storage providers: Sources and Publishers. Sources are storage resources consumed as inputs to jobs, and Publishers are storage resources created with the results of jobs.
Bacalhau allows you to use S3 or any S3-compatible storage service as an input source. Users can specify files or entire prefixes stored in S3 buckets to be fetched and mounted directly into the job execution environment. This capability ensures that your jobs have immediate access to the necessary data. See the for more details.
To use the S3 source, you will have to specify the mandatory name of the S3 bucket and the optional parameters Key, Filter, Region, Endpoint, VersionID and ChecksumSHA256.
Below is an example of how to define an S3 input source in YAML format:
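A sketch, under the assumption that the declarative spec nests these parameters under Source.Params with a Target mount path; all values are placeholders, and the S3 source reference holds the authoritative schema:

```yaml
InputSources:
  - Target: /data
    Source:
      Type: s3
      Params:
        Bucket: my-bucket
        Key: logs/
        Region: us-east-1
```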
To start, you'll need to connect the Bacalhau node to an IPFS server so that you can run jobs that consume CIDs as inputs. You can either install IPFS and run it locally, or you can connect to a remote IPFS server.
In both cases, you should have an IPFS multiaddress for the IPFS server that should look something like this:
The multiaddress above is just an example - you'll need to get the multiaddress of the IPFS server you want to connect to.
You can then configure your Bacalhau node to use this IPFS server by adding the address to the InputSources.Types.IPFS.Endpoint configuration key:
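A sketch of that key in YAML form; the multiaddress is a placeholder for your own IPFS server:

```yaml
InputSources:
  Types:
    IPFS:
      Endpoint: /ip4/127.0.0.1/tcp/5001
```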
See the for more details.
Below is an example of how to define an IPFS input source in YAML format:
To use a local data source, you will have to:
Enable the use of local data when configuring the node itself by using the Compute.AllowListedLocalPaths configuration key, specifying the file path and access mode. For example:
In the job description, specify the parameters SourcePath (the absolute path on the compute node where your data is located) and ReadWrite (the access mode).
Below is an example of how to define a Local input source in YAML format:
To use a URL data source, you will have to specify only URL parameter, as in the part of the declarative job description below:
Bacalhau's S3 Publisher provides users with a secure and efficient method to publish job results to any S3-compatible storage service. To use an S3 publisher you will have to specify required parameters Bucket and Key and optional parameters Region, Endpoint, VersionID, ChecksumSHA256. See the for more details.
Here’s an example of the part of the declarative job description that outlines the process of using the S3 Publisher with Bacalhau:
The IPFS publisher works using the same setup as above - you'll need to have an IPFS server running and a multiaddress for it. Then you'll configure that multiaddress using the InputSources.Types.IPFS.Endpoint configuration key. Then you can use bacalhau job get <job-ID> with no further arguments to download the results.
To use the IPFS publisher you will have to specify CID which can be used to access the published content. See the for more details.
And part of the declarative job description with an IPFS publisher will look like this:
The Local Publisher should not be used for Production use as it is not a reliable storage option. For production use, we recommend using a more reliable option such as an S3-compatible storage service.
Another possibility to store the results of a job execution is on a compute node. In this case the results will be published to the local compute node and stored as a compressed tar file, which can be accessed and retrieved over HTTP from the command line using the get command. To use the Local publisher you will have to specify only the URL parameter, with an HTTP URL pointing to the location where you would like to save the result. See the for more details.
Here is an example of part of the declarative job description with a local publisher:
In this tutorial, we will look at how to run CUDA programs on Bacalhau. CUDA (Compute Unified Device Architecture) is an extension of C/C++ programming. It is a parallel computing platform and programming model created by NVIDIA. It helps developers speed up their applications by harnessing the power of GPU accelerators.
In addition to accelerating high-performance computing (HPC) and research applications, CUDA has also been widely adopted across consumer and industrial ecosystems. CUDA also makes it easy for developers to take advantage of all the latest GPU architecture innovations
Architecturally, the CPU is composed of just a few cores with lots of cache memory that can handle a few software threads at a time. In contrast, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously.
Computations like matrix multiplication could be done much faster on GPU than on CPU
To get started, you need to install the Bacalhau client, see more information
You'll need to have the following installed:
NVIDIA GPU
CUDA drivers installed
nvcc installed
Checking if nvcc is installed:
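A quick check:

```bash
nvcc --version
```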
Downloading the programs:
00-hello-world.cu:
This example represents a standard C++ program that inefficiently utilizes GPU resources due to the use of non-parallel loops.
02-cuda-hello-world-faster.cu:
In this example we utilize Vector addition using CUDA and allocate the memory in advance and copy the memory to the GPU using cudaMemcpy so that it can utilize the HBM (High Bandwidth memory of the GPU). Compilation and execution occur faster (1.39 seconds) compared to the previous example (8.67 seconds).
To submit a job, run the following Bacalhau command:
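A sketch of the full command, assembled from the pieces explained below; the --gpu 1 resource request is an assumption not shown in the breakdown:

```bash
bacalhau docker run \
  --gpu 1 \
  -i https://raw.githubusercontent.com/tristanpenman/cuda-examples/master/02-cuda-hello-world-faster.cu \
  nvidia/cuda:11.2.0-cudnn8-devel-ubuntu18.04 \
  -- /bin/bash -c 'nvcc --expt-relaxed-constexpr -o ./outputs/hello ./inputs/02-cuda-hello-world-faster.cu; ./outputs/hello'
```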
bacalhau docker run: call to Bacalhau
-i https://raw.githubusercontent.com/tristanpenman/cuda-examples/master/02-cuda-hello-world-faster.cu: URL of the input data volume downloaded from a URL source.
nvidia/cuda:11.2.0-cudnn8-devel-ubuntu18.04: Docker container for executing CUDA programs (you need to choose the right CUDA docker container). The container should have the "devel" tag.
nvcc --expt-relaxed-constexpr -o ./outputs/hello ./inputs/02-cuda-hello-world-faster.cu: compilation using the nvcc compiler, saving the binary to the outputs directory as hello.
Note that there is a ; between the commands: -- /bin/bash -c 'nvcc --expt-relaxed-constexpr -o ./outputs/hello ./inputs/02-cuda-hello-world-faster.cu; ./outputs/hello'. The ";" symbol allows executing multiple commands sequentially in a single line.
./outputs/hello: execution of the hello binary. You can combine compilation and execution commands.
Note that the CUDA version will need to be compatible with the graphics card on the host machine
When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on:
Job status: You can check the status of the job using bacalhau job list.
When it says Published or Completed, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using bacalhau job describe.
Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results) and downloaded our job output to be stored in that directory.
To view the file, run the following command:
Bacalhau supports running programs that are compiled to WebAssembly (Wasm). With the Bacalhau client, you can upload Wasm programs, retrieve data from public storage, read and write data, receive program arguments, and access environment variables.
Supported WebAssembly System Interface (WASI): Bacalhau can run compiled Wasm programs that expect the WebAssembly System Interface (WASI) Snapshot 1. Through this interface, WebAssembly programs can access data, environment variables, and program arguments.
Networking Restrictions: All ingress/egress networking is disabled; you won't be able to pull data/code/weights, etc. from an external source. Wasm jobs can say what data they need using URLs or CIDs (Content IDentifiers) and can then access the data by reading from the filesystem.
Single-Threading: There is no multi-threading, as WASI does not expose any interface for it.
If your program typically involves reading from and writing to network endpoints, follow these steps to adapt it for Bacalhau:
Replace Network Operations: Instead of making HTTP requests to external servers (e.g., example.com), modify your program to read data from the local filesystem.
Input Data Handling: Specify the input data location in Bacalhau using the --input flag when running the job. For instance, if your program used to fetch data from example.com, read from the /inputs folder locally, and provide the URL as input when executing the Bacalhau job. For example, --input http://example.com.
Output Handling: Adjust your program to output results to standard output (stdout) or standard error (stderr) pipes. Alternatively, you can write results to the filesystem, typically into an output mount. In the case of Wasm jobs, a default folder at /outputs is available, ensuring that data written there will persist after the job concludes.
By making these adjustments, you can effectively transition your program to operate within the Bacalhau environment, utilizing filesystem operations instead of traditional network interactions.
You can specify additional or different output mounts using the -o flag.
You will need to compile your program to WebAssembly that expects WASI. Check the instructions for your compiler to see how to do this.
You can run a WebAssembly program on Bacalhau using the bacalhau wasm run command.
Run Locally Compiled Program:
If your program is locally compiled, specify it as an argument. For instance, running the following command will upload and execute the main.wasm program:
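A sketch, assuming main.wasm is in the current directory:

```bash
bacalhau wasm run main.wasm
```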
The program you specify will be uploaded to a Bacalhau storage node and will be publicly available if you are using the public demo network.
Consider creating your own private network.
Alternative Program Specification:
You can use a Content IDentifier (CID) for a specific WebAssembly program.
Input Data Specification:
Make sure to specify any input data using the --input flag. This ensures the necessary data is available for the program's execution.
You can give the Wasm program arguments by specifying them after the program path or CID. If the Wasm program is already compiled and located in the current directory, you can run it by adding arguments after the file name:
For a specific WebAssembly program, run:
Write your program to use program arguments to specify input and output paths. This makes your program more flexible in handling different configurations of input and output volumes.
For example, instead of hard-coding your program to read from /inputs/data.txt, accept a program argument that should contain the path and then specify the path as an argument to bacalhau wasm run:
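A sketch, assuming the program treats its first argument as the input path (as described above, arguments follow the program path or CID):

```bash
bacalhau wasm run main.wasm /inputs/data.txt
```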
Your language of choice should contain a standard way of reading program arguments that will work with WASI.
You can also specify environment variables using the -e flag.
How to use Bacalhau Docker Image for task management
This documentation explains how to use the Bacalhau Docker image for task management with Bacalhau client.
To get started, you need to install the Bacalhau client (see more information ) and Docker.
The first step is to pull the Bacalhau Docker image from the GitHub container registry.
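A sketch, assuming the image is published as ghcr.io/bacalhau-project/bacalhau:

```bash
docker pull ghcr.io/bacalhau-project/bacalhau:latest
```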
Expected output:
You can also pull a specific version of the image, e.g.:
The output is similar to:
For example to run an Ubuntu-based job that prints the message 'Hello from Docker Bacalhau':
--id-only: Output only the job id
--wait: Wait for the job to finish
ubuntu:latest: Ubuntu container
--: Separate Bacalhau parameters from the command to be executed inside the container
sh -c 'uname -a && echo "Hello from Docker Bacalhau!"': The command executed inside the container
The command execution in the terminal is similar to:
j-6ffd54b8-e992-498f-9ee9-766ab09d5daa is a job ID, which represents the result of executing a command inside a Docker container. It can be used to obtain additional information about the executed job or to access the job's results. We store that in an environment variable so that we can reuse it later on (env: JOB_ID=j-6ffd54b8-e992-498f-9ee9-766ab09d5daa).
To print the content of the Job ID, execute the following command:
The output is similar to:
You always need to mount directories into the container to access files. This is because the container is running in a separate environment from your host machine.
The first part of this example should look familiar, except for the Docker commands.
When a job is submitted, Bacalhau prints the related job_id (j-da29a804-3960-4667-b6e5-73f05e120117):
Job status: You can check the status of the job using bacalhau job list.
When it reads Completed, that means the job is done, and you can get the results.
Job information: You can find out more information about your job by using bacalhau job describe.
Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in the result directory.
After the download is complete, you should see the following contents in the results directory.
This tutorial serves as an introduction to Bacalhau. In this example, you'll be executing a simple "Hello, World!" Python script hosted on a website on Bacalhau.
To get started, you need to install the Bacalhau client, see more information
We'll be using a very simple Python script that displays the "Hello, World!" message. Create a file called hello-world.py:
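A minimal version of the script:

```python
print("Hello, world!")
```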
Running the script to print out the output:
After the script has run successfully locally we can now run it on Bacalhau.
To submit a workload to Bacalhau, you can use the bacalhau docker run command. This command allows passing input data into the container using volumes; we will be using the --input URL:path argument for simplicity. This results in Bacalhau mounting a data volume inside the container. By default, Bacalhau mounts the input volume at the path /inputs inside the container.
Bacalhau overwrites the default entrypoint of the container, so we must run the full command after the -- argument.
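The full command, assembled from the flags explained below:

```bash
bacalhau docker run \
  --id-only \
  --input https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py \
  python:3.10-slim \
  -- python3 /inputs/hello-world.py
```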
bacalhau docker run: call to Bacalhau
--id-only: specifies that only the job identifier (job_id) will be returned after executing the container, not the entire output
--input https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py: indicates where to get the input data for the container. In this case, the input data is downloaded from the specified URL, which represents the Python script "hello-world.py".
python:3.10-slim: the Docker image that will be used to run the container. In this case, it uses the Python 3.10 image with a minimal set of components (slim).
--: This double dash is used to separate the Bacalhau command options from the command that will be executed inside the Docker container.
python3 /inputs/hello-world.py: running the hello-world.py Python script stored in /inputs.
When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.
The same job can be presented in the declarative format. In this case, the description will look like this:
The job description should be saved in .yaml format, e.g. helloworld.yaml, and then run with the command:
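A sketch of running the saved spec:

```bash
bacalhau job run helloworld.yaml
```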
Job status: You can check the status of the job using bacalhau job list.
When it says Published or Completed, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using bacalhau job describe.
Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results) and downloaded our job output to be stored in that directory.
To view the file, run the following command:
How to configure TLS for the requester node APIs
By default, the requester node APIs used by the Bacalhau CLI are accessible over HTTP, but it is possible to configure them to use Transport Layer Security (TLS) so that they are accessible over HTTPS instead. There are several ways to obtain the necessary certificates and keys, and Bacalhau supports obtaining them via ACME and Certificate Authorities or even self-signing them.
Once configured, you must ensure that instead of using http://IP:PORT you use https://IP:PORT to access the Bacalhau API
Automatic Certificate Management Environment (ACME) is a protocol that allows for automating the deployment of Public Key Infrastructure, and is the protocol used to obtain a free certificate from the Certificate Authority.
Using the --autocert [hostname] parameter to the CLI (in the serve and devstack commands), a certificate is obtained automatically from Let's Encrypt. The provided hostname should be a comma-separated list of hostnames, but they should all be publicly resolvable as Let's Encrypt will attempt to connect to the server to verify ownership (using the challenge). On the very first request this can take a short time whilst the first certificate is issued, but afterwards they are cached in the bacalhau repository.
Alternatively, you may set these options via the environment variable BACALHAU_AUTO_TLS. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.AutoCert instead.
As a result of the Let's Encrypt verification step, it is necessary for the server to be able to handle requests on port 443. This typically requires elevated privileges, and rather than obtain these through a privileged account (such as root), you should instead use setcap to grant the executable the right to bind to ports <1024.
A cache of ACME data is held in the config repository, by default ~/.bacalhau/autocert-cache, and this will be used to manage renewals to avoid rate limits.
Obtaining a TLS certificate from a Certificate Authority (CA) without using the Automated Certificate Management Environment (ACME) protocol involves a manual process that typically requires the following steps:
Choose a Certificate Authority: First, you need to select a trusted Certificate Authority that issues TLS certificates. Popular CAs include DigiCert, GlobalSign, Comodo (now Sectigo), and others. You may also consider whether you want a free or paid certificate, as CAs offer different pricing models.
Generate a Certificate Signing Request (CSR): A CSR is a text file containing information about your organization and the domain for which you need the certificate. You can generate a CSR using various tools or directly on your web server. Typically, this involves providing details such as your organization's name, common name (your domain name), location, and other relevant information.
Submit the CSR: Access your chosen CA's website and locate their certificate issuance or order page. You'll typically find an option to "Submit CSR" or a similar option. Paste the contents of your CSR into the provided text box.
Verify Domain Ownership: The CA will usually require you to verify that you own the domain for which you're requesting the certificate. They may send an email to one of the standard domain-related email addresses (e.g., admin@yourdomain.com, webmaster@yourdomain.com). Follow the instructions in the email to confirm domain ownership.
Complete Additional Verification: Depending on the CA's policies and the type of certificate you're requesting (e.g., Extended Validation or EV certificates), you may need to provide additional documentation to verify your organization's identity. This can include legal documents or phone calls from the CA to confirm your request.
Payment and Processing: If you're obtaining a paid certificate, you'll need to make the payment at this stage. Once the CA has received your payment and completed the verification process, they will issue the TLS certificate.
Once you have obtained your certificates, you will need to put two files in a location that bacalhau can read. You need the server certificate, often called something like server.cert or server.cert.pem, and the server key, which is often called something like server.key or server.key.pem.
Once you have these two files available, you must start bacalhau serve with two new flags. These are the tlscert and tlskey flags, whose arguments should point to the relevant file. An example of how it is used is:
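A sketch, assuming the flags are spelled --tlscert and --tlskey as described above; the paths are placeholders:

```bash
bacalhau serve --tlscert=/path/to/server.cert --tlskey=/path/to/server.key
```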
Alternatively, you may set these options via the environment variables BACALHAU_TLS_CERT and BACALHAU_TLS_KEY. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.ServerCertificate and Node.ServerAPI.TLS.ServerKey instead.
Once you have generated the necessary files, the steps are much like above: you must start bacalhau serve with two new flags. These are the tlscert and tlskey flags, whose arguments should point to the relevant file. An example of how it is used is:
Alternatively, you may set these options via the environment variables BACALHAU_TLS_CERT and BACALHAU_TLS_KEY. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.ServerCertificate and Node.ServerAPI.TLS.ServerKey instead.
If you use self-signed certificates, it is unlikely that any clients will be able to verify the certificate when connecting to the Bacalhau APIs. There are three options available to work around this problem:
Provide a CA certificate file of trusted certificate authorities, which many software libraries support in addition to system authorities.
Install the CA certificate file in the system keychain of each machine that needs access to the Bacalhau APIs.
Instruct the software library you are using not to verify HTTPS requests.
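For the first of these options, for example, an HTTP client such as curl can be pointed at the CA certificate file when calling the API (host, port and endpoint path below are placeholders):

```bash
curl --cacert ./bacalhau-ca.crt https://bacalhau.example.com:1234/api/v1/agent/alive
```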
Bacalhau operates by executing jobs within containers. This example shows you how to build and use a custom docker container.
To get started, you need to install the Bacalhau client, see more information
This example requires Docker. If you don't have Docker installed, you can install it from . Docker commands will not work on hosted notebooks like Google Colab, but the Bacalhau commands will.
You're likely familiar with executing Docker commands to start a container:
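A sketch of that command, reconstructed from the description that follows:

```bash
docker run docker/whalesay cowsay sup old fashioned container run
```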
This command runs a container from the docker/whalesay
image. The container executes the cowsay sup old fashioned container run
command:
This command also runs a container from the docker/whalesay
image, using Bacalhau. We use the bacalhau docker run
command to start a job in a Docker container. It contains additional flags such as --wait
to wait for job completion and --id-only
to return only the job identifier. Inside the container, the bash -c 'cowsay hello web3 uber-run'
command is executed.
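A sketch of the equivalent Bacalhau submission, reconstructed from the description above:

```bash
bacalhau docker run --wait --id-only docker/whalesay -- bash -c 'cowsay hello web3 uber-run'
```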
When a job is submitted, Bacalhau prints out the related job_id
. We store that in an environment variable so that we can reuse it later on.
You can download your job results directly by using bacalhau job get
. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (results
) and downloaded our job output to be stored in that directory.
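A sketch of those steps, assuming the job ID was stored in JOB_ID and that your CLI version supports the --output-dir flag:

```bash
mkdir -p results
bacalhau job get $JOB_ID --output-dir results
```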
Viewing your job output
Both commands execute cowsay in the docker/whalesay
container, but Bacalhau provides additional features for working with jobs at scale.
Bacalhau uses a syntax that is similar to Docker, and you can use the same containers. The main difference is that input and output data is passed to the container via IPFS, to enable planetary scale. In the example above, it doesn't make too much difference except that we need to download the stdout.
The --wait
flag tells Bacalhau to wait for the job to finish before returning. This is useful in interactive sessions like this, but you would normally allow jobs to complete in the background and use the bacalhau job list
command to check on their status.
Another difference is that by default Bacalhau overwrites the default entry point for the container, so you have to pass all shell commands as arguments to the run
command after the --
flag.
To use your own custom container, you must publish the container to a container registry that is accessible from the Bacalhau network. At this time, only public container registries are supported.
To demonstrate this, you will develop and build a simple custom container that comes from an old Docker example. I remember seeing cowsay at a Docker conference about a decade ago. I think it's about time we brought it back to life and distributed it across the Bacalhau network.
Next, the Dockerfile adds the script and sets the entry point.
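A hypothetical Dockerfile along those lines; the base image and script name are assumptions rather than the original example:

```dockerfile
FROM debian:stable-slim
# Install cowsay; on Debian it lives under /usr/games
RUN apt-get update && apt-get install -y cowsay && rm -rf /var/lib/apt/lists/*
ENV PATH="/usr/games:${PATH}"
# Add the wrapper script and make it the entry point
COPY cowsay.sh /cowsay.sh
RUN chmod +x /cowsay.sh
ENTRYPOINT ["/cowsay.sh"]
```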
Now let's build and test the container locally.
Once your container is working as expected, you should push it to a public container registry. In this example, I'm pushing to GitHub's container registry, but we'll skip the step below because you probably don't have permission. Remember that the Bacalhau nodes expect your container to have a linux/amd64 architecture.
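A hedged sketch of building for linux/amd64 and pushing to GitHub's container registry; the image name is a placeholder and pushing requires your own credentials:

```bash
docker buildx build --platform linux/amd64 -t ghcr.io/YOUR_ORG/cowsay:latest --push .
```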
Now we're ready to submit a Bacalhau job using your custom container. This code runs a job, downloads the results, and prints the stdout.
The bacalhau docker run
command strips the default entry point, so don't forget to run your entry point in the command line arguments.
When a job is submitted, Bacalhau prints out the related job_id
. We store that in an environment variable so that we can reuse it later on.
Download your job results directly by using bacalhau job get
command.
View your job output
Bacalhau supports running jobs as a program. This example demonstrates how to compile a project into WebAssembly and run the program on Bacalhau.
To get started, you need to install the Bacalhau client, see more information .
A working Rust installation with the wasm32-wasi
target. For example, you can use rustup to install Rust and configure it to build WASM targets. For those using the notebook, these are installed in hidden cells below.
We can use cargo
(which will have been installed by rustup
) to start a new project (my-program
) and compile it:
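A minimal sketch of those steps (exact commands may differ from the original example):

```bash
rustup target add wasm32-wasi   # make sure the WASM/WASI target is installed
cargo new my-program            # create a new binary project called my-program
```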
We can then write a Rust program. Rust programs that run on Bacalhau can read and write files, access a simple clock, and make use of pseudo-random numbers. They cannot memory-map files or run code on multiple threads.
The program below will use the Rust imageproc
crate to resize an image through seam carving, based on .
In the main function main()
an image is loaded, the original is saved, and then a loop is performed to reduce the width of the image by removing "seams." The results of the process are saved, including the original image with drawn seams and a gradient image with highlighted seams.
We also need to install the imageproc
and image
libraries and switch off the default features to make sure that multi-threading is disabled (default-features = false
). After disabling the default features, you need to explicitly specify only the features that you need:
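A hypothetical Cargo.toml dependency section along those lines; the crate versions are assumptions and the feature list should be trimmed to what your program actually uses:

```toml
[dependencies]
image = { version = "0.24", default-features = false, features = ["png", "jpeg"] }
imageproc = { version = "0.23", default-features = false }
```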
We can now build the Rust program into a WASM blob using cargo
:
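A sketch of that build step:

```bash
cd my-program
cargo build --target wasm32-wasi --release
```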
This command navigates to the my-program
directory and builds the project using Cargo with the target set to wasm32-wasi
in release mode.
This will generate a WASM file at ./my-program/target/wasm32-wasi/release/my-program.wasm
which can now be run on Bacalhau.
Now that we have a WASM binary, we can upload it to IPFS and use it as input to a Bacalhau job.
The -i
flag allows specifying a URI to be mounted as a named volume in the job, which can be an IPFS CID, HTTP URL, or S3 object.
For this example, we are using an image of the Statue of Liberty that has been pinned to a storage facility.
bacalhau wasm run: call to Bacalhau
./my-program/target/wasm32-wasi/release/my-program.wasm: the path to the WASM file that will be executed
_start: the entry point of the WASM program, where its execution begins
--id-only: this flag indicates that only the identifier of the executed job should be returned
-i ipfs://bafybeifdpl6dw7atz6uealwjdklolvxrocavceorhb3eoq6y53cbtitbeu:/inputs: input data volume that will be accessible within the job at the specified destination path
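Putting those pieces together, the full submission looks roughly like this:

```bash
bacalhau wasm run \
  ./my-program/target/wasm32-wasi/release/my-program.wasm _start \
  --id-only \
  -i ipfs://bafybeifdpl6dw7atz6uealwjdklolvxrocavceorhb3eoq6y53cbtitbeu:/inputs
```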
When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on:
You can download your job results directly by using bacalhau job get
. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory (wasm_results
) and downloaded our job output to be stored in that directory.
We can now get the results.
When we view the files, we can see the original image, the resulting shrunk image, and the seams that were removed.
Reject jobs that don't specify any .
The Local input source allows Bacalhau jobs to access files and directories that are already present on the compute node. This is especially useful for utilizing locally stored datasets, configuration files, logs, or other necessary resources without the need to fetch them from a remote source, ensuring faster job initialization and execution. See the for more details.
The URL Input Source provides a straightforward method for Bacalhau jobs to access and incorporate data available over HTTP/HTTPS. By specifying a URL, users can ensure the required data, whether a single file or a web page content, is retrieved and prepared in the job's execution environment, enabling direct and efficient data utilization. See the for more details.
If you have questions or need support or guidance, please reach out to the (#general channel).
For example, Rust users can specify the wasm32-wasi
target to rustup
and cargo
to get programs compiled for WASI WebAssembly. See for more information on this.
See for a workload that leverages WebAssembly support.
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
If you wish, it is possible to use Bacalhau with a self-signed certificate which does not rely on an external Certificate Authority. This is an involved process and so is not described in detail here although there is which should provide a good starting point.
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
Bacalhau has an update checking service to automatically detect whether a newer version of the software is available.
Users who are both running CLI commands and operating nodes will be regularly informed that a new release can be downloaded and installed.
Bacalhau will run an update check regularly when client commands are executed. If an update is available, explanatory text will be printed at the end of the command.
To force a manual update check, run the bacalhau version
command, which will explicitly list the latest software release alongside the server and client versions.
Bacalhau will run an update check regularly as part of the normal operation of the node.
If an update is available, an INFO level message will be printed to the log.
Bacalhau has some configuration options for controlling how often checks are performed. By default, an update check will run no more than once every 24 hours. Users can opt out of automatic update checks using the configuration described below.
UpdateConfig.Interval (environment variable: BACALHAU_UPDATE_CHECKFREQUENCY; default: 24h0m0s): the minimum amount of time between automated update checks. Set as any duration of hours, minutes or seconds, e.g. 24h or 10m. When set to 0, update checks are not performed.
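For example, a minimal sketch of opting out in the configuration file, using the UpdateConfig.Interval key above:

```yaml
UpdateConfig:
  Interval: 0   # disable automated update checks
```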
It's important to note that disabling the automatic update checks may lead to potential issues, arising from mismatched versions of different actors within Bacalhau.
To output update check config, run bacalhau config list
:
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
By default, Bacalhau jobs do not have any access to the internet. This is to keep both compute providers and users safe from malicious activities.
To run Docker jobs on Bacalhau to access the internet, you'll need to specify one of the following:
full: unfiltered networking for any protocol --network=full
http: HTTP(S)-only networking to a specified list of domains --network=http
none: no networking at all, the default --network=none
Specifying none
will still allow Bacalhau to download and upload data before and after the job using a Publisher.
Jobs using http
must specify the domains they want to access when the job is submitted.
So, putting it together the job run should look like this:
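A hedged example of an HTTP-networked job; the image, command and domain are placeholders:

```bash
bacalhau docker run --network=http --domain=example.com \
  alpine -- wget -qO- https://example.com
```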
Jobs will be provided with http_proxy
and https_proxy
environment variables which contain a TCP address of an HTTP proxy to connect through. Most tools and libraries will use these environment variables by default. If not, they must be used by user code to configure HTTP proxy usage.
The required networking can be specified using the --network
flag. For http
networking, the required domains can be specified using the --domain
flag, multiple times for as many domains as required. Specifying a domain starting with a .
means that all sub-domains will be included. For example, specifying .example.com
will cover some.thing.example.com
as well as example.com
.
If you are seeing the following (or any DNS) error, you likely forgot the --network flag!
Bacalhau jobs are explicitly prevented from starting other Bacalhau jobs, even if a Bacalhau requester node is specified on the HTTP allowlist.
Submitting a networked job is only the first part; the second part is ensuring that the nodes in your network will accept networked jobs. You can set nodes up to accept them using an Admission Controller setting in the node config. For example:
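A minimal sketch of such a setting, assuming the JobAdmissionControl key used by recent Bacalhau versions (check bacalhau config list on your version for the exact key name):

```yaml
JobAdmissionControl:
  AcceptNetworkedJobs: true
```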
Bacalhau clusters are composed of requester nodes, and compute nodes. The requester nodes are responsible for managing the compute nodes that make up the cluster. This functionality is only currently available when using NATS for the network transport.
The two main areas of functionality for the requester nodes are, managing the membership of compute nodes that require approval to take part in the cluster, and monitoring the health of the compute nodes. They are also responsible for collecting information provided by the compute nodes on a regular schedule.
As compute nodes start, they register their existence with the requester nodes. Once registered, they will maintain a sentinel file to note that they are already registered, this avoids unnecessary registration attempts.
Once registered, the requester node will need to approve the compute node before it can take part in the cluster. This is to ensure that the requester node is aware of all the compute nodes that are part of the cluster. In future, we may provide mechanisms for auto-approval of nodes joining the cluster, but currently all compute nodes registering default to the PENDING state.
Listing the current nodes in the system will show requester nodes automatically APPROVED, and compute nodes in the PENDING state.
Nodes can be rejected using their node id, and optionally specifying a reason with the -m flag.
Nodes can be approved using their node id.
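Hedged examples of these commands; the node IDs and message are placeholders:

```bash
bacalhau node list                                  # view nodes and their approval state
bacalhau node reject n-1234abcd -m "unknown host"   # reject a pending compute node
bacalhau node approve n-5678efgh                    # approve a pending compute node
```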
There is currently no support for auto-eviction of nodes, but they can be manually removed from the cluster using the node delete
command. Note, if they are manually removed, they are able to manually re-register, so this is most useful when you know the node will not be coming back.
After all of these actions, the node list looks like
Compute nodes will provide information about themselves to the requester nodes on a regular schedule. This information is used to help the requester nodes make decisions about where to schedule workloads.
These updates are broken down into:
Node Information: This is the information about the node itself, such as the hostname, CPU architecture, and any labels associated with the node. This information is persisted to the Node Info Store.
Resource Information: This is the information about the resources available on the node, such as the amount of memory, storage and CPU available. This information is held in memory and used to make scheduling decisions. It is not persisted to disk as it is considered transient.
Health Information: This heartbeat is used to determine if the node is still healthy, and if it is not, the requester node will mark the node as unhealthy. Eventually, the node will be marked as Unknown if it does not recover. This information is held in memory and used to make scheduling decisions. Like the resource information, it is not persisted to disk as it is considered transient.
Various configuration options are available to control the frequency of these updates, and the timeout for the health check. These can be set in the configuration file.
For the compute node, these settings are:
Node Information: InfoUpdateFrequency
- The interval between updates of the node information.
Resource Information: ResourceUpdateFrequency
- The interval between updates of the resource information.
Heartbeat: HeartbeatFrequency
- The interval between heartbeats sent by the compute node.
Heartbeat: HeartbeatTopic
- The name of the pubsub topic that heartbeat messages are sent via.
For the requester node, these settings are:
Heartbeat HeartbeatFrequency
- How often the heartbeat server will check the priority queue of node heartbeats.
Heartbeat HeartbeatTopic
- The name of the pubsub topic that heartbeat messages are sent via. Should be the same as the compute node value.
Node health NodeDisconnectedAfter
- The interval after which the node will be considered disconnected if a heartbeat has not been received.
As compute nodes are added and removed from the cluster, the requester nodes will emit events to the NATS PubSub system. These events can be consumed by other systems to react to changes in the cluster membership.
Securing node-to-node communication with TLS
Secure communication between Bacalhau Compute Nodes and Orchestrators is crucial, especially when operating across untrusted networks. This guide demonstrates how to implement TLS encryption to protect inter-node communication and ensure data security.
Bacalhau Compute Nodes initiate communication with the orchestrator through NATS, a high-performance messaging system. The orchestrator node hosts the NATS server, which compute nodes automatically connect to upon startup.
As a distributed system, Bacalhau supports TLS encryption to secure these communication channels. While this guide demonstrates the implementation using self-signed certificates, the same principles apply when using company-issued or publicly trusted certificates.
In this step, we'll guide you through generating the required certificates, focusing on self-signed certificate creation.
First, we need to generate a self-signed root certificate authority (CA) certificate, which will be used to sign all subsequent certificates. You can use standard tools like openssl
or mkcert
for this process. We recommend setting a long expiration date for the root CA and securely backing up both the certificate and its private key.
This step will produce two essential components: the self-signed root CA certificate and its corresponding private key.
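A minimal sketch using openssl; file names, subject and validity period are placeholders:

```bash
# Generate the root CA private key
openssl genrsa -out bacalhau-ca.key 4096
# Create a long-lived, self-signed root CA certificate from that key
openssl req -x509 -new -nodes -key bacalhau-ca.key -sha256 -days 3650 \
  -subj "/CN=Bacalhau Root CA" -out bacalhau-ca.crt
```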
In this step, we'll generate the certificate that enables TLS connections for the NATS server.
First, identify the DNS name or IP address used to connect to the orchestrator. This is typically found in the compute nodes' configuration under the "Orchestrators" field. For example:
If your config specifies nats://10.0.5.16:4222,
use the IP address 10.0.5.16
If your config specifies nats://my-bacalhau-orchestrator-node:4222
, use the DNS name my-bacalhau-orchestrator-node
Next, generate a server certificate signed by the Root CA (created in step 1). This certificate must include your chosen IP address or DNS name in its Subject Alternative Name field. Additionally, always include the IP address "127.0.0.1" in the Subject Alternative Names to support communications initiated from the orchestrator node itself.
This step will produce two critical files: the server certificate and its corresponding private key. Store both files securely in a protected location.
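A sketch of signing such a server certificate with openssl; substitute your own DNS name or IP address in the Subject Alternative Name list:

```bash
# Key and certificate signing request for the orchestrator
openssl genrsa -out orchestrator.key 4096
openssl req -new -key orchestrator.key \
  -subj "/CN=my-bacalhau-orchestrator-node" -out orchestrator.csr
# Sign the CSR with the root CA from step 1, including the SANs discussed above
openssl x509 -req -in orchestrator.csr \
  -CA bacalhau-ca.crt -CAkey bacalhau-ca.key -CAcreateserial \
  -days 825 -sha256 -out orchestrator.crt \
  -extfile <(printf "subjectAltName=DNS:my-bacalhau-orchestrator-node,IP:10.0.5.16,IP:127.0.0.1")
```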
In this step, we'll configure both orchestrator nodes and compute nodes with the generated certificates.
First, copy the following files to the orchestrator node:
The root certificate from step 1 (certificate file only, not the private key)
The server certificate from step 2
The server's private key from step 2
The orchestrator node should now have three files: the root certificate, server certificate, and server key file. Next, enable TLS support by adding the TLS configuration section to the orchestrator's configuration file. Example:
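A hedged sketch of what that section might look like; the key names under Orchestrator.TLS are assumptions and may differ between versions, so verify them with bacalhau config list:

```yaml
Orchestrator:
  TLS:
    CACert: /etc/bacalhau/tls/bacalhau-ca.crt      # root certificate from step 1
    ServerCert: /etc/bacalhau/tls/orchestrator.crt # server certificate from step 2
    ServerKey: /etc/bacalhau/tls/orchestrator.key  # server private key from step 2
```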
Next, prepare each compute node by copying the root certificate file (excluding the private key) to the node. Then, update each compute node's configuration to trust this certificate authority for secure server connections. Example:
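A sketch for the compute node, using the Compute.TLS keys listed in the configuration reference later in this guide (the file path is a placeholder):

```yaml
Compute:
  TLS:
    CACert: /etc/bacalhau/tls/bacalhau-ca.crt
    RequireTLS: true
```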
After restarting the Bacalhau processes on all nodes, secure TLS communication will be established for all node-to-node interactions.
Bacalhau supports GPU workloads. In this tutorial, learn how to run a job using GPU workloads with the Bacalhau client.
The Bacalhau network must have an executor node with a GPU exposed
Your container must include the CUDA runtime (cudart) and must be compatible with the CUDA version running on the node
To submit a job request, use the --gpu
flag under the docker run
command to select the number of GPUs your job requires. For example:
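A hedged example; the image is only illustrative and must ship a CUDA runtime compatible with the node:

```bash
bacalhau docker run --gpu 1 nvidia/cuda:12.2.0-base-ubuntu22.04 -- nvidia-smi
```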
The following limitations currently exist within Bacalhau. Bacalhau supports:
NVIDIA, Intel or AMD GPUs only
GPUs for the Docker executor only
Bacalhau authenticates and authorizes users in a multi-step flow.
We know our potential users have many possible requirements around auth and exist across the entire spectrum from "no auth needed because its a simple local deployment" to "enterprise-grade security for publicly accessible nodes". Hence, the auth system needs to be unopinionated about how authentication and authorization gets achieved.
The auth system has therefore been designed with a few goals in mind:
Flexible authentication: it should be easy for users to add their own authentication method, including simple methods like using shared secrets and more complex methods up to OAuth and OIDC.
Flexible authorization: it should be possible for users to be authorized based on a number of different modes, including group-based auth, RBAC and ABAC. The exact permissions of each should be customizable. The system should not require, for example, a particular model of "namespaces" or "workspaces" because these don't necessarily fit all use cases.
Future proofing: the auth system should not require core-level upgrades to support advancements in cryptography. The hash functions and key sizes that are considered "secure" change over time, so the Bacalhau core should not be forced to have an opinion on this by the auth system and should not have to play "whack-a-mole" with supporting different configurations for different customers. Instead, it should be possible for customers to apply a policy that makes sense for them and upgrade security at their own pace.
Performance: any calls to remote servers or complex algorithms to decide logic should happen once in the authentication process, and then subsequent calls to the API should introduce little overhead from authorization.
Auth server is a set of API endpoints that are trusted to make auth decisions. This is something built into the requester node and doesn't need to be a separate service, but could also be implemented as an external service if desired.
User agent is a tool that acts on behalf of the user, running in a trusted way locally to them. The user agent submits API calls to the requester node on their behalf – so the CLI, Web UI and SDK are all user agents. We use the term "user agent" to differentiate from a "client", which in the OAuth sense means a third-party service that the user does not have complete trust in.
Bacalhau implements flexible authentication and authorization using policies which are written using a machine-executable policy format called Rego.
Each authentication policy receives authentication credentials as input and outputs access tokens that will be supplied to future API calls.
Each authorization policy receives access tokens as input and outputs decisions about allowable access to APIs and job submission.
These two policies work together to define the entire authentication and authorization scheme.
The basic list of steps is:
Get the list of acceptable authn methods
Pick one and execute it, collecting any credentials from the user
Submit the credentials to the authn API
Receive an access token and use it in all future requests
User agents make a request to their configured auth server to retrieve a list of authentication methods, keyed by name.
Each authentication method object describes:
a type of authentication, identified by a specific key
parameters to be used in running the authentication method, specific to that type
Each "type" can be used to implement a number of different authentication methods. The types broadly correlate with behavior that the user agent needs to take to run the authentication flow, such that there can be a single piece of user agent code that is capable of running each type, with different input parameters.
The supported types are:
challenge authentication: This method is used to identify users via a private key that they hold. The authentication response contains an InputPhrase that the user should sign and return to the endpoint.
ask authentication: This method requires the user to manually input some information. This method can be used to implement username and password authentication, shared secret authentication, and even 2FA or security question auth.
The required information is represented by a JSON Schema in the object itself. The implementation should parse the JSON Schema and ask the user questions to populate an object that is valid by it.
The user agent decides which authentication method to use (e.g. by asking the user, or by knowing it has an appropriate key) and operates the flow.
Once all the data for the method has been successfully collected, the user agent POSTs the data to the auth endpoint for the method. The endpoint is the base auth endpoint plus the name of the method, e.g. /api/v1/auth/<method>
. So to submit data for a "userpass" method, the user agent would POST to /api/v1/auth/userpass
.
The auth server processes the request by inputting the auth credentials into an auth policy. If the auth policy finds the passed data acceptable, it returns an access token that the user can use in subsequent calls.
(Aside: there is actually no specification on the structure of the access token. The user agent should treat it as an opaque blob that it receives from the auth server and submits to the API server. Currently, all of the core Bacalhau code also does not have any opinion of the auth token – it is not assumed to be any specific type of object, and all parsing and handling is handled by the Rego policies. However, all of the currently implemented Rego policies output and expect JWTs, and it is recommended that users continue to use this convention. The rest of this document will assume access tokens are JWTs.)
The signed JWT is returned to the user agent. The user agent takes appropriate steps to keep the access token secret.
In principle, the auth policy can return any JWT it wishes, which will be interpreted later in the API auth policy – it is up to the authn policy and the authz policy to work together to apply auth. The policy to run is identified by the Node.Auth.Methods
variable, which is a map of method names to policy paths.
However, the default authn and authz policies make decisions using namespaces. Here, the authn policy returns a set of namespaces with associated access permissions, and the authz policy controls access based on them.
In this default case, the JWT includes the fields:
iss (issuer): The node ID of the auth server.
sub (subject): A network-unique user ID, derived from the auth credentials. The sub does not need to identify the same user across different authentication methods, but should ideally be the same if the user logs in via the same auth method again.
ist (issued at): The timestamp when the token was issued.
exp (expires at): The timestamp after which the token is no longer valid.
ns (namespaces): A map of namespaces to permission bits.
The key in the map is a namespace name that the user has some level of access to. Namespace names are ephemeral – i.e. there does not need to be a persistent or coordinated store of namespaces shared across the whole cluster. Instead, the format of namespace names is an interface for the network operator to decide.
For example, the default policy will just give the user access to a namespace identified by the sub field (e.g. their username). But in principle, more complex setups involving groups could be used.
Namespace names can be a *, which by convention will match any set of characters, like a filesystem glob. But it is up to the various auth policies to actually implement this. So a JWT claim containing "*" would give default permissions for all namespaces.
The value in the map is an unsigned integer encoding permission bits. If the following bits are set:
0b00000001: user can describe jobs in the namespace
0b00000010: user can create jobs in the namespace
0b00000100: user can download results from the namespace
0b00001000: user can cancel jobs in the namespace
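Putting the fields and permission bits together, a hypothetical decoded JWT payload might look like this (all values are illustrative):

```json
{
  "iss": "node-id-of-the-auth-server",
  "sub": "alice",
  "ist": 1700000000,
  "exp": 1700086400,
  "ns": {
    "alice": 15,
    "*": 1
  }
}
```

Here 15 (0b1111) grants all four permissions on the user's own namespace, while 1 (0b0001) allows describing jobs in any other namespace.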
The user agent includes an Authorization
header with the access token it wishes to use passed as a bearer token:
Note that the Authorization
header is strictly optional – access for unauthorized users is controlled using the policy, and may be allowed. The API call is allowed to proceed if the authorization policy returns a positive decision.
The requester node executes the API authorization policy and passes details of the API call. The default policy is one where the namespaces of the token are checked if present, and non-namespaced APIs require a valid signed token.
As above, custom policies are allowed. The policy to execute is defined by the Node.Auth.AccessPolicyPath
config variable. For non-namespaced APIs, such as node APIs, the policy may make a blanket decision simply using whether the user has an authorization token or not, or may choose to make a decision depending on the type of authorization. For namespaced APIs, such as job APIs, the policy should examine the namespaces in the JWT token and respond accordingly.
The authz server will return a 403 Forbidden
error if the user is not allowed to carry out the requested action. It will also return a 401 Unauthorized
error if the token the user passed is not valid for any future request. In the latter case, the user agent should discard the token and execute the above flow again to get a new one.
There are a number of roadmap items that will enhance the auth system:
The Web UI currently does not have any authn/z capability, and so can only work with the default Bacalhau configuration which does not limit unauthenticated users from querying read-only API endpoints.
To upgrade the Web UI to work in authenticated cases, it will be necessary to implement the algorithms noted above. In short:
The Web UI will need to query the auth API endpoint for available authn methods.
It should then pick an appropriate authn method, either by asking the user, choosing based on known available data (e.g. existing presence of a private key), or by picking the only available option.
It should then run the authn flow for that type:
For challenge
types, it will need a private key. It should probably generate and store one persistently rather than asking the user to upload theirs.
For ask
types, it will need to parse the input JSON Schema and present a web form to collect the necessary authn credentials.
Once it has successfully authenticated, it should persistently store the access token and add it to all subsequent API requests.
external authentication type: This type will power future OAuth2/OIDC authentication. The principle is that:
The type will specify a remote endpoint to redirect the user to. The CLI will open a browser to this endpoint (or otherwise advise the user to do this) and the Web UI will just issue a redirect to this endpoint.
The user completes authentication at the remote service and is then redirected back to a supplied endpoint with valid credentials.
The CLI may need to run a temporary web server to receive the redirect (this is how CLI tools like gcloud
currently handle the OIDC flow). The Web UI will need to specify a redirect that it can subsequently decode credentials for.
Also specified in the authentication method data will be any query parameters that the CLI/WebUI needs to populate with the redirect path. E.g. the specific OIDC scheme might specify the return location as a ?redirect
url query parameter, and the authentication type should specify the name of this parameter.
There doesn't need to be an optional step where the user exchanges the identity token they received from the remote auth server for a Bacalhau auth token. Instead, the system could just use the returned credential directly.
However, this may be a beneficial step for mapping OIDC credentials into e.g. a JWT that specifies available namespaces. So there should probably be a step where the token received from the OIDC flow is passed to the authn method endpoint, and a policy has the chance to return a different token. In the basic case, it can check the validity of the token and return it unchanged.
The returned credential will be a JWT or similar access token. The user agent should use this credential to query the API as above. The authz policy should be configured to recognize these access tokens and apply authz control based on their content, as for the other methods.
Different jobs may require different amounts of resources to execute. Some jobs may have specific hardware requirements, such as GPU. This page describes how to specify hardware requirements for your job.
Please bear in mind that each executor is implemented independently and these docs might be slightly out of date. Double check the man page for the executor you are using with bacalhau [executor] --help
.
The following table describes how to specify hardware requirements for the Docker executor.
--cpu (default: 500m): job CPU cores (e.g. 500m, 2, 8)
--memory (default: 1Gb): job memory requirement (e.g. 500Mb, 2Gb, 8Gb)
--gpu (default: 0): job GPU requirement (e.g. 1)
When you specify hardware requirements, the job will be offered out to the network to see if there are any nodes that can satisfy the requirements. If there are, the job will be scheduled on the node and the executor will be started.
Bacalhau supports GPU workloads. Learn how to run a job using GPU workloads with the Bacalhau client.
The Bacalhau network must have an executor node with a GPU exposed
Your container must include the CUDA runtime (cudart) and must be compatible with the CUDA version running on the node
Use the following command to see the amount of available resources:
To submit a request for a job that requires more than the standard set of resources, add the --cpu
and --memory
flags. For example, for a job that requires 2 CPU cores and 4Gb of RAM, use --cpu=2 --memory=4Gb
, e.g.:
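A hedged sketch of such a submission (image and command are placeholders):

```bash
bacalhau docker run --cpu=2 --memory=4Gb ubuntu -- echo "resource-limited job"
```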
To submit a GPU job request, use the --gpu
flag under the docker run
command to select the number of GPUs your job requires. For example:
The following limitations currently exist within Bacalhau.
Maximum CPU and memory limits depend on the participants in the network
For GPU:
NVIDIA, Intel or AMD GPUs only
Only the Docker Executor supports GPUs
This guide provides a comprehensive overview of Bacalhau's label and constraint system, which enables fine-grained control over job scheduling and resource allocation.
Labels in Bacalhau are key-value pairs attached to nodes that describe their characteristics, capabilities, and properties. Constraints are rules you define when submitting jobs to ensure they run on nodes with specific labels.
Labels are defined when starting a Bacalhau node using the -c Labels
flag:
You can also define labels in a YAML configuration file:
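A hedged sketch of the equivalent YAML, assuming a top-level Labels key matching the -c Labels flag above:

```yaml
Labels:
  region: us-east
  env: prod
  gpu: "true"
```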
Then start the node with:
Check node labels using:
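For example (output flags vary by version):

```bash
bacalhau node list
```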
Bacalhau supports various operators for precise node selection:
= (e.g. region=us-east): exact match
!= (e.g. env!=staging): not equal
exists (e.g. gpu): key exists
! (e.g. !temporary): key does not exist
in (e.g. zone in (a,b,c)): value is in the set
gt (e.g. mem-gb gt 32): greater than
lt (e.g. cpu-cores lt 16): less than
Here are common patterns for submitting jobs with constraints:
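Hedged sketches; the --constraints flag name and the image are assumptions, so check bacalhau docker run --help on your version:

```bash
# Run only on nodes labelled region=us-east
bacalhau docker run --constraints "region=us-east" ubuntu -- echo "hello"

# Combine operators: nodes with a gpu label and more than 32 GB of memory
bacalhau docker run --constraints "gpu exists,mem-gb gt 32" ubuntu -- echo "hello"
```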
Follow these patterns for consistent label naming:
Use lowercase alphanumeric characters
Separate words with hyphens
Use descriptive prefixes for categorization
Examples:
Organize labels hierarchically for better management:
If your job fails with no matching nodes:
Check available nodes and their labels:
Verify your constraints aren't too restrictive:
Ensure required nodes are online:
Remember that label changes require node restarts. After updating labels:
Gracefully stop the node
Apply new configuration
Restart the node
Verify labels with bacalhau node list
Effective use of Bacalhau's label and constraint system enables precise control over workload placement and resource utilization. Follow these best practices:
Use consistent naming conventions
Document your label taxonomy
Regularly audit and clean up unused labels
Test constraints before production deployment
Monitor constraint patterns for optimization opportunities
For additional support, consult the Bacalhau documentation or community resources.
Definitions and usage for Bacalhau terminology
A Compute Node in the Bacalhau platform is responsible for executing jobs and producing results. These nodes are part of a private network that allows workload distribution and communication between computers. Compute Nodes handle various types of jobs based on their capabilities and resources. They work in tandem with Requester Nodes, which manage user requests, discover and rank Compute Nodes and monitor job lifecycles.
A CLI (Command Line Interface) in the Bacalhau platform is a tool that allows users to interact with Bacalhau through text-based commands entered into a terminal or command prompt. The CLI provides a set of commands for managing and executing various tasks on the platform, including submitting jobs, monitoring job status, managing nodes and configuring the environment.
A Data Source in Bacalhau refers to the origin of the data used in jobs. This can include various types of storage such as IPFS, S3, local files or URLs. Data sources are specified in the job configuration and are essential for providing the necessary input data for job execution.
Docker in Bacalhau refers to the use of Docker containers to package and run applications. Docker provides a standardized unit of software, enabling users to create and manage containers efficiently. Bacalhau supports running Docker workloads, allowing users to utilize containerized applications seamlessly on the platform.
The InterPlanetary File System (IPFS) is a protocol and peer-to-peer network for storing and sharing data in a distributed file system. In Bacalhau, IPFS is used as a data source and a way to distribute job inputs and outputs, leveraging its decentralized nature for efficient data management.
A Job in the Bacalhau platform is a unit of work that a user submits for execution. Jobs can be simple tasks or complex workflows involving multiple steps. They are defined by specifications that include the job type, resources required and input/output data. Jobs are managed by Requester Nodes, which ensure they are distributed to appropriate Compute Nodes for execution.
Job Results are the output generated after a job has been executed on a Compute Node. These results can include processed data, logs and any other relevant output files. Results are often stored in specified locations such as IPFS or S3, allowing users to retrieve and utilize them after job completion.
A Node in Bacalhau is a fundamental component of the network, responsible for executing and managing jobs. Nodes can be classified into different types based on their roles, such as Compute Nodes and Requester Nodes. Each node operates as part of a decentralized network, allowing distributed processing and resource management.
Node Management in Bacalhau involves configuring and maintaining the nodes within the network, including both Compute Nodes and Requester Nodes. This includes tasks like onboarding new nodes, managing node resources, setting access controls and ensuring nodes meet operational standards for job execution.
In the context of Bacalhau, a Network refers to the interconnected system of nodes that collaborate to execute jobs, manage data and maintain communication. This network is decentralized, meaning it does not rely on a central authority, which enhances its robustness, scalability and efficiency.
The Network Specification in Bacalhau defines the network requirements and settings for job execution. This includes configurations for network access, data transfer protocols and connectivity between nodes. Proper network specification ensures that jobs can communicate effectively and access necessary resources.
Workload Onboarding in Bacalhau is the process of preparing and integrating different types of workloads for execution on the platform. This involves setting up environments for various programming languages, configuring containers and ensuring workloads are optimized for execution across the distributed network of Compute Nodes.
WebAssembly (WASM) in Bacalhau is a binary instruction format for a stack-based virtual machine. WASM is designed for safe and efficient execution, making it a suitable target for compilation from high-level languages. Bacalhau supports running WASM workloads, enabling efficient execution of lightweight and portable code.
A Requester Node in the Bacalhau platform is responsible for handling user requests, discovering and ranking Compute Nodes, forwarding jobs to these nodes and monitoring the lifecycle of the jobs. Requester Nodes play a crucial role in managing the flow of tasks and ensuring they are executed efficiently by the appropriate Compute Nodes in the network.
Amazon Simple Storage Service (S3) is a scalable object storage service. Bacalhau supports S3 as a data source, allowing users to store and retrieve input and output data for jobs. S3's integration with Bacalhau provides robust and reliable storage options for large-scale data processing tasks.
How to run a Bacalhau devstack locally
You can run a stand-alone Bacalhau network on your computer with the following guide.
The devstack command of bacalhau will start a 4 node cluster with 3 compute nodes and 1 requester node.
This is useful for kicking the tires and/or developing on the codebase. It's also the tool used by some tests.
x86_64
or ARM64
architecture
Ubuntu 20.0+ has most often been used for development and testing
Latest Bacalhau release
You can install the Bacalhau CLI by running this command in a terminal:
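The commonly documented installer one-liner is sketched below; check the installation guide for the current command:

```bash
curl -sL https://get.bacalhau.org/install.sh | bash
```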
See the installation guide for more installation options.
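Then start the devstack using the command named above:

```bash
bacalhau devstack
```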
This will start a 4 node Bacalhau cluster.
Once everything has started up - you will see output like the following:
Open an additional terminal window to be used for submitting jobs. Copy and paste environment variables from previous message into this window, e.g.:
You are now ready to submit a job to your local devstack.
This will submit a simple job to a single node:
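A minimal sketch of such a job, matching the output described later in this section:

```bash
bacalhau docker run ubuntu -- echo hello devstack test
```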
This should output something like the following:
This should output info about job execution and results:
Use the bacalhau job get command to download job results:
Results will be downloaded to the current directory. Job results should have the following structure:
If you execute cat stdout
it should read hello devstack test
. If you write any files in your job, they will appear in volumes/output
.
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
By default, for security purposes, Bacalhau jobs run with networking turned off. In order for your compute node to accept (and run) networked jobs, you need to enable it on a per compute node basis. To do so, you need to set the following:
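A hedged sketch of that setting, assuming the same JobAdmissionControl key discussed in the networking section (verify the key name for your version):

```yaml
JobAdmissionControl:
  AcceptNetworkedJobs: true
```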
You will then be able to run jobs with the following criteria:
First, you need to describe each node with labels in a key=value format. Later, labels can be used by the job as conditions for choosing the node to run on. For example:
If you want to assign multiple targets, you can do so with key=value,key=value.
The Compute.Orchestrator field in the config tells the Bacalhau compute node which orchestrator to connect to.
You can add the protocol and port, and it will apply this inline. E.g.
Or:
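A hedged sketch using the Compute.Orchestrators key from the configuration reference later in this guide; the addresses are placeholders and the exact key name may differ in your version:

```yaml
Compute:
  Orchestrators:
    - nats://10.0.5.16:4222          # with protocol and port spelled out
    - my-bacalhau-orchestrator-node  # bare hostname, protocol and port applied inline
```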
By default, the WebUI for Bacalhau is disabled for security reasons. To enable the WebUI, run the Bacalhau requester node with the following configuration:
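A sketch using the WebUI.Enabled key described later in this guide:

```yaml
WebUI:
  Enabled: true
```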
You can use the --input
or -i
flag multiple times with multiple different CIDs, URLs or S3 objects, and give each of them a path to be mounted at.
For example, doing bacalhau run cat/main.wasm -i ipfs://CID1:/input1 -i ipfs://CID2:/input2
will result in both the input1
and input2
folders being available to your running WASM with the CID contents. You can use -i
as many times as you need.
Yes! We offer a Bacalhau Docker-in-Docker container. You can set something like this up by running the following container:
These two files are for configuring the Bacalhau node. The first is the orchestrator configuration, and will look something like this:
And the second is just a list of arbitrary key-value pairs for labeling the node. For example:
We recommend building all requirements into your container or WASM before running it. However, if you need to download and install after starting the run, make sure you have the following configuration setting set:
Type the following:
When downloading content to run your code against, it is written to a read-only directory. Unfortunately, by default, SQLite requires the directory to be writable so that it can create utility files during its use.
If you run your command with the immutable
setting set to 1, then it will work. From the sqlite3 command you can use .open 'file:/inputs/database.db?immutable=1'
where you should replace "database.db" with your downloaded database filename.
How to write the config.yaml file to configure your nodes
On installation, Bacalhau creates a .bacalhau
directory that includes a config.yaml
file tailored for your specific settings. This configuration file is the central repository for custom settings for your Bacalhau nodes.
When initializing a Bacalhau node, the system determines its configuration by following a specific hierarchy. First, it checks the default settings, then the config.yaml
file, followed by environment variables, and finally, any command line flags specified during execution. Configurations are set and overridden in that sequence. This layered approach allows the default Bacalhau settings to provide a baseline, while environment variables and command-line flags offer added flexibility. However, the config.yaml
file offers a reliable way to predefine all necessary settings before node creation across environments, ensuring consistency and ease of management.
Modifications to the config.yaml
file are not dynamically applied to existing nodes. A restart of the Bacalhau node is required for any changes to take effect.
Your config.yaml
file starts off empty. However, you can see all available settings using the following command
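That command is bacalhau config list, which is described in more detail below:

```bash
bacalhau config list
```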
This command showcases over a hundred configuration parameters related to users, security, metrics, updates, and node configuration, providing a comprehensive overview of the customization options available for your Bacalhau setup.
Let’s go through the different options and how your configuration file is structured.
The bacalhau config list
command displays your configuration paths, segmented with periods to indicate each part you are configuring.
Consider these configuration settings: NameProvider
and Labels
. These settings help set name and labels for your Bacalhau node.
In your config.yaml
, these settings will be formatted like this:
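A hedged sketch; the values shown are placeholders:

```yaml
NameProvider: hostname
Labels:
  region: us-east
  env: prod
```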
Here are your Bacalhau configuration options in alphabetical order:
Efficient job management and resource optimization are significant considerations. In our continued effort to support scalable distributed computing and data processing, we are excited to introduce job queuing in Bacalhau v1.4.0
.
The Job Queuing feature was only added to Bacalhau in version 1.4 and is not supported in previous versions. Consider upgrading to the latest version to optimize resource usage with Job Queuing.
Job Queuing deals with the situation when there are no suitable nodes available on the network to execute a job. In this case, a user-defined period of time can be configured for the job, during which the job will wait for suitable nodes to become available or free up in the network. This feature enables better flexibility and reliability in managing your distributed workloads.
The job queuing feature is not automatically enabled; it needs to be explicitly set in your job specification or requester node configuration using the QueueTimeout parameter. This parameter activates the queuing feature and defines the amount of time your job should wait for available nodes in the network.
Node availability in your network is determined by capacity as well as job constraints such as label selectors, engines or publishers. For example, jobs will be queued if all nodes are currently busy, as well as if idle nodes do not match parameters in your job specification.
Bacalhau compute nodes regularly update their node, resource and health information every 30 seconds to the requester nodes in the network. During this update period, multiple jobs may be allocated to a node, oversubscribing and potentially exceeding its immediate available capacity. A local job queue is created at the compute node, efficiently handling the high demand as resources become available over time.
At the requester node level, you can set default queuing behavior for all jobs by defining the QueueTimeout
parameter in the node's configuration file. Alternatively, within the job specification, you can include the QueueTimeout
parameter directly in the configuration YAML. This flexibility allows you to tailor the queuing behavior to meet the specific needs of your distributed computing environment, ensuring that jobs are efficiently managed and resources are optimally utilized.
Here’s an example requester node configuration that sets the default job queuing time for an hour
The QueueBackoff
parameter determines the interval between retry attempts by the requester node to assign queued jobs.
Here’s a sample job specification setting the QueueTimeout
for this specific job, overwriting any node defaults.
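A hedged sketch of such a specification; the overall layout follows the Bacalhau job YAML format, but the exact placement and units of the Timeouts fields may differ between versions:

```yaml
Name: queued-job-example
Type: batch
Count: 1
Tasks:
  - Name: main
    Engine:
      Type: docker
      Params:
        Image: ubuntu:latest
        Parameters: ["echo", "hello"]
    Timeouts:
      QueueTimeout: 1800   # seconds to wait for a suitable node
      TotalTimeout: 3600   # must be at least as large as QueueTimeout (see below)
```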
You can also define timeouts for your jobs directly through the CLI using the --queue-timeout
flag. This method provides a convenient way to specify queuing behavior on a per-job basis, allowing you to manage job execution dynamically without modifying configuration files.
For example, here is how you can submit a job with a specified queue timeout using the CLI:
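For instance (the value is assumed to be in seconds, mirroring the --timeout flag):

```bash
bacalhau docker run --queue-timeout 1800 ubuntu -- echo hello
```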
Timeouts in Bacalhau are generally governed by the TotalTimeout
value for your yaml specifications and the --timeout
flag for your CLI commands. The default total timeout value is 30 minutes. Declaring any queue timeout that is larger than that without changing the total timeout value will result in a validation error.
Jobs will be queued when all available nodes are busy and when there is no node that matches your job specifications. Let’s take a look at how queuing will be executed within your network.
Queued Jobs will initially display the Queued
status. Using the bacalhau job describe
command will showcase both the state of the job and the reason behind queuing.
For busy nodes:
For no matching nodes in the network:
Once appropriate node resources become available, these jobs will transition to either a Running
or Completed
status, allowing more jobs to be assigned to matching nodes.
As Bacalhau continues to evolve, our commitment to making distributed computing and data processing more accessible and efficient remains strong. We want to hear what you think about this feature so that we can make Bacalhau better and meet all the diverse needs and requirements of you, our users.
For questions, feedback, please reach out in our Slack.
To view the configuration that bacalhau will receive when a command is executed against it, users can run the bacalhau config list command. Users who wish to see Bacalhau's config represented as YAML may run bacalhau config list --output=yaml.
In Bacalhau v1.5.0, there have been changes to how Bacalhau handles configuration:
The bacalhau repo ~/.bacalhau is no longer the default location for the Bacalhau config file.
Bacalhau searches for a default config file. The location is OS-dependent:
Linux: ~/.config/bacalhau/config.yaml
OSX: ~/.config/Application\ Support/bacalhau/config.yaml
Windows: $AppData\bacalhau\config.yaml. Usually, this is something like C:\Users\username\AppData\Roaming\bacalhau\config.yaml
As described above, bacalhau still has the concept of a default config file, which, for the sake of simplicity, we’ll say lives in ~/.config/bacalhau/config.yaml
. There are two ways this file can be modified:
A text editor, e.g. vim ~/.config/bacalhau/config.yaml.
The bacalhau config set command.
The --config
(or -c
) flag allows flexible configuration of bacalhau through various methods. You can use this flag multiple times to combine different configuration sources. To specify a config file to bacalhau, users may use the --config
flag, passing a path to a config file for bacalhau to use. When this flag is provided bacalhau will not search for a default config, and will instead use the configuration provided to it by the --config
flag.
In Bacalhau, configuration keys are structured identifiers used to configure and customize the behavior of the application. They represent specific settings that control various aspects of Bacalhau's functionality, such as network parameters, API endpoints, node operations, and user interface options. The configuration file is organized in a tree-like structure using nested mappings (dictionaries) in YAML format. Each level of indentation represents a deeper level in the hierarchy.
Example: part of the config file
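A hedged reconstruction based on the keys discussed below; the values are placeholders:

```yaml
API:
  Host: 0.0.0.0
  Port: 1234
NameProvider: puuid
DataDir: /home/username/.bacalhau
Orchestrator:
  Host: 0.0.0.0
  Port: 4222
  NodeManager:
    DisconnectTimeout: 1m
```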
In this YAML configuration file:
Top-Level Keys (Categories): API, Orchestrator
Sub-Level Keys (Subcategories): under API we have Host and Port; under Orchestrator we have Host, Port and NodeManager
Leaf Nodes (Settings): Host, Port, NameProvider, DataDir, DisconnectTimeout — these contain the actual configuration values.
Config keys use dot notation to represent the path from the root of the configuration hierarchy down to a specific leaf node. Each segment in the key corresponds to a level in the hierarchy. Syntax is Category.Subcategory(s)...LeafNode
config set, config list and --config
The bacalhau config list command returns all keys and their corresponding values. The bacalhau config set command accepts a key and a value to set it to. The --config flag accepts a key and a value that will be applied to Bacalhau when it runs.
How to Modify the API Host Using bacalhau config set in the Default Config File:
Run bacalhau config list to find the appropriate key
Run the bacalhau config set command
Observe how bacalhau config list reflects the new setting
Observe that the change has been reflected in the default config file
How to Modify the API Host Using bacalhau config set With a Custom Config File:
Run the config set command with the --config flag
Observe the created config file
Observe that the default config and the output of bacalhau config list do not reflect this change
How to Start Bacalhau With a Custom Config File
The --config Flag
The --config (or -c) flag allows flexible configuration of bacalhau through various methods. You can use this flag multiple times to combine different configuration sources.
or using the short form:
YAML Config Files: Specify paths to YAML configuration files. Example:
Key-Value Pairs: Set specific configuration values using dot notation. Example:
Boolean Flags: Enable boolean options by specifying the key alone. Example:
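Hedged sketches of each form; paths and values are placeholders:

```bash
# YAML config files
bacalhau serve --config /etc/bacalhau/config.yaml

# Key-value pairs in dot notation
bacalhau serve -c API.Host=0.0.0.0 -c API.Port=1234

# Boolean flag: the key alone enables the option
bacalhau serve -c WebUI.Enabled
```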
When multiple configuration options are provided, they are applied in the following order of precedence (highest to lowest):
Command-line key-value pairs and boolean flags
YAML configuration files
Default values
Within each category, options specified later override earlier ones.
Using a single config file:
Merging multiple config files:
Overriding specific values:
Combining file and multiple overrides:
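Hedged sketches of the four scenarios above; the file names are placeholders and the final command matches the explanation that follows:

```bash
# Using a single config file
bacalhau serve --config config.yaml

# Merging multiple config files (later files override earlier ones)
bacalhau serve -c base.yaml -c override.yaml

# Overriding specific values
bacalhau serve -c API.Port=9999

# Combining a file and multiple overrides
bacalhau serve -c config.yaml -c WebUI.Enabled -c API.Host=192.168.1.5
```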
In the last example, WebUI.Enabled will be set to true, API.Host will be 192.168.1.5, and other values will be loaded from config.yaml if present.
Remember, later options override earlier ones, allowing for flexible configuration management.
The bacalhau completion Command
The bacalhau completion command will generate shell completion for your shell. You can use the command like:
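For example, for bash (other shells such as zsh and fish follow the same pattern):

```bash
# Load completions in the current shell session
source <(bacalhau completion bash)

# Or install them permanently (path varies by distribution)
bacalhau completion bash > /etc/bash_completion.d/bacalhau
```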
After running the above command, commands like bacalhau config set and bacalhau --config will have auto-completion for all possible configuration values along with their descriptions.
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
The message above contains the environment variables you need for a new terminal window. You can paste these into a new terminal so that bacalhau
will use your local devstack. Execute the command to see the devstack cluster structure:
Use command to view results:
Yes! You can run programs using WebAssembly instead. See the for information on how to do that.
If your job writes to stdout, or stderr, while it is running, you can also view the output with the command.
Yes. Given a valid job ID
, you can use the to cancel the job, and stop it from running.
API.Auth.AccessPolicyPath
AccessPolicyPath is the path to a file or directory that will be loaded as the policy to apply to all inbound API requests. If unspecified, a policy that permits access to all API endpoints to both authenticated and unauthenticated users (the default as of v1.2.0) will be used.
API.Auth.Methods
Methods maps "method names" to authenticator implementations. A method name is a human-readable string chosen by the person configuring the system; it is shown to users to help them pick the authentication method they want to use. There can be multiple uses of the same authenticator type, each with different configuration and parameters and each identified by a unique method name. For example, an implementation that wants to allow users to log in with GitHub or Bitbucket might use an authenticator of type "oidc" for both, with each appearing once in this map under the method name "github" or "bitbucket" respectively. By default, only a single authentication method that accepts authentication via client keys will be enabled.
API.Host
Host specifies the hostname or IP address on which the API server listens or the client connects.
API.Port
Port specifies the port number on which the API server listens or the client connects.
API.TLS.AutoCert
AutoCert specifies the domain for automatic certificate generation.
API.TLS.AutoCertCachePath
AutoCertCachePath specifies the directory to cache auto-generated certificates.
API.TLS.CAFile
CAFile specifies the path to the Certificate Authority file.
API.TLS.CertFile
CertFile specifies the path to the TLS certificate file.
API.TLS.Insecure
Insecure allows insecure TLS connections (e.g., self-signed certificates).
API.TLS.KeyFile
KeyFile specifies the path to the TLS private key file.
API.TLS.SelfSigned
SelfSigned indicates whether to use a self-signed certificate.
API.TLS.UseTLS
UseTLS indicates whether to use TLS for client connections.
Compute.AllocatedCapacity.CPU
CPU specifies the amount of CPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "100m").
Compute.AllocatedCapacity.Disk
Disk specifies the amount of Disk space a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "10Gi").
Compute.AllocatedCapacity.GPU
GPU specifies the amount of GPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "1"). Note: When using percentages, the result is always rounded up to the nearest whole GPU.
Compute.AllocatedCapacity.Memory
Memory specifies the amount of Memory a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "1Gi").
Compute.AllowListedLocalPaths
AllowListedLocalPaths specifies a list of local file system paths that the compute node is allowed to access.
Compute.Auth.Token
Token specifies the key for compute nodes to be able to access the orchestrator.
Compute.Enabled
Enabled indicates whether the compute node is active and available for job execution.
Compute.Heartbeat.InfoUpdateInterval
InfoUpdateInterval specifies the time between updates of non-resource information to the orchestrator.
Compute.Heartbeat.Interval
Interval specifies the time between heartbeat signals sent to the orchestrator.
Compute.Heartbeat.ResourceUpdateInterval
ResourceUpdateInterval specifies the time between updates of resource information to the orchestrator.
Compute.Orchestrators
Orchestrators specifies a list of orchestrator endpoints that this compute node connects to.
Compute.TLS.CACert
CACert specifies the CA file path that the compute node trusts when connecting to orchestrator.
Compute.TLS.RequireTLS
RequireTLS specifies if the compute node enforces encrypted communication with orchestrator.
DataDir
DataDir specifies a location on disk where the bacalhau node will maintain state.
DisableAnalytics
DisableAnalytics, when true, disables sharing anonymous analytics data with the Bacalhau development team.
Engines.Disabled
Disabled specifies a list of engines that are disabled.
Engines.Types.Docker.ManifestCache.Refresh
Refresh specifies the refresh interval for cache entries.
Engines.Types.Docker.ManifestCache.Size
Size specifies the size of the Docker manifest cache.
Engines.Types.Docker.ManifestCache.TTL
TTL specifies the time-to-live duration for cache entries.
InputSources.Disabled
Disabled specifies a list of storages that are disabled.
InputSources.MaxRetryCount
MaxRetryCount specifies the maximum number of attempts for reading from a storage.
InputSources.ReadTimeout
ReadTimeout specifies the maximum time allowed for reading from a storage.
InputSources.Types.IPFS.Endpoint
Endpoint specifies the multi-address to connect to for IPFS, e.g. /ip4/127.0.0.1/tcp/5001.
JobAdmissionControl.AcceptNetworkedJobs
AcceptNetworkedJobs indicates whether to accept jobs that require network access.
JobAdmissionControl.Locality
Locality specifies the locality of the job input data.
JobAdmissionControl.ProbeExec
ProbeExec specifies the command to execute for probing job submission.
JobAdmissionControl.ProbeHTTP
ProbeHTTP specifies the HTTP endpoint for probing job submission.
JobAdmissionControl.RejectStatelessJobs
RejectStatelessJobs indicates whether to reject stateless jobs, i.e. jobs without inputs.
JobDefaults.Batch.Priority
Priority specifies the default priority allocated to a batch or ops job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Batch.Task.Publisher.Params
Params specifies the publisher configuration data.
JobDefaults.Batch.Task.Publisher.Type
Type specifies the publisher type, e.g. "s3", "local", "ipfs", etc.
JobDefaults.Batch.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Batch.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Batch.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Batch.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
JobDefaults.Batch.Task.Timeouts.ExecutionTimeout
ExecutionTimeout is the maximum time allowed for task execution.
JobDefaults.Batch.Task.Timeouts.TotalTimeout
TotalTimeout is the maximum total time allowed for a task.
JobDefaults.Daemon.Priority
Priority specifies the default priority allocated to a service or daemon job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Daemon.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Daemon.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Daemon.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Daemon.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
JobDefaults.Ops.Priority
Priority specifies the default priority allocated to a batch or ops job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Ops.Task.Publisher.Params
Params specifies the publisher configuration data.
JobDefaults.Ops.Task.Publisher.Type
Type specifies the publisher type, e.g. "s3", "local", "ipfs", etc.
JobDefaults.Ops.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Ops.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Ops.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Ops.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
JobDefaults.Ops.Task.Timeouts.ExecutionTimeout
ExecutionTimeout is the maximum time allowed for task execution.
JobDefaults.Ops.Task.Timeouts.TotalTimeout
TotalTimeout is the maximum total time allowed for a task.
JobDefaults.Service.Priority
Priority specifies the default priority allocated to a service or daemon job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Service.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Service.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Service.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Service.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
Labels
Labels are key-value pairs used to describe and categorize the nodes.
Logging.Level
Level sets the logging level. One of: trace, debug, info, warn, error, fatal, panic.
Logging.LogDebugInfoInterval
LogDebugInfoInterval specifies the interval for logging debug information.
Logging.Mode
Mode specifies the logging mode. One of: default, json.
NameProvider
NameProvider specifies the method used to generate names for the node. One of: hostname, aws, gcp, uuid, puuid.
Orchestrator.Advertise
Advertise specifies the URL to advertise to other servers.
Orchestrator.Auth.Token
Token specifies the key for compute nodes to be able to access the orchestrator.
Orchestrator.Cluster.Advertise
Advertise specifies the address to advertise to other cluster members.
Orchestrator.Cluster.Host
Host specifies the hostname or IP address for cluster communication.
Orchestrator.Cluster.Name
Name specifies the unique identifier for this orchestrator cluster.
Orchestrator.Cluster.Peers
Peers is a list of other cluster members to connect to on startup.
Orchestrator.Cluster.Port
Port specifies the port number for cluster communication.
Orchestrator.Enabled
Enabled indicates whether the orchestrator node is active and available for job submission.
Orchestrator.EvaluationBroker.MaxRetryCount
MaxRetryCount specifies the maximum number of times an evaluation can be retried before being marked as failed.
Orchestrator.EvaluationBroker.VisibilityTimeout
VisibilityTimeout specifies how long an evaluation can be claimed before it's returned to the queue.
Orchestrator.Host
Host specifies the hostname or IP address on which the Orchestrator server listens for compute node connections.
Orchestrator.NodeManager.DisconnectTimeout
DisconnectTimeout specifies how long to wait before considering a node disconnected.
Orchestrator.NodeManager.ManualApproval
ManualApproval, if true, requires manual approval for new compute nodes joining the cluster.
Orchestrator.Port
Port specifies the port number on which the Orchestrator server listens for compute node connections.
Orchestrator.Scheduler.HousekeepingInterval
HousekeepingInterval specifies how often to run housekeeping tasks.
Orchestrator.Scheduler.HousekeepingTimeout
HousekeepingTimeout specifies the maximum time allowed for a single housekeeping run.
Orchestrator.Scheduler.QueueBackoff
QueueBackoff specifies the time to wait before retrying a failed job.
Orchestrator.Scheduler.WorkerCount
WorkerCount specifies the number of concurrent workers for job scheduling.
Orchestrator.SupportReverseProxy
SupportReverseProxy configures the orchestrator node to run behind a reverse proxy.
Orchestrator.TLS.CACert
CACert specifies the CA file path that the orchestrator node trusts when connecting to NATS server.
Orchestrator.TLS.ServerCert
ServerCert specifies the certificate file path given to NATS server to serve TLS connections.
Orchestrator.TLS.ServerKey
ServerKey specifies the private key file path given to NATS server to serve TLS connections.
Orchestrator.TLS.ServerTimeout
ServerTimeout specifies the TLS timeout, in seconds, set on the NATS server.
Publishers.Disabled
Disabled specifies a list of publishers that are disabled.
Publishers.Types.IPFS.Endpoint
Endpoint specifies the multi-address to connect to for IPFS, e.g. /ip4/127.0.0.1/tcp/5001.
Publishers.Types.Local.Address
Address specifies the endpoint the publisher serves on.
Publishers.Types.Local.Port
Port specifies the port the publisher serves on.
Publishers.Types.S3.PreSignedURLDisabled
PreSignedURLDisabled specifies whether pre-signed URLs are enabled for the S3 provider.
Publishers.Types.S3.PreSignedURLExpiration
PreSignedURLExpiration specifies the duration before a pre-signed URL expires.
ResultDownloaders.Disabled
Disabled is a list of downloaders that are disabled.
ResultDownloaders.Timeout
Timeout specifies the maximum time allowed for a download operation.
ResultDownloaders.Types.IPFS.Endpoint
Endpoint specifies the multi-address to connect to for IPFS, e.g. /ip4/127.0.0.1/tcp/5001.
StrictVersionMatch
StrictVersionMatch indicates whether to enforce strict version matching.
UpdateConfig.Interval
Interval specifies the time between update checks. When set to 0, update checks are not performed.
WebUI.Backend
Backend specifies the address and port of the backend API server. If empty, the Web UI will use the same address and port as the API server.
WebUI.Enabled
Enabled indicates whether the Web UI is enabled.
WebUI.Listen
Listen specifies the address and port on which the Web UI listens.
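To tie several of these keys together, here is a minimal, hedged sketch of a config.yaml for a node acting as both orchestrator and compute node; every value is an illustrative placeholder rather than a recommendation:

```bash
# Write an illustrative config.yaml (values are placeholders)
cat <<'EOF' > config.yaml
NameProvider: hostname
DataDir: /var/lib/bacalhau
Labels:
  region: eu-west-1
API:
  Host: 0.0.0.0
  Port: 1234
Orchestrator:
  Enabled: true
  Port: 4222
Compute:
  Enabled: true
  Orchestrators:
    - nats://127.0.0.1:4222
  AllocatedCapacity:
    CPU: 85%
    Memory: 85%
    Disk: 85%
WebUI:
  Enabled: true
  Listen: 0.0.0.0:8438
EOF

# Start Bacalhau with it
bacalhau serve --config config.yaml
```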