How to configure your Bacalhau node.
Bacalhau employs the viper and cobra libraries for configuration management. Users can configure their Bacalhau node through a combination of command-line flags, environment variables, and the dedicated configuration file.
Bacalhau manages its configuration, metadata, and internal state within a specialized repository named .bacalhau. Serving as the heart of the Bacalhau node, this repository holds the data and settings that determine node behavior. It's located on the filesystem, and by default, Bacalhau initializes this repository at $HOME/.bacalhau, where $HOME is the home directory of the user running the bacalhau process.
To customize this location, users can:
Set the BACALHAU_DIR environment variable to specify their desired path.
Use the --repo command line flag to specify their desired path.
Upon executing a Bacalhau command for the first time, the system will initialize the .bacalhau
repository. If such a repository already exists, Bacalhau will seamlessly access its contents.
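The lookup described above can be sketched in Python. This is only an illustration, not Bacalhau's actual implementation: the function name resolve_repo_path and its parameters are hypothetical, and the sketch assumes that the --repo flag takes priority over BACALHAU_DIR, in line with the flag-over-environment precedence this page describes for configuration values.

```python
import os
from pathlib import Path


def resolve_repo_path(repo_flag=None, environ=None):
    """Return the repository path a node would use (illustrative only)."""
    environ = os.environ if environ is None else environ
    if repo_flag:                           # value passed via --repo
        return Path(repo_flag)
    if environ.get("BACALHAU_DIR"):         # environment variable override
        return Path(environ["BACALHAU_DIR"])
    home = environ.get("HOME") or str(Path.home())
    return Path(home) / ".bacalhau"         # default: $HOME/.bacalhau


# Hypothetical home directory, used only for the demonstration.
print(resolve_repo_path(environ={"HOME": "/home/alice"}))
```

Passing the environment as a dictionary keeps the function easy to try out without touching the real process environment.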
Structure of a Newly Initialized .bacalhau Repository
This repository comprises four directories and seven files:
user_id.pem
:
This file houses the Bacalhau node user's cryptographic private key, used for signing requests sent to a Requester Node.
Format: PEM.
repo.version
:
Indicates the version of the Bacalhau node's repository.
Format: JSON, e.g., {"Version":1}
.
libp2p_private_key
:
Stores the Bacalhau node's libp2p private key, essential for its network identity. The NodeID of a Bacalhau node is derived from this key.
Format: Base64 encoded RSA private key.
config.yaml
:
Contains configuration settings for the Bacalhau node.
Format: YAML.
update.json
:
A file containing the date/time when the last version check was made.
Format: JSON, e.g., {"LastCheck":"2024-01-24T11:06:14.631816Z"}
tokens.json
:
A file containing the tokens obtained through authenticating with bacalhau clusters.
QmdGUjsMHEgtAfdtw7U62yPEcAZFtA33tKMsczLToegZtv-compute
:
Contains the BoltDB executions.db
database, which aids the Compute node in state persistence. Additionally, the jobStats.json
file records the Compute Node's completed jobs tally.
Note: The segment QmdGUjsMHEgtAfdtw7U62yPEcAZFtA33tKMsczLToegZtv
is a unique NodeID for each Bacalhau node, derived from the libp2p_private_key
.
QmdGUjsMHEgtAfdtw7U62yPEcAZFtA33tKMsczLToegZtv-requester
:
Contains the BoltDB jobs.db
database for the Requester node's state persistence.
Note: NodeID derivation is similar to the Compute directory.
executor_storages
:
Storage for data handled by Bacalhau storage drivers.
plugins
:
Houses binaries that allow the Compute node to execute specific tasks.
Note: This feature is currently experimental and isn't active during standard node operations.
Within a .bacalhau
repository, a config.yaml
file may be present. This file serves as the configuration source for the bacalhau node and adheres to the YAML format.
Although the config.yaml
file is optional, its presence allows Bacalhau to load custom configurations; otherwise, Bacalhau is configured with built-in default values, environment variables and command line flags.
Modifications to the config.yaml
file will not be dynamically loaded by the Bacalhau node. A restart of the node is required for any changes to take effect. Bacalhau determines its configuration based on the following precedence order, with each item superseding the subsequent:
Command-line Flag
Environment Variable
Config File
Defaults
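The precedence order above behaves like a layered lookup in which the first source that defines a key wins. The following Python sketch models that idea with collections.ChainMap; it is not how Bacalhau itself is implemented, and the function name effective_config and the example values are assumptions made for illustration.

```python
from collections import ChainMap


def effective_config(flags, env, config_file, defaults):
    """Layer the four sources so that earlier ones win on lookup."""
    return ChainMap(flags, env, config_file, defaults)


# Hypothetical values: no flag is given, so the environment variable wins
# over both the config file and the built-in default.
settings = effective_config(
    flags={},
    env={"Node.ClientAPI.Host": "10.0.0.5"},
    config_file={"Node.ClientAPI.Host": "0.0.0.0"},
    defaults={"Node.ClientAPI.Host": "bootstrap.production.bacalhau.org"},
)
print(settings["Node.ClientAPI.Host"])  # prints 10.0.0.5
```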
config.yaml and Bacalhau Environment Variables
Bacalhau establishes a direct relationship between the value-bearing keys within the config.yaml file and corresponding environment variables. For keys that have no further sub-keys, the environment variable name is constructed by converting each segment of the key to upper case, joining the segments with underscores, and prefixing the result with BACALHAU_.
For example, a YAML key with the path Node.IPFS.Connect
translates to the environment variable BACALHAU_NODE_IPFS_CONNECT
and is represented in a file like:
There is no corresponding environment variable for either Node
or Node.IPFS
. Config values may also have other environment variables that set them for simplicity or to maintain backwards compatibility.
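The naming rule can be sketched as a small Python function that walks a nested configuration and produces an environment variable name for every value-bearing key. The function name env_var_names and the multiaddress value are hypothetical, and the sketch covers only the direct mapping, not the additional backwards-compatible variables mentioned above.

```python
def env_var_names(config, prefix="BACALHAU"):
    """Map each leaf key of a nested config to its environment variable name."""
    names = {}
    for key, value in config.items():
        path = f"{prefix}_{key.upper()}"
        if isinstance(value, dict):
            # Intermediate keys such as Node or Node.IPFS get no variable.
            names.update(env_var_names(value, path))
        else:
            names[path] = value
    return names


# Nested structure equivalent to the YAML path Node.IPFS.Connect,
# with a placeholder multiaddress as the value.
config = {"Node": {"IPFS": {"Connect": "/ip4/127.0.0.1/tcp/5001"}}}
print(env_var_names(config))
```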
Bacalhau leverages the BACALHAU_ENVIRONMENT
environment variable to determine the specific environment configuration when initializing a repository. Notably, if a .bacalhau
repository has already been initialized, the BACALHAU_ENVIRONMENT
setting will be ignored.
By default, if the BACALHAU_ENVIRONMENT
variable is not explicitly set by the user, Bacalhau will adopt the production
environment settings.
Below is a breakdown of the configurations associated with each environment:
1. Production (public network)
Environment Variable: BACALHAU_ENVIRONMENT=production
Configurations:
Node.ClientAPI.Host
: "bootstrap.production.bacalhau.org"
Node.ClientAPI.Port
: 1234
...other configurations specific to this environment...
2. Staging (staging network)
Environment Variable: BACALHAU_ENVIRONMENT=staging
Configurations:
Node.ClientAPI.Host
: "bootstrap.staging.bacalhau.org"
Node.ClientAPI.Port
: 1234
...other configurations specific to this environment...
3. Development (development network)
Environment Variable: BACALHAU_ENVIRONMENT=development
Configurations:
Node.ClientAPI.Host
: "bootstrap.development.bacalhau.org"
Node.ClientAPI.Port
: 1234
...other configurations specific to this environment...
4. Local (private or local networks)
Environment Variable: BACALHAU_ENVIRONMENT=local
Configurations:
Node.ClientAPI.Host
: "0.0.0.0"
Node.ClientAPI.Port
: 1234
...other configurations specific to this environment...
Note: The above configurations provided for each environment are not exhaustive. Consult the specific environment documentation for a comprehensive list of configurations.
How to configure compute/requester persistence
Both compute nodes and requester nodes maintain state. How that state is maintained is configurable, although the defaults are likely adequate for most use cases. This page describes how to configure the persistence of compute and requester nodes should the defaults not be suitable.
The compute nodes maintain information about the work that has been allocated to them, including:
The current state of the execution, and
The original job that resulted in this allocation
This information is used by the compute and requester nodes to ensure allocated jobs are completed successfully. By default, compute nodes store their state in a bolt-db database and this is located in the bacalhau repository along with configuration data. For a compute node whose ID is "abc", the database can be found in ~/.bacalhau/abc-compute/executions.db
.
In some cases, it may be preferable to maintain the state in memory, with the caveat that should the node restart, all state will be lost. This can be configured using the environment variables in the table below.
Environment Variable | Flag alternative | Value | Effect |
---|---|---|---|
When running a requester node, it maintains state about the jobs it has been requested to orchestrate and schedule, the evaluation of those jobs, and the executions that have been allocated. By default, this state is stored in a bolt db database that, with a node ID of "xyz" can be found in ~/.bacalhau/xyz-requester/jobs.db
.
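The default database locations for both node types follow the same pattern, which the Python sketch below reproduces. The function name default_store_paths is hypothetical, and the sketch assumes the repository sits at the default ~/.bacalhau location.

```python
from pathlib import PurePosixPath


def default_store_paths(node_id, repo=PurePosixPath("~/.bacalhau")):
    """Build the default state database paths for a given node ID."""
    return {
        "compute": repo / f"{node_id}-compute" / "executions.db",
        "requester": repo / f"{node_id}-requester" / "jobs.db",
    }


print(default_store_paths("abc")["compute"])    # ~/.bacalhau/abc-compute/executions.db
print(default_store_paths("xyz")["requester"])  # ~/.bacalhau/xyz-requester/jobs.db
```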
Environment Variable | Flag alternative | Value | Effect |
---|---|---|---|
How to enable GPU support on your Bacalhau node
Bacalhau supports GPUs out of the box and defaults to allowing execution on all GPUs installed on the node.
Bacalhau makes the assumption that you have installed all the necessary drivers and tools on your node host and have appropriately configured them for use by Docker.
In general for GPUs from any vendor, the Bacalhau client requires:
Verify installation by
nvidia-smi
installed and functional
rocm-smi
tool installed and functional
See the for guidance on how to run Docker workloads on AMD GPU.
xpu-smi
tool installed and functional
These are the flags that control the capacity of the Bacalhau node, and the limits for jobs that might be run.
The --limit-total-*
flags control the total system resources you want to give to the network. If left blank, the system will attempt to detect these values automatically.
The --limit-job-*
flags control the maximum amount of resources a single job can consume for it to be selected for execution.
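As a rough illustration of the per-job limit, the Python sketch below checks a job's requested resources against a set of limits. The function name, the resource names (cpu, memory_gb, gpu) and the rule that an unspecified limit means "no limit" are all assumptions made for this example and are not taken from Bacalhau.

```python
def within_job_limits(requested, job_limits):
    """True if every requested resource is at or below its per-job limit."""
    return all(
        amount <= job_limits.get(name, float("inf"))  # assumption: no entry = no limit
        for name, amount in requested.items()
    )


limits = {"cpu": 4, "memory_gb": 8}
print(within_job_limits({"cpu": 2, "memory_gb": 4}, limits))  # True
print(within_job_limits({"cpu": 8, "memory_gb": 4}, limits))  # False
```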
Resource limits are not supported for Docker jobs running on Windows. Resource limits will be applied at the job bid stage based on reported job requirements but will be silently unenforced. Jobs will be able to access as many resources as requested at runtime.
Running a Windows-based node is not officially supported, so your mileage may vary. Some features (like ) are not present in Windows-based nodes.
Bacalhau currently assumes that all containers are Linux-based. Users of the Docker executor will need to manually ensure that their Docker engine is running and configured to support Linux containers, e.g. using the WSL-based backend.
Bacalhau can limit the total time a job spends executing. A job that spends too long executing will be cancelled, and no results will be published.
By default, a Bacalhau node does not enforce any limit on job execution time. Both node operators and job submitters can supply a maximum execution time limit. If a job submitter asks for a longer execution time than permitted by a node operator, their job will be rejected.
Job submitters can pass the --timeout
flag to any Bacalhau job submission CLI to set a maximum job execution time. The supplied value should be a whole number of seconds with no unit.
The timeout can also be added to an existing job spec by adding the Timeout
property to the Spec
.
Node operators can pass the --max-job-execution-timeout
flag to bacalhau serve
to configure the maximum job time limit. The supplied value should be a numeric value followed by a time unit (one of s
for seconds, m
for minutes or h
for hours).
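The two timeout formats differ: submitters give a whole number of seconds, while operators give a number followed by a unit. The Python sketch below converts the operator format to seconds and compares the two. It is a simplified illustration, not Bacalhau's parser: the function names are hypothetical, compound values such as "1h30m" are not handled, and a job asking for exactly the maximum is assumed to be accepted.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def operator_timeout_seconds(value):
    """Convert an operator-style duration such as '30m' into seconds."""
    number, unit = value[:-1], value[-1:]
    if unit not in UNIT_SECONDS or not number:
        raise ValueError(f"expected a number followed by s, m or h, got {value!r}")
    return float(number) * UNIT_SECONDS[unit]


def is_accepted(job_timeout_seconds, operator_max):
    """A job is rejected only if it asks for longer than the operator permits."""
    return job_timeout_seconds <= operator_timeout_seconds(operator_max)


print(is_accepted(600, "30m"))   # True: 600 s requested, 1800 s permitted
print(is_accepted(7200, "1h"))   # False: 7200 s requested, 3600 s permitted
```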
Node operators can also use configuration properties to configure execution limits.
Compute nodes will use the properties:
Requester nodes will use the properties:
Bacalhau has two ways to make use of external storage providers: Sources and Publishers. Sources are storage resources consumed as inputs to jobs, and Publishers are storage resources created with the results of jobs.
Bacalhau allows you to use S3 or any S3-compatible storage service as an input source. Users can specify files or entire prefixes stored in S3 buckets to be fetched and mounted directly into the job execution environment. This capability ensures that your jobs have immediate access to the necessary data. See the for more details.
To use the S3 source, you will have to specify the mandatory name of the S3 bucket and the optional parameters Key, Filter, Region, Endpoint, VersionID and ChecksumSHA256.
Below is an example of how to define an S3 input source in YAML format:
To start, you'll need to connect the Bacalhau node to an IPFS server so that you can run jobs that consume CIDs as inputs. You can either and run it locally, or you can connect to a remote IPFS server.
In both cases, you should have a multiaddress for the IPFS server that should look something like this:
The multiaddress above is just an example - you'll need to get the multiaddress of the IPFS server you want to connect to.
You can then configure your Bacalhau node to use this IPFS server by passing the --ipfs-connect
argument to the serve
command:
Or, set the Node.IPFS.Connect
property in the Bacalhau configuration file. See the for more details.
Below is an example of how to define an IPFS input source in YAML format:
To use a local data source, you will have to:
Enable the use of local data when configuring the node itself by using the --allow-listed-local-paths
flag for bacalhau serve, specifying the file path and access mode. For example
In the job description specify parameters SourcePath - the absolute path on the compute node where your data is located and ReadWrite - the access mode.
Below is an example of how to define a Local input source in YAML format:
To use a URL data source, you will have to specify only the URL parameter, as in the part of the declarative job description below:
Bacalhau's S3 Publisher provides users with a secure and efficient method to publish job results to any S3-compatible storage service. To use an S3 publisher you will have to specify required parameters Bucket and Key and optional parameters Region, Endpoint, VersionID, ChecksumSHA256. See the for more details.
Here’s an example of the part of the declarative job description that outlines the process of using the S3 Publisher with Bacalhau:
The IPFS publisher works using the same setup as - you'll need to have an IPFS server running and a multiaddress for it. Then you'll pass that multiaddress using the --ipfs-connect
argument to the serve
command. If you are publishing to a public IPFS node, you can use bacalhau job get
with no further arguments to download the results. However, you may experience a delay in results becoming available as indexing of new data by public nodes takes time.
To use the IPFS publisher you will have to specify CID which can be used to access the published content. See the for more details.
To speed up the download or to retrieve results from a private IPFS node, pass the swarm multiaddress to bacalhau job get
to download results.
Pass the swarm key to bacalhau job get
if the IPFS swarm is a private swarm.
And part of the declarative job description with an IPFS publisher will look like this:
The Local Publisher should not be used for Production use as it is not a reliable storage option. For production use, we recommend using a more reliable option such as an S3-compatible storage service.
Here is an example of part of the declarative job description with a local publisher:
When running a node, you can choose which jobs you want to run by using configuration options, environment variables or flags to specify a job selection policy.
If you want more control over making the decision to take on jobs, you can use the --job-selection-probe-exec
and --job-selection-probe-http
flags.
These are external programs that are passed the following data structure so that they can make a decision about whether or not to take on a job:
The exec
probe is a script to run that will be given the job data on stdin
, and must exit with status code 0 if the job should be run.
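A minimal exec probe could look like the Python sketch below. The job data structure is not reproduced here, so the EngineType field and the allowed_types parameter are purely hypothetical stand-ins for whatever criterion you want to check. The only behavior taken from this page is that the job data arrives on stdin and that exit status 0 means the job should be run.

```python
import json


def probe_exit_code(raw_job, allowed_types=("docker",)):
    """Return 0 to accept the job and 1 to reject it."""
    try:
        job = json.loads(raw_job)
    except json.JSONDecodeError:
        return 1                       # unreadable input: reject
    if not isinstance(job, dict):
        return 1
    # "EngineType" is a hypothetical field used only for this sketch.
    return 0 if job.get("EngineType") in allowed_types else 1


# In a real probe script the final lines would be:
#     import sys
#     sys.exit(probe_exit_code(sys.stdin.read()))
print(probe_exit_code('{"EngineType": "docker"}'))  # prints 0
```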
The http
probe is a URL to POST the job data to. The job will be rejected if the HTTP request returns an error status code (e.g. >= 400).
For example, the following response will reject the job:
If the HTTP response is not a JSON blob, the content is ignored and any non-error status code will accept the job.
Before you join the main Bacalhau network, you can test locally.
To test, you can use the bacalhau devstack
command, which offers a way to get a 3 node cluster running locally.
By setting PREDICTABLE_API_PORT=1, the first node of our 3 node cluster will always listen on port 20000.
In another window, export the following environment variables so that the Bacalhau client binary connects to our local development cluster:
You can now interact with Bacalhau - all jobs are run by the local devstack cluster.
How to configure authentication and authorization on your Bacalhau node.
Bacalhau includes a flexible auth system that supports multiple methods of auth that are appropriate for different deployment environments.
With no specific authentication configuration supplied, Bacalhau runs in "anonymous mode" – which allows unidentified users limited control over the system. "Anonymous mode" is only appropriate for testing or evaluation setups.
In anonymous mode, Bacalhau will allow:
Users identified by a self-generated private key to submit any job and cancel their own jobs.
Users not identified by any key to access other read-only endpoints, such as to read job lists, describe jobs, and query node or agent information.
Bacalhau auth is controlled by policies. Configuring the auth system is done by supplying a different policy file.
Restricting API access to only users that have authenticated requires specifying a new authorization policy. You can download a policy that restricts anonymous access and install it by using:
Once the node is restarted, accessing the node APIs will require the user to be authenticated, but by default will still allow users with a self-generated key to authenticate themselves.
Restricting the list of keys that can authenticate to only a known set requires specifying a new authentication policy. You can download a policy that restricts key-based access and install it by using:
Then, modify the allowed_clients
variable in challange_ns_no_anon.rego
to include acceptable client IDs, found by running bacalhau agent node
.
Once the node is restarted, only keys in the allowed list will be able to access any API.
Users can authenticate using a username and password instead of specifying a private key for access. Again, this requires installation of an appropriate policy on the server.
Passwords are not stored in plaintext and are salted. The downloaded policy expects password hashes and salts generated by scrypt
. To generate a salted password, the helper script in pkg/authn/ask/gen_password
can be used:
This will ask for a password and generate a salt and hash to authenticate with it. Add the encoded username, salt and hash into the ask_ns_password.rego
.
In principle, Bacalhau can implement any auth scheme that can be described in a structured way by a policy file.
Bacalhau will pass information pertinent to the current request into every authentication policy query as a field on the input
variable. The exact information depends on the type of authentication used.
challenge authentication
challenge authentication identifies the user by the presence of a private key. The user is asked to sign an input phrase to prove they have the key they are identifying with.
Policies used for challenge
authentication do not need to actually implement the challenge verification logic as this is handled by the core code. Instead, they will only be invoked if this verification passes.
Policies for this type will need to implement these rules:
bacalhau.authn.token
: if the user should be authenticated, an access token they should use in subsequent requests. If the user should not be authenticated, should be undefined.
They should expect as fields on the input
variable:
clientId
: an ID derived from the user's private key that identifies them uniquely
nodeId
: the ID of the requester node that this user is authenticating with
signingKey
: the private key (as a JWK) that should be used to sign any access tokens to be returned
The simplest possible policy might therefore be this policy that returns the same opaque token for all users:
ask authentication
ask authentication uses credentials supplied manually by the user as identification. For example, an ask policy could require a username and password as input and check these against a known list. ask policies do all the verification of the supplied credentials.
Policies for this type will need to implement these rules:
bacalhau.authn.token
: if the user should be authenticated, an access token they should use in subsequent requests. If the user should not be authenticated, should be undefined.
bacalhau.authn.schema
: a static JSON schema that should be used to collect information about the user. The type
of declared fields may be used to pick the input method, and if a field is marked as writeOnly
then it will be collected in a secure way (e.g. not shown on screen). The schema
rule does not receive any input
data.
They should expect as fields on the input
variable:
ask
: a map of field names from the JSON schema to strings supplied by the user. The policy should validate these credentials.
nodeId
: the ID of the requester node that this user is authenticating with
signingKey
: the private key (as a JWK) that should be used to sign any access tokens to be returned
The simplest possible policy might therefore be one that asks for no data and returns the same opaque token for every user:
Authorization policies do not vary depending on the type of authentication used – Bacalhau uses one authz policy for all API requests.
Authz policies are invoked for every API request. Authz policies should check the validity of any supplied access tokens and issue an authz decision for the requested API endpoint. It is not required that authz policies enforce that an access token is present – they may choose to grant access to unauthenticated users.
Policies will need to implement these rules:
bacalhau.authz.token_valid
: true if the access token in the request is "valid" (but does not necessarily grant access for this request), or false if it is invalid for every request (e.g. because it has expired) and should be discarded.
bacalhau.authz.allow
: true if the user should be permitted to carry out the input request, false otherwise.
They should expect as fields on the input
variable for both rules:
http
: details of the user's HTTP request:
host
: the hostname used in the HTTP request
method
: the HTTP method (e.g. GET
, POST
)
path
: the path requested, as an array of path components without slashes
query
: a map of URL query parameters to their values
headers
: a map of HTTP header names to arrays representing their values
body
: a blob of any content submitted as the body
constraints
: details about the receiving node that should be used to validate any supplied tokens:
cert
: keys that the input token should have been signed with
iss
: the name of a node that this node will recognize as the issuer of any signed tokens
aud
: the name of this node that is receiving the request
Notably, the constraints
data is appropriate to be passed directly to the Rego io.jwt.decode_verify
method which will validate the access token as a JWT against the given constraints.
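To make the shape of the input variable more concrete, the Python sketch below assembles an equivalent document from an HTTP request. It is an illustration of the field layout listed above, not code from Bacalhau: the function name, the example URL, the node names and the choice to represent query values as lists are all assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def build_authz_input(method, url, headers, body, constraints):
    """Assemble a document shaped like the authz policy input."""
    parts = urlsplit(url)
    return {
        "http": {
            "host": parts.hostname,
            "method": method,
            # Path as an array of components without slashes.
            "path": [component for component in parts.path.split("/") if component],
            "query": parse_qs(parts.query),
            # Header names mapped to arrays of values.
            "headers": {name: [value] for name, value in headers.items()},
            "body": body,
        },
        "constraints": constraints,
    }


# Hypothetical request and node names.
example = build_authz_input(
    "GET",
    "https://node.example.com:1234/api/v1/jobs?limit=10",
    {"Authorization": "Bearer <token>"},
    "",
    {"cert": ["<public key>"], "iss": "node-a", "aud": "node-a"},
)
print(example["http"]["path"])  # ['api', 'v1', 'jobs']
```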
The simplest possible authz policy might be this one that allows all users to access all endpoints:
This tutorial describes how to add new nodes to an existing private network. Two basic scenarios will be covered:
Adding a machine as a new node.
Adding a as a new node.
You should have an established private network consisting of at least one requester node. See the guide to set one up.
You should have a new host (physical/virtual machine, cloud instance or docker container) with Bacalhau installed.
Let's assume that you already have a private network with at least one requester node. In this case, the process of adding new nodes follows the section. You will need to:
Set the token in the node.network.authsecret
parameter
Execute bacalhau serve
specifying the node type
and orchestrator
address via flags. You can find an example of such a command in the logs of the requester node; here is how it might look:
Remember that in this example you need to replace all 127.0.0.1 and 0.0.0.0 addresses with the actual public IP address of your node.
To automate the process using Terraform follow these steps:
Determine the IP address of your requester node
Write a terraform script, which does the following:
Adds a new instance
Installs bacalhau
on it
Launches a compute node
Execute the script
How to configure TLS for the requester node APIs
By default, the requester node APIs used by the Bacalhau CLI are accessible over HTTP, but it is possible to configure them to use Transport Layer Security (TLS) so that they are accessible over HTTPS instead. There are several ways to obtain the necessary certificates and keys, and Bacalhau supports obtaining them via ACME and Certificate Authorities or even self-signing them.
Once configured, you must ensure that instead of using http://IP:PORT you use https://IP:PORT to access the Bacalhau API.
Automatic Certificate Management Environment (ACME) is a protocol that allows for automating the deployment of Public Key Infrastructure, and is the protocol used to obtain a free certificate from the Let's Encrypt Certificate Authority.
Using the --autocert [hostname]
parameter to the CLI (in the serve
and devstack
commands), a certificate is obtained automatically from Let's Encrypt. The provided hostname should be a comma-separated list of hostnames, all of which must be publicly resolvable, as Let's Encrypt will attempt to connect to the server to verify ownership (using the challenge). On the very first request this can take a short time while the first certificate is issued, but afterwards certificates are cached in the bacalhau repository.
Alternatively, you may set these options via the environment variable, BACALHAU_AUTO_TLS
. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.AutoCert
instead.
As a result of the Let's Encrypt verification step, it is necessary for the server to be able to handle requests on port 443. This typically requires elevated privileges, and rather than obtain these through a privileged account (such as root), you should instead use setcap to grant the executable the right to bind to ports below 1024.
A cache of ACME data is held in the config repository, by default ~/.bacalhau/autocert-cache
, and this will be used to manage renewals to avoid rate limits.
Obtaining a TLS certificate from a Certificate Authority (CA) without using the Automatic Certificate Management Environment (ACME) protocol involves a manual process that typically requires the following steps:
Choose a Certificate Authority: First, you need to select a trusted Certificate Authority that issues TLS certificates. Popular CAs include DigiCert, GlobalSign, Comodo (now Sectigo), and others. You may also consider whether you want a free or paid certificate, as CAs offer different pricing models.
Generate a Certificate Signing Request (CSR): A CSR is a text file containing information about your organization and the domain for which you need the certificate. You can generate a CSR using various tools or directly on your web server. Typically, this involves providing details such as your organization's name, common name (your domain name), location, and other relevant information.
Submit the CSR: Access your chosen CA's website and locate their certificate issuance or order page. You'll typically find an option to "Submit CSR" or a similar option. Paste the contents of your CSR into the provided text box.
Verify Domain Ownership: The CA will usually require you to verify that you own the domain for which you're requesting the certificate. They may send an email to one of the standard domain-related email addresses (e.g., admin@yourdomain.com, webmaster@yourdomain.com). Follow the instructions in the email to confirm domain ownership.
Complete Additional Verification: Depending on the CA's policies and the type of certificate you're requesting (e.g., Extended Validation or EV certificates), you may need to provide additional documentation to verify your organization's identity. This can include legal documents or phone calls from the CA to confirm your request.
Payment and Processing: If you're obtaining a paid certificate, you'll need to make the payment at this stage. Once the CA has received your payment and completed the verification process, they will issue the TLS certificate.
Once you have obtained your certificates, you will need to put two files in a location where bacalhau can read them. You need the server certificate, often called something like server.cert
or server.cert.pem
, and the server key which is often called something like server.key
or server.key.pem
.
Once you have these two files available, you must start bacalhau serve with two new flags, tlscert and tlskey, whose arguments should point to the relevant files. An example of how it is used is:
Alternatively, you may set these options via the environment variables, BACALHAU_TLS_CERT
and BACALHAU_TLS_KEY
. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.ServerCertificate
and Node.ServerAPI.TLS.ServerKey
instead.
Once you have generated the necessary files, the steps are much like those above: you must start bacalhau serve with two new flags, tlscert and tlskey, whose arguments should point to the relevant files. An example of how it is used is:
Alternatively, you may set these options via the environment variables, BACALHAU_TLS_CERT
and BACALHAU_TLS_KEY
. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.ServerCertificate
and Node.ServerAPI.TLS.ServerKey
instead.
If you use self-signed certificates, it is unlikely that any clients will be able to verify the certificate when connecting to the Bacalhau APIs. There are three options available to work around this problem:
Provide a CA certificate file of trusted certificate authorities, which many software libraries support in addition to system authorities.
Install the CA certificate file in the system keychain of each machine that needs access to the Bacalhau APIs.
Instruct the software library you are using not to verify HTTPS requests.
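For a client written in Python, the first and third options can be sketched with the standard ssl module. The function name client_tls_context is hypothetical and the CA file path is whatever file you distribute to clients. Disabling verification removes the protection TLS is meant to provide, so it is shown only for completeness.

```python
import ssl


def client_tls_context(ca_file=None, verify=True):
    """Build an SSL context for talking to a node with a self-signed certificate."""
    if not verify:
        context = ssl.create_default_context()
        context.check_hostname = False        # must be disabled before verify_mode
        context.verify_mode = ssl.CERT_NONE   # option 3: no verification (insecure)
        return context
    # Option 1: trust the CA certificates in ca_file. When ca_file is None,
    # the system's default certificate authorities are used instead.
    return ssl.create_default_context(cafile=ca_file)


strict = client_tls_context()
insecure = client_tls_context(verify=False)
print(strict.verify_mode == ssl.CERT_REQUIRED, insecure.verify_mode == ssl.CERT_NONE)
```

The resulting context can be passed to standard library clients, for example through the context argument of urllib.request.urlopen.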
See the for guidance on how to run Docker workloads on Intel GPU.
Access to GPUs can be controlled using . To limit the number of GPUs that can be used per job, set a job resource limit. To limit access to GPUs from all jobs, set a total resource limit.
Applying job timeouts allows node operators to more fairly distribute the work submitted to their nodes. It also protects users from transient errors that result in their jobs waiting indefinitely.
Config property | Meaning |
---|
Config property | Meaning |
---|
The Local input source allows Bacalhau jobs to access files and directories that are already present on the compute node. This is especially useful for utilizing locally stored datasets, configuration files, logs, or other necessary resources without the need to fetch them from a remote source, ensuring faster job initialization and execution. See the for more details.
The URL Input Source provides a straightforward method for Bacalhau jobs to access and incorporate data available over HTTP/HTTPS. By specifying a URL, users can ensure the required data, whether a single file or a web page content, is retrieved and prepared in the job's execution environment, enabling direct and efficient data utilization. See the for more details.
Another option is to store the results of a job execution on a compute node. In this case the results are published to the local compute node and stored as a compressed tar file, which can be retrieved over HTTP from the command line using the get command. To use the Local publisher you will have to specify only the URL parameter, with an HTTP URL to the location where you would like to save the result. See the for more details.
If the HTTP response is a JSON blob, it should match the and will be used to respond to the bid directly:
Policies are written in a language called Rego, also used by Kubernetes. Users who want to write their own policies should get familiar with the Rego language.
A more realistic example that returns a signed JWT is in .
A more realistic example that returns a signed JWT is in .
A more realistic example (which is the Bacalhau "anonymous mode" default) is in .
Let's assume you already have all the necessary cloud infrastructure set up with a private network with at least one requester node. In this case, you can add new nodes manually (, , ) or use a tool like to automatically create and add any number of nodes to your network. The process of adding new nodes manually follows the section.
Configure terraform for
If you have questions or need support or guidance, please reach out to the (#general channel).
If you wish, it is possible to use Bacalhau with a self-signed certificate which does not rely on an external Certificate Authority. This is an involved process and so is not described in detail here although there is which should provide a good starting point.
| Environment variable | Flag | Value | Effect |
|---|---|---|---|
| BACALHAU_COMPUTE_STORE_TYPE | --compute-execution-store-type | boltdb | Uses the bolt db execution store (default) |
| BACALHAU_COMPUTE_STORE_PATH | --compute-execution-store-path | A path (inc. filename) | Specifies where the boltdb database should be stored. Default is ~/.bacalhau/{NODE-ID}-compute/executions.db if not set |
| BACALHAU_JOB_STORE_TYPE | --requester-job-store-type | boltdb | Uses the bolt db job store (default) |
| BACALHAU_JOB_STORE_PATH | --requester-job-store-path | A path (inc. filename) | Specifies where the boltdb database should be stored. Default is ~/.bacalhau/{NODE-ID}-requester/jobs.db if not set |
| The minimum acceptable value for a job timeout. A job will only be accepted if it is submitted with a timeout longer than this value. |
| The maximum acceptable value for a job timeout. A job will only be accepted if it is submitted with a timeout shorter than this value. |
| The job timeout that will be applied to jobs that are submitted without a timeout value. |
| If a job is submitted with a timeout less than this value, the default job execution timeout will be used instead. |
| The timeout to use in the job if a timeout is missing or too small. |
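The timeout rules in the rows above can be read as a small decision procedure. The Python sketch below is only an illustration of that reading: the function names, the use of seconds as the unit, and the strict comparisons are assumptions made for the example, not Bacalhau's actual implementation.

```python
from typing import Optional


def effective_timeout(requested: Optional[float], minimum: float, default: float) -> float:
    """Pick the timeout to apply to a submitted job (hypothetical helper).

    A missing timeout, or one below the minimum, is replaced by the default.
    """
    if requested is None or requested < minimum:
        return default
    return requested


def accepts_timeout(timeout: float, minimum: float, maximum: float) -> bool:
    """Accept a job only if its timeout lies strictly between the two limits.

    The strict inequalities mirror the wording "longer than" and
    "shorter than"; the real boundary handling is an assumption.
    """
    return minimum < timeout < maximum
```

For example, with a minimum of 60 seconds and a default of 600 seconds, a job submitted with a 30 second timeout would run with the 600 second default under this reading.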
How to run the WebUI.
The Bacalhau WebUI offers an intuitive interface for interacting with the Bacalhau network. This guide provides comprehensive instructions for setting up, deploying, and utilizing the WebUI.
For contributing to the WebUI's development, please refer to the Bacalhau WebUI GitHub Repository.
Ensure you have Bacalhau v1.1.7 or later installed.
To launch the WebUI locally, execute the following command:
This command initializes a requester and compute node, configured to listen on HOST=0.0.0.0 and PORT=1234.
Once started, the WebUI is accessible at http://127.0.0.1/. This local instance allows you to interact with your local Bacalhau network setup.
For observational purposes, a development version of the WebUI is available at bootstrap.development.bacalhau.org. This instance displays jobs from the development server.
N.b. The development version of the WebUI is for observation only and may not reflect the latest changes or features available in the local setup.
| Config property | Default value | Meaning |
|---|---|---|
| Node.Compute.JobSelection.Locality | Anywhere | Only accept jobs that reference data we have locally ("local") or anywhere ("anywhere"). |
| Node.Compute.JobSelection.ProbeExec | unused | Use the result of an external program to decide if we should take on the job. |
| Node.Compute.JobSelection.ProbeHttp | unused | Use the result of an HTTP POST to decide if we should take on the job. |
| Node.Compute.JobSelection.RejectStatelessJobs | False | Reject jobs that don't specify any input data. |
| Node.Compute.JobSelection.AcceptNetworkedJobs | False | Accept jobs that require network access. |
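To make the interaction between the job selection properties listed above more concrete, here is a minimal Python sketch. It is not Bacalhau code: the JobSelectionPolicy class, the should_accept function, and the three boolean job attributes are hypothetical, and the behavior of RejectStatelessJobs and AcceptNetworkedJobs is inferred from the property names.

```python
from dataclasses import dataclass


@dataclass
class JobSelectionPolicy:
    """Hypothetical mirror of the Node.Compute.JobSelection properties."""
    locality: str = "anywhere"            # "local" or "anywhere"
    reject_stateless_jobs: bool = False
    accept_networked_jobs: bool = False


def should_accept(policy: JobSelectionPolicy, has_inputs: bool,
                  inputs_are_local: bool, needs_network: bool) -> bool:
    """Return True if a node with this policy would take the job (sketch)."""
    # Assumption: a "stateless" job is one without any input data.
    if policy.reject_stateless_jobs and not has_inputs:
        return False
    # Assumption: networked jobs are refused unless explicitly allowed.
    if needs_network and not policy.accept_networked_jobs:
        return False
    # With locality "local", every referenced input must already be on the node.
    if policy.locality == "local" and has_inputs and not inputs_are_local:
        return False
    return True
```

With the default values from the table, this sketch accepts a job with remote inputs and no network requirement, and rejects any job that needs network access.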
Set up private IPFS network
Note that Bacalhau v1.4.0 currently supports IPFS v0.27 and below. Support for later versions of IPFS will be added in future releases.
Support for the embedded IPFS node was discontinued in v1.4.0 to streamline communication and reduce overhead. Therefore, to use a private IPFS network, you now need to create it yourself and then connect your nodes to it. This manual describes how to:
Install and configure IPFS
Create Private IPFS network
Configure your Bacalhau network to use the private IPFS network
Pin your data to private IPFS network
Install Go on all nodes
Install Kubo
Initialize Private IPFS network
Connect all nodes to the same private network
Connect Bacalhau network to use private IPFS network
Remove any previous Go installation by deleting the /usr/local/go folder (if it exists), then extract the archive you downloaded into /usr/local, creating a fresh Go tree in /usr/local/go:
Add /usr/local/go/bin to the PATH environment variable. You can do this by adding the following line to your $HOME/.profile or /etc/profile (for a system-wide installation):
Changes made to a profile file may not apply until the next time you log into the system. To apply the changes immediately, just run the shell commands directly or execute them from the profile using a command such as source $HOME/.profile.
Verify that Go is installed correctly by checking its version:
Verify that IPFS is installed correctly by checking its version:
A bootstrap node is used by client nodes to connect to the private IPFS network. The bootstrap connects clients to other nodes available on the network.
Execute the ipfs init command to initialize an IPFS node:
The next step is to generate the swarm key, a cryptographic key that is used to control access to an IPFS network, and export it into a swarm.key file located in the ~/.ipfs folder.
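As an illustration of what the swarm.key file contains, the following Python sketch produces a key in the commonly used pre-shared-key layout: a protocol header line, an encoding line, and 32 random bytes written as 64 hexadecimal characters. The function names are hypothetical, and the target directory is passed in by the caller rather than assumed; this is a sketch, not the command used in this manual.

```python
import secrets
from pathlib import Path


def make_swarm_key() -> str:
    """Build the text of a swarm key: header, encoding, 32 random bytes in hex."""
    return "/key/swarm/psk/1.0.0/\n/base16/\n" + secrets.token_hex(32) + "\n"


def write_swarm_key(repo_dir: Path) -> Path:
    """Write swarm.key into the given IPFS repository directory (hypothetical helper)."""
    repo_dir.mkdir(parents=True, exist_ok=True)
    key_path = repo_dir / "swarm.key"
    key_path.write_text(make_swarm_key())
    # The key grants access to the private network, so keep it owner-readable only.
    key_path.chmod(0o600)
    return key_path
```

Every node in the private network must use the same key file, so the key is generated once on the bootstrap node and then copied to the other nodes.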
Now the default entries of bootstrap nodes should be removed. Execute the command on all nodes:
Check that bootstrap config does not contain default values:
Configure IPFS to listen for incoming connections on specific network addresses and ports, making the IPFS Gateway and API services accessible. Consider changing addresses and ports depending on the specifics of your network.
Start the IPFS daemon:
Copy the swarm.key file from the bootstrap node into the ~/.ipfs/ folder on each client node and initialize IPFS:
Apply the same configuration as on the bootstrap node and start the daemon:
Done! Now you can check that private IPFS network works properly:
List peers on the bootstrap node. It should list all connected nodes:
Pin some files and check their availability across the network:
systemd Service
Finally, make the IPFS daemon run at system startup. To do this:
Create a new service unit file in the /etc/systemd/system/ directory.
Add the following content to the file, replacing /path/to/your/ipfs/executable with the actual path.
Use the which ipfs command to locate the executable. Usually the path to the executable is /usr/local/bin/ipfs.
For security purposes, consider creating a separate user to run the service. In this case, specify its name in the User= line. If no user is specified, the ipfs service is launched as root, which means that you will need to copy the ipfs binary to the /root directory.
Reload and enable the service
Done! Now reboot the machine to ensure that the daemon starts correctly. Use the systemctl status ipfs command to check that the service is running:
To connect your private Bacalhau network to the private IPFS network, specify the IPFS API address using the --ipfs-connect flag. The address can be found in the ~/.ipfs/api file:
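The api file holds a single multiaddr, typically of the form /ip4/127.0.0.1/tcp/5001. The Python sketch below reads that file and splits the address into host and port, which can be handy for checking that the API endpoint is the one you expect before passing it to --ipfs-connect. Both function names are hypothetical, and only simple address-plus-TCP-port multiaddrs are handled.

```python
from pathlib import Path
from typing import Tuple


def parse_api_multiaddr(multiaddr: str) -> Tuple[str, int]:
    """Split a multiaddr such as /ip4/127.0.0.1/tcp/5001 into (host, port)."""
    parts = multiaddr.strip().strip("/").split("/")
    known_hosts = ("ip4", "ip6", "dns", "dns4", "dns6")
    if len(parts) < 4 or parts[0] not in known_hosts or parts[2] != "tcp":
        raise ValueError(f"unsupported multiaddr: {multiaddr!r}")
    return parts[1], int(parts[3])


def read_ipfs_api_address(repo_dir: Path) -> str:
    """Return the multiaddr stored in the api file of an IPFS repository."""
    return (repo_dir / "api").read_text().strip()
```

A typical call would be parse_api_multiaddr(read_ipfs_api_address(Path.home() / ".ipfs")), assuming the daemon is running and has written the api file.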
Done! Now your private Bacalhau network is connected to the private IPFS network!
To verify that everything works correctly:
Pin the file to the private IPFS network
Run the job, which takes the pinned file as input and publishes result to the private IPFS network
View and download job results
Create any file and pin it. Use the ipfs add command:
Run a simple job, which fetches the pinned file via its CID, lists its content and publishes results back into the private IPFS network:
Use the ipfs ls command to view the results:
Use the ipfs cat command to view the file content. In our case, the file of interest is stdout:
Use the ipfs get command to download the file using its CID:
Reject jobs that don't specify any .
Accept jobs that require .
In this manual, Kubo (the earliest and most widely used implementation of IPFS) will be used, so Go should be installed first.
See the page for latest Go version.
The next step is to download and install Kubo. Select the appropriate version for your system. It is recommended to use the latest stable version.
Use command to view job execution results:
Use the get command to download job results. In this particular case, the ipfs publisher was used, so the get command will print the CID of the job results:
For questions and feedback, please reach out in our