This tutorial describes how to add new nodes to an existing private network. Two basic scenarios will be covered:
Adding a physical host/virtual machine as a new node.
Adding a cloud instance as a new node.
You should have an established private network consisting of at least one requester node. See the Create Private Network guide to set one up.
You should have a new host (physical/virtual machine, cloud instance or Docker container) with Bacalhau installed.
Let's assume that you already have a private network with at least one requester node. In this case, the process of adding new nodes follows the Create And Connect Compute Node section. You will need to:
Set the token in the Compute.Auth.Token configuration key
Set the orchestrator's IP address in the Compute.Orchestrators configuration key
Execute bacalhau serve, specifying the node type via the appropriate flag (e.g. --compute for a compute node), as sketched below
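A minimal sketch of what this looks like on the new node. The token, orchestrator address and port are placeholders for your own values, and the example assumes the bacalhau config set command available in recent releases:

```bash
# Point the new node at the existing orchestrator (values are placeholders)
bacalhau config set Compute.Auth.Token "<network-token>"
bacalhau config set Compute.Orchestrators "nats://<orchestrator-ip>:4222"

# Start the node as a compute node
bacalhau serve --compute
```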
Let's assume you already have all the necessary cloud infrastructure set up, along with a private network containing at least one requester node. In this case, you can add new nodes manually (AWS, Azure, GCP) or use a tool like Terraform to automatically create and add any number of nodes to your network. The process of adding new nodes manually follows the Create And Connect Compute Node section.
To automate the process using Terraform, follow these steps:
Configure Terraform for your cloud provider
Determine the IP address of your requester node
Write a Terraform script that does the following:
Adds a new instance
Installs Bacalhau on it
Launches a compute node
Execute the script
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
Bacalhau has two ways to make use of external storage providers: Sources and Publishers. Sources are storage resources consumed as inputs to jobs, while Publishers are storage resources created from the results of jobs.
Bacalhau allows you to use S3 or any S3-compatible storage service as an input source. Users can specify files or entire prefixes stored in S3 buckets to be fetched and mounted directly into the job execution environment. This capability ensures that your jobs have immediate access to the necessary data. See the S3 source specification for more details.
To use the S3 source, you will have to specify the mandatory Bucket parameter (the name of the S3 bucket) and the optional parameters Key, Filter, Region, Endpoint, VersionID and ChecksumSHA256.
Below is an example of how to define an S3 input source in YAML format:
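The snippet below is a minimal sketch of such a definition; the bucket name, key prefix, region and mount target are placeholders you would replace with your own values:

```yaml
InputSources:
  - Source:
      Type: s3
      Params:
        Bucket: my-input-bucket
        Key: datasets/
        Region: us-east-1
    Target: /data
```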
To start, you'll need to connect the Bacalhau node to an IPFS server so that you can run jobs that consume CIDs as inputs. You can either install IPFS and run it locally, or you can connect to a remote IPFS server.
In both cases, you should have an IPFS multiaddress for the IPFS server that should look something like this:
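For example, a local Kubo node typically exposes its API on a multiaddress such as this illustrative placeholder:

```
/ip4/127.0.0.1/tcp/5001
```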
The multiaddress above is just an example - you'll need to get the multiaddress of the IPFS server you want to connect to.
You can then configure your Bacalhau node to use this IPFS server by adding the address to the InputSources.Types.IPFS.Endpoint configuration key:
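A minimal configuration sketch (the endpoint shown is a placeholder for your own IPFS API multiaddress):

```yaml
InputSources:
  Types:
    IPFS:
      Endpoint: /ip4/127.0.0.1/tcp/5001
```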
See the IPFS input source specification for more details.
Below is an example of how to define an IPFS input source in YAML format:
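A minimal sketch, using a placeholder CID and mount target:

```yaml
InputSources:
  - Source:
      Type: ipfs
      Params:
        CID: "<cid-of-your-data>"
    Target: /inputs
```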
The Local input source allows Bacalhau jobs to access files and directories that are already present on the compute node. This is especially useful for utilizing locally stored datasets, configuration files, logs, or other necessary resources without the need to fetch them from a remote source, ensuring faster job initialization and execution. See the Local source specification for more details.
To use a local data source, you will have to:
Enable the use of local data when configuring the node itself by using the Compute.AllowListedLocalPaths configuration key, specifying the file path and access mode (see the sketch after this list)
In the job description, specify the parameters SourcePath (the absolute path on the compute node where your data is located) and ReadWrite (the access mode).
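A node-level configuration sketch; the path is a placeholder, and the :rw suffix (read-write access) is one commonly used way to express the access mode:

```yaml
Compute:
  AllowListedLocalPaths:
    - /var/data/datasets:rw
```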
Below is an example of how to define a Local input source in YAML format:
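A minimal sketch; the type name, source path and mount target are illustrative and should be checked against the Local source specification:

```yaml
InputSources:
  - Source:
      Type: localDirectory
      Params:
        SourcePath: /var/data/datasets
        ReadWrite: true
    Target: /data
```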
The URL Input Source provides a straightforward method for Bacalhau jobs to access and incorporate data available over HTTP/HTTPS. By specifying a URL, users can ensure the required data, whether a single file or a web page content, is retrieved and prepared in the job's execution environment, enabling direct and efficient data utilization. See the URL source specification for more details.
To use a URL data source, you will have to specify only the URL parameter, as in the part of the declarative job description below:
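A minimal sketch; the URL and mount target are placeholders, and the type name should be checked against the URL source specification:

```yaml
InputSources:
  - Source:
      Type: urlDownload
      Params:
        URL: https://example.com/datasets/data.csv
    Target: /inputs
```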
Bacalhau's S3 Publisher provides users with a secure and efficient method to publish job results to any S3-compatible storage service. To use an S3 publisher you will have to specify the required parameters Bucket and Key and the optional parameters Region, Endpoint, VersionID and ChecksumSHA256. See the S3 publisher specification for more details.
Here’s an example of the part of the declarative job description that outlines the process of using the S3 Publisher with Bacalhau:
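A minimal sketch; the bucket, key prefix and region are placeholders for your own values:

```yaml
Publisher:
  Type: s3
  Params:
    Bucket: my-results-bucket
    Key: results/
    Region: us-east-1
```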
The IPFS publisher works using the same setup as above - you'll need to have an IPFS server running and a multiaddress for it. You'll then configure that multiaddress using the InputSources.Types.IPFS.Endpoint configuration key. Then you can use bacalhau job get <job-ID> with no further arguments to download the results.
When the IPFS publisher is used, the results are published to IPFS and a CID is returned, which can be used to access the published content. See the IPFS publisher specification for more details.
And part of the declarative job description with an IPFS publisher will look like this:
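A minimal sketch of the publisher block (no per-job parameters are shown in this sketch):

```yaml
Publisher:
  Type: ipfs
```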
The Local Publisher should not be used for Production use as it is not a reliable storage option. For production use, we recommend using a more reliable option such as an S3-compatible storage service.
Another possibility is to store the results of a job execution on the compute node itself. In this case, the results will be published to the local compute node and stored as a compressed tar file, which can be accessed and retrieved over HTTP from the command line using the get command. To use the Local publisher you will have to specify only the URL parameter, an HTTP URL pointing to the location where you would like the results to be saved. See the Local publisher specification for more details.
Here is an example of part of the declarative job description with a local publisher:
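A minimal sketch of the publisher block; whether additional parameters are required should be checked against the Local publisher specification:

```yaml
Publisher:
  Type: local
```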
How to configure TLS for the requester node APIs
By default, the requester node APIs used by the Bacalhau CLI are accessible over HTTP, but it is possible to configure them to use Transport Layer Security (TLS) so that they are accessible over HTTPS instead. There are several ways to obtain the necessary certificates and keys, and Bacalhau supports obtaining them via ACME and Certificate Authorities or even self-signing them.
Once configured, you must ensure that instead of using http://IP:PORT you use https://IP:PORT to access the Bacalhau API.
Automatic Certificate Management Environment (ACME) is a protocol that allows for automating the deployment of Public Key Infrastructure, and is the protocol used to obtain a free certificate from the Let's Encrypt Certificate Authority.
Using the --autocert [hostname] parameter to the CLI (in the serve and devstack commands), a certificate is obtained automatically from Let's Encrypt. The provided hostname should be a comma-separated list of hostnames, and they should all be publicly resolvable as Let's Encrypt will attempt to connect to the server to verify ownership (using the ACME HTTP-01 challenge). On the very first request this can take a short time whilst the first certificate is issued, but afterwards the certificates are cached in the bacalhau repository.
Alternatively, you may set these options via the environment variable BACALHAU_AUTO_TLS. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.AutoCert instead.
As a result of the Lets Encrypt verification step, it is necessary for the server to be able to handle requests on port 443. This typically requires elevated privileges, and rather than obtain these through a privileged account (such as root), you should instead use setcap to grant the executable the right to bind to ports <1024.
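For example, one typical way to grant the binary this capability (the path is a placeholder for wherever bacalhau is installed):

```bash
sudo setcap CAP_NET_BIND_SERVICE=+ep /usr/local/bin/bacalhau
```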
A cache of ACME data is held in the config repository, by default ~/.bacalhau/autocert-cache, and this will be used to manage renewals to avoid rate limits.
Obtaining a TLS certificate from a Certificate Authority (CA) without using the Automated Certificate Management Environment (ACME) protocol involves a manual process that typically requires the following steps:
Choose a Certificate Authority: First, you need to select a trusted Certificate Authority that issues TLS certificates. Popular CAs include DigiCert, GlobalSign, Comodo (now Sectigo), and others. You may also consider whether you want a free or paid certificate, as CAs offer different pricing models.
Generate a Certificate Signing Request (CSR): A CSR is a text file containing information about your organization and the domain for which you need the certificate. You can generate a CSR using various tools or directly on your web server. Typically, this involves providing details such as your organization's name, common name (your domain name), location, and other relevant information.
Submit the CSR: Access your chosen CA's website and locate their certificate issuance or order page. You'll typically find an option to "Submit CSR" or a similar option. Paste the contents of your CSR into the provided text box.
Verify Domain Ownership: The CA will usually require you to verify that you own the domain for which you're requesting the certificate. They may send an email to one of the standard domain-related email addresses (e.g., admin@yourdomain.com, webmaster@yourdomain.com). Follow the instructions in the email to confirm domain ownership.
Complete Additional Verification: Depending on the CA's policies and the type of certificate you're requesting (e.g., Extended Validation or EV certificates), you may need to provide additional documentation to verify your organization's identity. This can include legal documents or phone calls from the CA to confirm your request.
Payment and Processing: If you're obtaining a paid certificate, you'll need to make the payment at this stage. Once the CA has received your payment and completed the verification process, they will issue the TLS certificate.
Once you have obtained your certificates, you will need to put the two files in a location that bacalhau can read them from. You need the server certificate, often called something like server.cert or server.cert.pem, and the server key, which is often called something like server.key or server.key.pem.
Once you have these two files available, you must start bacalhau serve with two new flags. These are the --tlscert and --tlskey flags, whose arguments should point to the relevant files. An example of how they are used is:
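A sketch using placeholder paths for the certificate and key files:

```bash
bacalhau serve --orchestrator --tlscert /etc/bacalhau/server.cert --tlskey /etc/bacalhau/server.key
```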
Alternatively, you may set these options via the environment variables BACALHAU_TLS_CERT and BACALHAU_TLS_KEY. If you are using a configuration file, you can set the values in Node.ServerAPI.TLS.ServerCertificate and Node.ServerAPI.TLS.ServerKey instead.
If you wish, it is possible to use Bacalhau with a self-signed certificate which does not rely on an external Certificate Authority. This is an involved process and so is not described in detail here, although there is a helpful script in the Bacalhau GitHub repository which should provide a good starting point.
Once you have generated the necessary files, the steps are much the same as above: start bacalhau serve with the --tlscert and --tlskey flags pointing at the certificate and key files. Alternatively, use the BACALHAU_TLS_CERT and BACALHAU_TLS_KEY environment variables, or set Node.ServerAPI.TLS.ServerCertificate and Node.ServerAPI.TLS.ServerKey in a configuration file.
If you use self-signed certificates, it is unlikely that any clients will be able to verify the certificate when connecting to the Bacalhau APIs. There are three options available to work around this problem:
Provide a CA certificate file of trusted certificate authorities, which many software libraries support in addition to system authorities.
Install the CA certificate file in the system keychain of each machine that needs access to the Bacalhau APIs.
Instruct the software library you are using not to verify HTTPS requests.
Note that in version v1.5.0 the configuration management approach was completely changed and certain limits were deprecated. Check out the release notes to learn about all the changes in configuration management: CLI command syntax and configuration file management.
These are the configuration keys that control the capacity of the Bacalhau node, and the limits for jobs that might be run.
Compute.AllocatedCapacity.CPU - Specifies the amount of CPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string.
Compute.AllocatedCapacity.Disk - Specifies the amount of disk space a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 10Gi).
Compute.AllocatedCapacity.GPU - Specifies the amount of GPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 1). Note: when using percentages, the result is always rounded up to the nearest whole GPU.
Compute.AllocatedCapacity.Memory - Specifies the amount of memory a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 1Gi).
It is also possible to specify default resource allocations to be assigned to each job when the job itself does not state its resource requirements. The JobDefaults.<Job type>.Task.Resources.<Resource Type> configuration keys are used for this purpose. For example, to provide each Ops job with 2Gi of RAM, the following key is used: JobDefaults.Ops.Task.Resources.Memory:
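A configuration sketch for this default:

```yaml
JobDefaults:
  Ops:
    Task:
      Resources:
        Memory: 2Gi
```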
See the complete configuration keys list for more details.
Resource limits are not supported for Docker jobs running on Windows. Resource limits will be applied at the job bid stage based on reported job requirements but will be silently unenforced. Jobs will be able to access as many resources as requested at runtime.
Running a Windows-based node is not officially supported, so your mileage may vary. Some features (like resource limits) are not present in Windows-based nodes.
Bacalhau currently makes the assumption that all containers are Linux-based. Users of the Docker executor will need to manually ensure that their Docker engine is running and configured appropriately to support Linux containers, e.g. using the WSL-based backend.
Bacalhau can limit the total time a job spends executing. A job that spends too long executing will be cancelled, and no results will be published.
By default, a Bacalhau node does not enforce any limit on job execution time. Both node operators and job submitters can supply a maximum execution time limit. If a job submitter asks for a longer execution time than permitted by a node operator, their job will be rejected.
Applying job timeouts allows node operators to more fairly distribute the work submitted to their nodes. It also protects users from transient errors that result in their jobs waiting indefinitely.
Job submitters can pass the --timeout flag to any Bacalhau job submission CLI to set a maximum job execution time. The supplied value should be a whole number of seconds with no unit.
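For example, a sketch that caps a Docker job at ten minutes (the image and command are placeholders):

```bash
bacalhau docker run --timeout 600 ubuntu:latest -- echo "hello"
```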
The timeout can also be added to an existing job spec by adding the Timeout property to the Spec.
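A minimal sketch of what this might look like; the exact placement of the property should be checked against the job specification reference:

```yaml
Spec:
  Timeout: 1800
```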
Node operators can use configuration keys to specify default and maximum job execution time limits. The supplied values should be a numeric value followed by a time unit (one of s for seconds, m for minutes or h for hours).
Here is a list of the relevant properties:
JobDefaults.Batch.Task.Timeouts.ExecutionTimeout - Default value for batch job execution timeouts on your current compute node. It will be assigned to batch jobs with no timeout requirement defined.
JobDefaults.Ops.Task.Timeouts.ExecutionTimeout - Default value for ops job execution timeouts on your current compute node. It will be assigned to ops jobs with no timeout requirement defined.
JobDefaults.Batch.Task.Timeouts.TotalTimeout - Default value for the maximum execution timeout this compute node supports for batch jobs. Jobs with higher timeout requirements will not be bid on.
JobDefaults.Ops.Task.Timeouts.TotalTimeout - Default value for the maximum execution timeout this compute node supports for ops jobs. Jobs with higher timeout requirements will not be bid on.
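A configuration sketch using these keys; the durations are illustrative:

```yaml
JobDefaults:
  Batch:
    Task:
      Timeouts:
        ExecutionTimeout: 30m
        TotalTimeout: 2h
  Ops:
    Task:
      Timeouts:
        ExecutionTimeout: 30m
        TotalTimeout: 2h
```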
Note that timeouts cannot be configured for Daemon and Service jobs.
How to configure compute/requester persistence
Both compute nodes, and requester nodes, maintain state. How that state is maintained is configurable, although the defaults are likely adequate for most use-cases. This page describes how to configure the persistence of compute and requester nodes should the defaults not be suitable.
The compute nodes maintain information about the work that has been allocated to them, including:
The current state of the execution, and
The original job that resulted in this allocation
This information is used by the compute and requester nodes to ensure allocated jobs are completed successfully. By default, compute nodes store their state in a bolt-db database and this is located in the bacalhau repository along with configuration data. For a compute node whose ID is "abc", the database can be found in ~/.bacalhau/abc-compute/executions.db.
In some cases, it may be preferable to maintain the state in memory, with the caveat that should the node restart, all state will be lost. This can be configured using the environment variables in the table below.
| Environment variable | Flag | Value | Effect |
|---|---|---|---|
| BACALHAU_COMPUTE_STORE_TYPE | --compute-execution-store-type | boltdb | Uses the bolt db execution store (default) |
| BACALHAU_COMPUTE_STORE_PATH | --compute-execution-store-path | A path (inc. filename) | Specifies where the boltdb database should be stored. Default is ~/.bacalhau/{NODE-ID}-compute/executions.db if not set |
When running a requester node, it maintains state about the jobs it has been requested to orchestrate and schedule, the evaluation of those jobs, and the executions that have been allocated. By default, this state is stored in a bolt db database that, with a node ID of "xyz", can be found in ~/.bacalhau/xyz-requester/jobs.db.
| Environment variable | Flag | Value | Effect |
|---|---|---|---|
| BACALHAU_JOB_STORE_TYPE | --requester-job-store-type | boltdb | Uses the bolt db job store (default) |
| BACALHAU_JOB_STORE_PATH | --requester-job-store-path | A path (inc. filename) | Specifies where the boltdb database should be stored. Default is ~/.bacalhau/{NODE-ID}-requester/jobs.db if not set |
How to enable GPU support on your Bacalhau node
Bacalhau supports GPUs out of the box and defaults to allowing execution on all GPUs installed on the node.
Bacalhau makes the assumption that you have installed all the necessary drivers and tools on your node host and have appropriately configured them for use by Docker.
In general, for GPUs from any vendor, the Bacalhau client requires:
NVIDIA: nvidia-smi installed and functional. You can verify the installation by running a sample workload.
AMD: the rocm-smi tool installed and functional. See Running ROCm Docker containers for guidance on how to run Docker workloads on AMD GPUs.
Intel: the xpu-smi tool installed and functional. See Running on GPU under docker for guidance on how to run Docker workloads on Intel GPUs.
Access to GPUs can be controlled using resource limits. To limit the number of GPUs that can be used per job, set a job resource limit. To limit access to GPUs from all jobs, set a total resource limit.
When running a node, you can choose which jobs you want to run by using configuration options, environment variables or flags to specify a job selection policy.
| Configuration key | Default value | Meaning |
|---|---|---|
| JobAdmissionControl.Locality | Anywhere | Only accept jobs that reference data we have locally ("local") or anywhere ("anywhere"). |
| JobAdmissionControl.ProbeExec | unused | Use the result of an external program to decide if we should take on the job. |
| JobAdmissionControl.ProbeHTTP | unused | Use the result of a HTTP POST to decide if we should take on the job. |
| JobAdmissionControl.RejectStatelessJobs | False | Reject jobs that don't specify any input data. |
| JobAdmissionControl.AcceptNetworkedJobs | False | Accept jobs that require network access. |
If you want more control over making the decision to take on jobs, you can use the JobAdmissionControl.ProbeExec and JobAdmissionControl.ProbeHTTP configuration keys.
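A configuration sketch; the probe URL is a placeholder for your own admission service:

```yaml
JobAdmissionControl:
  ProbeHTTP: "https://admission.internal.example.com/bacalhau/probe"
```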
These are external programs that are passed the following data structure so that they can make a decision about whether to take on a job:
The exec probe is a script to run that will be given the job data on stdin, and must exit with status code 0 if the job should be run.
The http probe is a URL to POST the job data to. The job will be rejected if the HTTP request returns an unsuccessful status code (e.g. >= 400).
If the HTTP response is a JSON blob, it should match the following schema and will be used to respond to the bid directly:
For example, the following response will reject the job:
If the HTTP response is not a JSON blob, the content is ignored and any non-error status code will accept the job.
Set up private IPFS network
Note that Bacalhau v1.4.0 supports IPFS v0.27 and below. Starting from v1.5.0, Bacalhau supports the latest IPFS versions. Consider this when selecting versions of Bacalhau and IPFS for your own private network.
Support for the embedded IPFS node was removed in v1.4.0 to streamline communication and reduce overhead. Therefore, in order to use a private IPFS network, it is now necessary to create it yourself and then connect your nodes to it. This manual describes how to:
Install and configure IPFS
Create Private IPFS network
Configure your Bacalhau network to use the private IPFS network
Pin your data to private IPFS network
Install Go on all nodes
Install Kubo (IPFS)
Initialize the private IPFS network
Connect all nodes to the same private network
Connect the Bacalhau network to the private IPFS network
In this manual, Kubo (the earliest and most widely used implementation of IPFS) will be used, so first of all, Go should be installed. Download the latest Go release for your system, remove any previous Go installation by deleting the /usr/local/go folder (if it exists), then extract the archive you downloaded into /usr/local, creating a fresh Go tree in /usr/local/go:
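For example (the archive name is a placeholder for the release you downloaded):

```bash
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.23.1.linux-amd64.tar.gz
```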
Add /usr/local/go/bin to the PATH environment variable. You can do this by adding the following line to your $HOME/.profile or /etc/profile (for a system-wide installation):
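The line in question, as given in the standard Go installation instructions:

```bash
export PATH=$PATH:/usr/local/go/bin
```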
Changes made to a profile file may not apply until the next time you log into the system. To apply the changes immediately, run the shell commands directly or execute them from the profile using a command such as source $HOME/.profile.
Verify that Go is installed correctly by checking its version:
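For example:

```bash
go version
```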
The next step is to download and install Kubo. Select the appropriate version for your system (it is recommended to use the latest stable version), install it, and then verify that IPFS is installed correctly by checking its version:
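For example:

```bash
ipfs version
```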
A bootstrap node is used by client nodes to connect to the private IPFS network. The bootstrap connects clients to other nodes available on the network.
Execute the ipfs init command to initialize an IPFS node:
The next step is to generate the swarm key - a cryptographic key that is used to control access to an IPFS network - and export the key into a swarm.key file, located in the ~/ipfs folder.
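One common way to do this (a sketch: the swarm.key format is a two-line header followed by a 64-character hex key):

```bash
mkdir -p ~/ipfs
echo -e "/key/swarm/psk/1.0.0/\n/base16/\n$(tr -dc 'a-f0-9' < /dev/urandom | head -c 64)" > ~/ipfs/swarm.key
```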
Now the default entries of bootstrap nodes should be removed. Execute the command on all nodes:
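The command in question, using the standard Kubo CLI:

```bash
ipfs bootstrap rm --all
```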
Check that bootstrap config does not contain default values:
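For example, the following should now print an empty list:

```bash
ipfs config Bootstrap
```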
Configure IPFS to listen for incoming connections on specific network addresses and ports, making the IPFS Gateway and API services accessible. Consider changing addresses and ports depending on the specifics of your network.
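A sketch of such a configuration; binding to 0.0.0.0 and these ports is an assumption you may want to tighten for your network:

```bash
ipfs config Addresses.API /ip4/0.0.0.0/tcp/5001
ipfs config Addresses.Gateway /ip4/0.0.0.0/tcp/8080
```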
Start the IPFS daemon:
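For example (the LIBP2P_FORCE_PNET variable is optional; it makes the daemon refuse to start if the private network key is missing):

```bash
export LIBP2P_FORCE_PNET=1
ipfs daemon
```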
Copy the swarm.key file from the bootstrap node to the client nodes, placing it in the ~/.ipfs/ folder, and initialize IPFS:
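A sketch, assuming a placeholder hostname for the bootstrap node and SSH access to it:

```bash
ipfs init
scp user@bootstrap-node:~/ipfs/swarm.key ~/.ipfs/swarm.key
```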
Apply the same config as on the bootstrap node and start the daemon:
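Mirroring the bootstrap-node configuration above, a sketch for each client node (the bootstrap multiaddress in the comment is a placeholder):

```bash
ipfs bootstrap rm --all
ipfs config Addresses.API /ip4/0.0.0.0/tcp/5001
ipfs config Addresses.Gateway /ip4/0.0.0.0/tcp/8080
# optionally point clients at the bootstrap node:
# ipfs bootstrap add /ip4/<bootstrap-ip>/tcp/4001/p2p/<bootstrap-peer-id>
ipfs daemon
```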
Done! Now you can check that private IPFS network works properly:
List peers on the bootstrap node. It should list all connected nodes:
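For example:

```bash
ipfs swarm peers
```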
Pin some files and check their availability across the network:
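A quick sketch (replace <CID> with the value printed by ipfs add):

```bash
echo "private network test" > test.txt
ipfs add test.txt        # note the printed CID
ipfs cat <CID>           # run this on another node in the network
```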
Finally, make the IPFS daemon run at system startup as a systemd service. To do this:
Create a new service unit file in the /etc/systemd/system/ directory
Add the following content to the file, replacing /path/to/your/ipfs/executable with the actual path:
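A typical unit file might look like the sketch below; the description, restart policy and the optional User line are assumptions you can adjust:

```ini
[Unit]
Description=IPFS daemon
After=network.target

[Service]
ExecStart=/path/to/your/ipfs/executable daemon
Restart=always
# User=ipfs

[Install]
WantedBy=multi-user.target
```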
Use the which ipfs command to locate the executable. Usually the path to the executable is /usr/local/bin/ipfs.
For security purposes, consider creating a separate user to run the service. In this case, specify its name in the User= line. Without specifying a user, the ipfs service will be launched as root, which means that you will need to copy the ipfs binary to the /root directory.
Reload and enable the service
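Assuming the unit file was saved as ipfs.service, a sketch of the commands:

```bash
sudo systemctl daemon-reload
sudo systemctl enable ipfs
```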
Done! Now reboot the machine to ensure that the daemon starts correctly. Use the systemctl status ipfs command to check that the service is running:
Now, to connect your private Bacalhau network to the private IPFS network, the IPFS API address should be specified using the --ipfs-connect flag. It can be found in the ~/.ipfs/api file:
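A sketch; the multiaddress and the node-type flag are placeholders for your own setup:

```bash
cat ~/.ipfs/api
# e.g. /ip4/0.0.0.0/tcp/5001
bacalhau serve --compute --ipfs-connect /ip4/127.0.0.1/tcp/5001
```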
Done! Now your private Bacalhau network is connected to the private IPFS network!
To verify that everything works correctly:
Pin the file to the private IPFS network
Run the job, which takes the pinned file as input and publishes result to the private IPFS network
View and download job results
Create any file and pin it using the ipfs add command:
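For example:

```bash
echo "Hello from the private IPFS network" > hello.txt
ipfs add hello.txt   # note the printed CID for the next step
```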
Run a simple job, which fetches the pinned file via its CID, lists its content and publishes results back into the private IPFS network:
Use the bacalhau job get command to download the job results. In this particular case, the ipfs publisher was used, so the get command will print the CID of the job results. Use the ipfs ls command to view the results:
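For example (replace <results-CID> with the CID printed in the previous step):

```bash
ipfs ls <results-CID>
```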
Use the ipfs cat command to view the file content. In our case, the file of interest is the stdout:
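For example (assuming stdout sits at the top level of the results):

```bash
ipfs cat <results-CID>/stdout
```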
Use the ipfs get command to download the file using its CID:
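For example:

```bash
ipfs get <results-CID>
```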
Before you join the main Bacalhau network, you can test locally.
To test, you can use the bacalhau devstack command, which offers a way to get a 3-node cluster running locally.
By setting PREDICTABLE_API_PORT=1, the first node of our 3-node cluster will always listen on port 20000.
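For example:

```bash
PREDICTABLE_API_PORT=1 bacalhau devstack
```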
In another window, export the following environment variables so that the Bacalhau client binary connects to our local development cluster:
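A sketch of the variables to export, assuming the predictable port above:

```bash
export BACALHAU_API_HOST=127.0.0.1
export BACALHAU_API_PORT=20000
```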
You can now interact with Bacalhau - all jobs are run by the local devstack cluster.
How to run the WebUI.
Note that in version v1.5.0 the WebUI was completely reworked. Check out the release notes to learn about all the changes to the WebUI and more.
The Bacalhau WebUI offers an intuitive interface for interacting with the Bacalhau network. This guide provides comprehensive instructions for setting up and utilizing the WebUI.
For contributing to the WebUI's development, please refer to the Bacalhau repository on GitHub.
Ensure you have Bacalhau v1.5.0 or later installed.
To enable the WebUI, use the WebUI.Enabled configuration key:
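A configuration sketch:

```yaml
WebUI:
  Enabled: true
```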
By default, the WebUI uses host=0.0.0.0 and port=8438. This can be configured via the WebUI.Listen configuration key:
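A configuration sketch (shown with the default listen address):

```yaml
WebUI:
  Listen: 0.0.0.0:8438
```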
Once started, the WebUI is accessible at the specified address, localhost:8438 by default.
The updated WebUI allows you to view a list of jobs, including job status, run time, type, and a message in case the job failed.
Clicking on the id of a job in the list opens the job details page, where you can see the history of events related to the job, the list of nodes on which the job was executed and the real-time logs of the job.
On the Nodes page you can see a list of nodes connected to your network, including node type, membership and connection statuses, amount of resources - total and currently available, and a list of labels of the node.
Clicking on the node id opens the node details page, where you can see the status and settings of the node, the number of running and scheduled jobs.
How to configure authentication and authorization on your Bacalhau node.
Bacalhau includes a flexible auth system that supports multiple methods of auth that are appropriate for different deployment environments.
With no specific authentication configuration supplied, Bacalhau runs in "anonymous mode" – which allows unidentified users limited control over the system. "Anonymous mode" is only appropriate for testing or evaluation setups.
In anonymous mode, Bacalhau will allow:
Users identified by a self-generated private key to submit any job and cancel their own jobs.
Users not identified by any key to access other read-only endpoints, such as to read job lists, describe jobs, and query node or agent information.
Bacalhau auth is controlled by policies. Configuring the auth system is done by supplying a different policy file.
Restricting API access to only users that have authenticated requires specifying a new authorization policy. You can download a policy that restricts anonymous access and install it by using:
Once the node is restarted, accessing the node APIs will require the user to be authenticated, but by default will still allow users with a self-generated key to authenticate themselves.
Restricting the list of keys that can authenticate to only a known set requires specifying a new authentication policy. You can download a policy that restricts key-based access and install it by using:
Then, modify the allowed_clients variable in challenge_ns_no_anon.rego to include acceptable client IDs, which can be found by running bacalhau agent node.
Once the node is restarted, only keys in the allowed list will be able to access any API.
Users can authenticate using a username and password instead of specifying a private key for access. Again, this requires installation of an appropriate policy on the server.
Passwords are not stored in plaintext and are salted. The downloaded policy expects password hashes and salts generated by scrypt. To generate a salted password, the helper script in pkg/authn/ask/gen_password can be used:
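A sketch of invoking it, assuming you have the Bacalhau source checked out and Go installed:

```bash
cd bacalhau   # repository root
go run ./pkg/authn/ask/gen_password
```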
This will ask for a password and generate a salt and hash to authenticate with it. Add the encoded username, salt and hash into ask_ns_password.rego.
In principle, Bacalhau can implement any auth scheme that can be described in a structured way by a policy file. Policies are written in Rego, a policy language also used by Kubernetes; users who want to write their own policies should get familiar with the Rego language.
Bacalhau will pass information pertinent to the current request into every authentication policy query as a field on the input variable. The exact information depends on the type of authentication used.
challenge authentication identifies the user by the presence of a private key. The user is asked to sign an input phrase to prove they have the key they are identifying with.
Policies used for challenge authentication do not need to actually implement the challenge verification logic, as this is handled by the core code. Instead, they will only be invoked if this verification passes.
Policies for this type will need to implement these rules:
bacalhau.authn.token: if the user should be authenticated, an access token they should use in subsequent requests. If the user should not be authenticated, this should be undefined.
They should expect as fields on the input variable:
clientId: an ID derived from the user's private key that identifies them uniquely
nodeId: the ID of the requester node that this user is authenticating with
signingKey: the private key (as a JWK) that should be used to sign any access tokens to be returned
The simplest possible policy might therefore be this policy that returns the same opaque token for all users:
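A minimal sketch of such a policy in Rego (the token string is an arbitrary placeholder):

```rego
package bacalhau.authn

# Every user that passes the challenge verification receives the same opaque token.
token := "anonymous-opaque-token"
```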
ask authentication uses credentials supplied manually by the user as identification. For example, an ask policy could require a username and password as input and check these against a known list. ask policies do all the verification of the supplied credentials.
Policies for this type will need to implement these rules:
bacalhau.authn.token: if the user should be authenticated, an access token they should use in subsequent requests. If the user should not be authenticated, this should be undefined.
bacalhau.authn.schema: a static JSON schema that should be used to collect information about the user. The type of declared fields may be used to pick the input method, and if a field is marked as writeOnly then it will be collected in a secure way (e.g. not shown on screen). The schema rule does not receive any input data.
They should expect as fields on the input variable:
ask: a map of field names from the JSON schema to strings supplied by the user. The policy should validate these credentials.
nodeId: the ID of the requester node that this user is authenticating with
signingKey: the private key (as a JWK) that should be used to sign any access tokens to be returned
The simplest possible policy might therefore be one that asks for no data and returns the same opaque token for every user:
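A minimal sketch of such a policy in Rego; whether the schema rule should return an object or a marshalled JSON string is an assumption to check against the examples in the Bacalhau repository:

```rego
package bacalhau.authn

# An empty schema: no credentials are requested from the user.
schema := {"type": "object", "properties": {}}

# Every user receives the same opaque token.
token := "anonymous-opaque-token"
```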
Authorization policies do not vary depending on the type of authentication used – Bacalhau uses one authz policy for all API requests.
Authz policies are invoked for every API request. Authz policies should check the validity of any supplied access tokens and issue an authz decision for the requested API endpoint. It is not required that authz policies enforce that an access token is present – they may choose to grant access to unauthorized users.
Policies will need to implement these rules:
bacalhau.authz.token_valid: true if the access token in the request is "valid" (but does not necessarily grant access for this request), or false if it is invalid for every request (e.g. because it has expired) and should be discarded.
bacalhau.authz.allow: true if the user should be permitted to carry out the input request, false otherwise.
They should expect as fields on the input variable for both rules:
http: details of the user's HTTP request:
  host: the hostname used in the HTTP request
  method: the HTTP method (e.g. GET, POST)
  path: the path requested, as an array of path components without slashes
  query: a map of URL query parameters to their values
  headers: a map of HTTP header names to arrays representing their values
  body: a blob of any content submitted as the body
constraints: details about the receiving node that should be used to validate any supplied tokens:
  cert: keys that the input token should have been signed with
  iss: the name of a node that this node will recognize as the issuer of any signed tokens
  aud: the name of this node that is receiving the request
Notably, the constraints data is appropriate to be passed directly to the Rego io.jwt.decode_verify method, which will validate the access token as a JWT against the given constraints.
The simplest possible authz policy might be this one that allows all users to access all endpoints:
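A minimal sketch of such a policy in Rego:

```rego
package bacalhau.authz

# Treat every supplied token as valid and allow every request.
token_valid := true
allow := true
```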