Welcome to the Bacalhau documentation!
Bacalhau is a platform designed for fast, cost-efficient, and secure computation by running jobs directly where the data is generated and stored. Bacalhau helps you streamline existing workflows without extensive rewriting, as it allows you to run arbitrary Docker containers and WebAssembly (WASM) images as tasks. This approach is also known as Compute Over Data (or CoD). The name Bacalhau comes from the Portuguese word for salted cod fish.
Bacalhau aims to transform data processing for large-scale datasets by improving cost-efficiency and accessibility, making data processing available to a broader audience. Our goal is to build an open, collaborative compute ecosystem. At Expanso.io, we offer a demo network where you can try running jobs without any installation. Give it a try!
Bacalhau simplifies the process of managing compute jobs by providing a unified platform for managing jobs across different regions, clouds, and edge devices.
Bacalhau consists of a network of nodes that enables orchestration between any compute resources, whether they are cloud VMs, on-premises servers, or edge devices. The network consists of two types of nodes:
Requester Node: responsible for handling user requests, discovering and ranking compute nodes, forwarding jobs to compute nodes, and monitoring the job lifecycle.
Compute Node: responsible for executing jobs and producing results. Different compute nodes can be used for different types of jobs, depending on their capabilities and resources.
Bacalhau can be used for a variety of data processing workloads, including machine learning, data analytics, and scientific computing. It is well-suited for workloads that require processing large amounts of data in a distributed and parallelized manner.
Once you have more than 10 devices generating or storing around 100GB of data, you're likely to face challenges with processing that data efficiently. Traditional computing approaches may struggle to handle such large volumes, and that's where distributed computing solutions like Bacalhau can be extremely useful. Bacalhau can be used in various industries, including security, web serving, financial services, IoT, Edge, Fog, and multi-cloud. Bacalhau shines when it comes to data-intensive applications like data engineering, model training, model inference, molecular dynamics, etc.
Bacalhau has a very friendly community and we are always happy to help you get started:
GitHub Discussions – ask anything about the project, give feedback, or answer questions that will help other users.
Join the Slack Community and go to the #bacalhau channel – it is the easiest way to engage with other members of the community and get help.
Contributing – learn how to contribute to the Bacalhau project.
👉 Continue with Bacalhau's Getting Started guide to learn how to install and run a job with the Bacalhau client.
In this tutorial, you'll learn how to install and run a job with the Bacalhau client using the Bacalhau CLI or Docker.
The Bacalhau client is a command-line interface (CLI) that allows you to submit jobs to the Bacalhau network. The client is available for Linux, macOS, and Windows. You can also run the Bacalhau client in a Docker container.
To install the CLI, choose your environment, and run the command(s) below.
You can install or update the Bacalhau CLI by running the commands below in a terminal. You may need sudo mode or the root password to install the Bacalhau binary to /usr/local/bin:
Windows users can download the latest release tarball from GitHub and extract bacalhau.exe to any location available in the PATH environment variable.
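As a concrete sketch, the Linux/macOS install flow typically looks like the following (the install-script URL is an assumption based on the Bacalhau docs; verify it before piping anything to a shell):

```shell
# Hypothetical install sketch -- checks for an existing binary first.
INSTALL_URL="https://get.bacalhau.org/install.sh"   # assumed official script URL

if command -v bacalhau >/dev/null 2>&1; then
  echo "bacalhau already installed at: $(command -v bacalhau)"
else
  echo "Would install from: $INSTALL_URL"
  # curl -sL "$INSTALL_URL" | sudo bash   # installs to /usr/local/bin
fi
```

The actual install line is left commented so you can inspect the script first.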
To run a specific version of Bacalhau using Docker, use the command docker run -it docker.io/bacalhauproject/bacalhau:vX.Y.Z, where vX.Y.Z is the version you want to run.
To verify installation and check the version of the client and server, use the version command. To run a Bacalhau client command with Docker, prefix it with docker run ghcr.io/bacalhau-project/bacalhau:latest.
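For example, a quick sanity check that the client is available, falling back to the Docker form when the binary is not on your PATH:

```shell
# Verify the installation; prints the client (and server, if reachable) version.
if command -v bacalhau >/dev/null 2>&1; then
  bacalhau version
else
  echo "bacalhau not on PATH; try: docker run ghcr.io/bacalhau-project/bacalhau:latest version"
fi
```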
If you're wondering which server is being used, the Bacalhau Project has a demo network that's shared with the community. This network allows you to familiarize yourself with Bacalhau's capabilities and launch jobs from your computer without maintaining a compute cluster of your own.
Getting started with Bacalhau is straightforward! Below we lay out the steps involved:
That's it!
For more advanced users, you can also create your own Bacalhau network.
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
Now that the job has been deployed, we can use the CLI to interact with the network. The job was sent to the public demo network, where it was processed, and we can call the following functions. The job_id will differ for every submission.
You can find out more information about your job by using bacalhau job describe.
Let's take a look at the results of the command execution in the terminal:
This outputs all information about the job, including stdout, stderr, where the job was scheduled, and so on.
Depending on the selected publisher, this may result in:
After the download has finished you should see the following contents in the results directory.
Each of these files contains the following information:

exitCode - the numeric exit code returned by your job (just like the exit code you would see running the process on your local machine).

stderr - the output of the job's standard error stream.

stdout - the output of the job's standard output stream.
In this tutorial, we will go over the components and architecture of Bacalhau. You will learn how it is built, what components it uses, and how you can interact with and use Bacalhau.
Bacalhau is a peer-to-peer network of nodes that enables decentralized communication between computers. The network consists of two types of nodes, which can communicate with each other.
The requester and compute nodes together form a peer-to-peer network and use gossiping to discover each other and to share information about node capabilities, available resources, and health status.
Requester Node: responsible for handling user requests, discovering and ranking compute nodes, forwarding jobs to compute nodes, and monitoring the job lifecycle.
Compute Node: responsible for executing jobs and producing results. Different compute nodes can be used for different types of jobs, depending on their capabilities and resources.
To interact with the Bacalhau network, users can use the Bacalhau CLI (command-line interface) to send requests to a requester node in the network. These requests are sent in JSON format over HTTP, a widely used protocol for transmitting data over the internet. Bacalhau's architecture has two main sections: the core components and the interfaces.
The core components are responsible for handling requests and connecting different nodes; the network includes the two node types described above. The interfaces handle the distribution, execution, storage, and publishing of jobs. In the following sections, each of the components is described and its respective protocols are shown.
You can use the Bacalhau client to send a task to the network. The client transmits the job information to the Bacalhau network via established protocols and interfaces. Jobs submitted via the Bacalhau CLI are forwarded to a Bacalhau network node at http://bootstrap.production.bacalhau.org/ on port 1234 by default. This node acts as the requester node for the duration of the job lifecycle.
Bacalhau provides a REST API to interact with the server. By default, it uses 127.0.0.1 as the host and 1234 as the port.
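As an illustration, the default API address can be probed like this (the /api/v1/agent/alive path is an assumption based on the v1 API layout; check the API reference for the exact routes):

```shell
# Hypothetical health probe against a local Bacalhau node.
API="http://127.0.0.1:1234"
echo "Would query: $API/api/v1/agent/alive"
# curl -s "$API/api/v1/agent/alive"   # uncomment when a node is running locally
```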
You can use the bacalhau docker run command to start a job in a Docker container. Below, you can see an excerpt of the commands:
You can also use the bacalhau wasm run command to run a job compiled into WebAssembly (WASM) format. Below is an excerpt of the commands in the Bacalhau CLI:
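To make this concrete, here is a minimal sketch of both forms. The image name and module path are illustrative, and the commands are printed rather than executed so they can be reviewed first:

```shell
# Illustrative invocations for the two executors (not executed here).
DOCKER_JOB='bacalhau docker run ubuntu:latest -- echo "hello from Bacalhau"'
WASM_JOB='bacalhau wasm run main.wasm'   # main.wasm is a hypothetical module

echo "Docker executor: $DOCKER_JOB"
echo "WASM executor:   $WASM_JOB"
```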
When a job is submitted to a requester node, it selects compute nodes that are capable and suitable to execute the job, and communicates with them directly. Each compute node has a collection of named executors, storage sources, and publishers, and it will choose the most appropriate ones based on the job specifications.
When the compute node completes the job, it publishes the results to remote storage, such as S3 or IPFS.
The Bacalhau client receives updates on the task execution status and results. A user can access the results and manage tasks through the command line interface.
To get the results of a job, you can run the following command. You can choose from a wide range of flags, a few of which are shown below.
To describe a specific job, pass its ID to the CLI or API to get back an overview of the job. This is useful if you have run more than one job or want to find a specific job ID. To list executions, use the following commands.
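Putting those together, a typical inspection sequence might look like this (the job ID is a placeholder, and the commands are printed rather than run since they need a live network):

```shell
# Hypothetical job-inspection sequence for a submitted job.
JOB_ID="j-de72aeff"   # placeholder short job ID

for cmd in "bacalhau job describe $JOB_ID" \
           "bacalhau job list" \
           "bacalhau job executions $JOB_ID"; do
  echo "Would run: $cmd"
done
```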
Now that you have the Bacalhau CLI installed, what can you do with it? Just about anything! Let's walk through running some very simple jobs.
To submit a job in Bacalhau, we will use the bacalhau docker run command, which runs a job using the Docker executor on the node. Let's take a quick look at its syntax:
To run the job, you will need to connect to a public demo network or set up your own private network. In the following example, we will use the public demo network by using the --configuration flag.
We will use the command to submit a Hello World job that runs an echo program within a container.
Let's take a look at the results of the command execution in the terminal:
After the above command is run, the job is submitted to the selected network, which processes the job, and Bacalhau prints out the related job_id:
The job_id above is shown in its full form. For convenience, you can use the shortened version, in this case: j-de72aeff.
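The short form is simply the leading prefix of the full ID, so you can derive it yourself; for example (the full ID below is made up for illustration):

```shell
# Derive the short job ID from a (hypothetical) full job ID.
FULL_ID="j-de72aeff-1234-5678-9abc-def012345678"
SHORT_ID=$(echo "$FULL_ID" | cut -d'-' -f1-2)
echo "$SHORT_ID"   # j-de72aeff
```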
Let's take a look at the results of the command execution in the terminal:
The output will look something like the following:
After the job runs, any output it produces is stored according to the settings for your job. You can download your job results directly by using bacalhau job get.
outputs - a directory containing everything your job wrote (if using output volumes).
You can create jobs in the Bacalhau network using the various job types introduced in version 1.2. Each job may need specific variables, resource requirements, and data details, which are described in the job specification.
You can use the bacalhau job run command to create a job in Bacalhau using JSON and YAML formats.
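For instance, a declarative spec might look like the following (the field names are assumed from the v1.2 job model; check them against the official job specification before use):

```shell
# Write a hypothetical declarative job spec to disk.
cat > job.yaml <<'EOF'
Name: hello-world
Type: batch
Count: 1
Tasks:
  - Name: main
    Engine:
      Type: docker
      Params:
        Image: ubuntu:latest
        Entrypoint: ["echo", "hello"]
EOF

echo "Spec written to job.yaml"
# bacalhau job run job.yaml   # uncomment to submit against a live network
```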
You can also use the API to submit a new job for execution.
The selected compute node receives the job and starts executing it inside a container. The container can use different executors to work with the data and perform the necessary actions. A job can use the Docker executor or the WASM executor, together with the available storage volumes. See the Docker Engine parameters to configure Docker jobs; if you want tasks to be executed in a WebAssembly environment, see the WASM Engine parameters.
Bacalhau's seamless integration with IPFS ensures that users have a decentralized option for publishing their task results, enhancing accessibility and resilience while reducing dependence on a single point of failure. See the IPFS publisher documentation for detailed information.
Bacalhau's S3 Publisher provides users with a secure and efficient method to publish task results to any S3-compatible storage service. This publisher supports not just AWS S3, but also S3-compatible services offered by cloud providers like Google Cloud Storage and Azure Blob Storage, as well as open-source options like MinIO. See the S3 publisher documentation for detailed information.
You can use the bacalhau job describe command to get a full description of a job in YAML format.
You can use the API to retrieve the specification and current status of a particular job.
You can use the bacalhau job list command to list jobs on the network in YAML format.
You can use the API to retrieve a list of jobs.
You can use the bacalhau job executions command to list all executions associated with a job, identified by its ID, in YAML format.
You can use the API to retrieve all executions for a particular job.
The Bacalhau client provides tools to monitor and manage the execution of jobs. You can get information about status and progress and decide on next steps. View a node's agent information if you want to know its health, capabilities, and deployed Bacalhau version. To get information about the status and characteristics of the nodes in the cluster, use bacalhau node list.
You can use the bacalhau job stop command to cancel a job that was previously submitted and stop it from running if it has not yet completed.
You can use the API to terminate a specific job asynchronously.
You can use the bacalhau job history command to enumerate the historical events related to a job, identified by its ID.
You can use the API to retrieve historical events for a specific job.
You can use the bacalhau job logs command to retrieve the log output (stdout and stderr) from a job. If the job is still running, you can follow the logs after the previously generated logs are retrieved.
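As a combined sketch, the monitoring commands above can be exercised like this (the job ID is a placeholder, and the commands are printed rather than run since they need a live network):

```shell
# Hypothetical monitoring/management session for a job.
JOB_ID="j-de72aeff"   # placeholder

echo "Would run: bacalhau job history $JOB_ID"   # past events for the job
echo "Would run: bacalhau job logs $JOB_ID"      # stdout/stderr output
echo "Would run: bacalhau job stop $JOB_ID"      # cancel if still running
```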
To familiarize yourself with all the commands used in Bacalhau, please view the CLI commands guide.
With that, you have just successfully run a job on Bacalhau! Congratulations!