Overview

Bacalhau is a platform for fast, cost-efficient, and secure computation that runs jobs where the data is generated and stored. With Bacalhau, you can streamline your existing workflows without extensive rewriting by running arbitrary Docker containers and WebAssembly (Wasm) images as tasks. This architecture is also referred to as Compute Over Data (or CoD). The name Bacalhau comes from the Portuguese word for salted cod fish.

Bacalhau seeks to transform data processing for large-scale datasets to improve cost and efficiency, and to open up data processing to larger audiences. Our goals for the project center on creating an open, collaborative Compute ecosystem. We believe that the same benefits of open collaboration on datasets should be available to generic storage compute tasks. At the moment, we are a free volunteer network. Enjoy ;)

Why Bacalhau?

⚡️ Jobs in Bacalhau are processed where the data was created, and all jobs are parallel by default.

🔐 You can run private workloads to reduce the chance of leaking private information or inadvertently sharing your data outside of your organization.

💸 Bacalhau eliminates ingress/egress costs since jobs are processed closer to the source.

🤓 You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data.

💥 You can integrate with Bacalhau and run a job on a database.

📚 Bacalhau operates on a network of open compute resources made available to serve any data processing workload. With Bacalhau you can batch process petabytes (quadrillion bytes) of data.

🎆 You can auto-generate art using a Stable Diffusion AI model trained on the chosen artist's original works.

Fast Track ⏱️

Understand Bacalhau in 1 minute

Go to the directory where you want to store your job results

Install the bacalhau client

curl -sL https://get.bacalhau.org/install.sh | bash

Submit a "Hello World" job

bacalhau docker run ubuntu echo Hello World

Download your result

bacalhau get 63d08ff0..... # make sure to use the right job id from the docker run command
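
The three steps above can also be chained so that the job ID does not have to be copied by hand. The following is a minimal sketch, not an official script: the helper names `submit_hello_world` and `fetch_results` are hypothetical, and it assumes your installed client supports the `--id-only` and `--wait` flags of `bacalhau docker run` (check `bacalhau docker run --help` for your version).

```shell
# Submit the "Hello World" job and print only its job ID.
# Assumption: --id-only prints just the ID and --wait blocks until the job finishes.
submit_hello_world() {
  bacalhau docker run --id-only --wait ubuntu echo Hello World
}

# Download the results of the given job into the current directory.
fetch_results() {
  job_id="${1:-}"
  if [ -z "$job_id" ]; then
    echo "usage: fetch_results <job-id>" >&2
    return 1
  fi
  bacalhau get "$job_id"
}
```

With the client installed, it could be used as `job_id="$(submit_hello_world)" && fetch_results "$job_id"`, run from the directory where you want the results stored.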
info

For a more detailed tutorial, check out our Getting Started tutorial.

How it works

The goal of the Bacalhau project is to make it easy to perform distributed, decentralised computation next to where the data resides. So a key step in this process is making your data accessible. Data is identified by its content identifier (CID) and can be accessed by anyone who knows the CID. Here are some options that can help you mount your data:

info

The options are not limited to those mentioned above. You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data.

Use Cases

Bacalhau shines when it comes to data-intensive applications like data engineering, model training, model inference, model dynamics, etc.

Here are some example tutorials on how you can process your data with Bacalhau:

info

For more tutorials, visit our examples page.

Roadmap

Initially, the Bacalhau project will focus on serving data processing and analytics use cases. Over time, Bacalhau will expand to other Compute workloads. You can find Bacalhau's Public Roadmap here!

Community

Bacalhau has a very friendly community and we are always happy to help you get started:

  • GitHub Discussions – ask anything about the project, give feedback, or answer questions that will help other users.
  • Join the Slack Community and go to the #bacalhau channel – it is the easiest way to engage with other members of the community and get help.
  • Contributing – learn how to contribute to the Bacalhau project.

Next Steps

👉 Continue with the Getting Started guide to learn how to install and run a job with the Bacalhau client.

👉 Or jump directly to the Examples that showcase Bacalhau's abilities.