Bacalhau is a platform for fast, cost-efficient, and secure computation that runs jobs where the data is generated and stored. With Bacalhau, you can streamline your existing workflows without extensive rewriting by running arbitrary Docker containers and WebAssembly (wasm) images as tasks. This architecture is also referred to as Compute Over Data (or CoD). The name Bacalhau comes from the Portuguese word for salted cod fish.
Bacalhau seeks to transform data processing for large-scale datasets to improve cost and efficiency, and to open up data processing to larger audiences. Our goals for the project center on creating an open, collaborative compute ecosystem. We believe that the same benefits of open collaboration on datasets should be available to generic storage compute tasks. At the moment, we are a free volunteer network, enjoy ;)
⚡️ Jobs in Bacalhau are processed where the data was created and all jobs are parallel by default.
🔐 You can run private workloads to reduce the chance of leaking private information or inadvertently sharing your data outside of your organization.
💸 Bacalhau eliminates ingress/egress costs since jobs are processed closer to the source.
🤓 You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data.
💥 You can integrate with Bacalhau and run a job on a database.
📚 Bacalhau operates on a network of open compute resources made available to serve any data processing workload. With Bacalhau, you can batch process petabytes (quadrillions of bytes) of data.
🎆 You can auto-generate art using a Stable Diffusion AI model trained on the chosen artist’s original works.
Fast Track ⏱️
Understand Bacalhau in 1 minute
Go to the directory where you want to store your job results
Install the bacalhau client
curl -sL https://get.bacalhau.org/install.sh | bash
Submit a "Hello World" job
bacalhau docker run ubuntu echo Hello World
Download your result
bacalhau get 63d08ff0..... # make sure to use the right job id from the docker run command
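The directory and download steps above can be wrapped in a small shell helper. This is only a sketch under stated assumptions: `fetch_results` and the `BACALHAU_BIN` variable are hypothetical names introduced here and are not part of the Bacalhau CLI. The only CLI call it relies on is `bacalhau get <job id>` from the last step, and it assumes results are downloaded into the current directory, as the first step suggests.

```shell
# Hypothetical helper (not part of the Bacalhau CLI): download the results
# of a job into a dedicated directory.
# BACALHAU_BIN is an assumed override hook; set it to "echo" for a dry run.
BACALHAU_BIN="${BACALHAU_BIN:-bacalhau}"

fetch_results() {
  if [ "$#" -ne 2 ]; then
    echo "usage: fetch_results <results-dir> <job-id>" >&2
    return 2
  fi
  local results_dir="$1"
  local job_id="$2"

  # Create the directory that will hold the job results.
  mkdir -p "$results_dir" || return 1

  # Run "bacalhau get" from inside that directory. The subshell keeps the
  # caller's working directory unchanged.
  (cd "$results_dir" && "$BACALHAU_BIN" get "$job_id")
}

# Usage (replace the placeholder with the job id printed by docker run):
#   fetch_results ./hello-world-results <job id>
```

Keeping the job id as an explicit argument avoids guessing at the output format of the `docker run` command; you still copy the id by hand, as in the steps above.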
For a more detailed tutorial, check out our Getting Started tutorial.
How it works
The goal of the Bacalhau project is to make it easy to perform distributed, decentralised computation next to where the data resides. So a key step in this process is making your data accessible. Data is identified by its content identifier (CID) and can be accessed by anyone who knows the CID. Here are some options that can help you mount your data:
- Copy data from a URL to public storage
- Pin Data to public storage
- Copy Data from S3 Bucket to public storage
The options are not limited to those listed above. You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data.
Bacalhau shines when it comes to data-intensive applications like data engineering, model training, model inference, model dynamics, etc.
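To make the idea of a content identifier more concrete, here is a simplified sketch of content addressing: the identifier is derived from the bytes themselves, so identical data always yields the same identifier regardless of file name. It assumes `sha256sum` (GNU coreutils) is available, and `content_id_sketch` is a hypothetical helper. A plain SHA-256 hex digest is not a real CID, which uses a more structured encoding; the sketch only illustrates the principle.

```shell
# Hypothetical helper: derive an identifier from a file's content.
# Simplification: this is a plain SHA-256 hex digest, not a real CID.
content_id_sketch() {
  sha256sum < "$1" | cut -d ' ' -f 1
}

# Work in a throwaway directory so no existing files are touched.
workdir="$(mktemp -d)"
printf 'hello bacalhau\n' > "$workdir/a.txt"
cp "$workdir/a.txt" "$workdir/b.txt"        # same bytes, different name
printf 'other data\n' > "$workdir/c.txt"    # different bytes

content_id_sketch "$workdir/a.txt"   # same identifier as b.txt
content_id_sketch "$workdir/b.txt"
content_id_sketch "$workdir/c.txt"   # different identifier
```

Because the identifier depends only on the content, anyone who knows it can check that the data they received is the data that was requested.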
Here are some example tutorials on how you can process your data with Bacalhau:
- Stable Diffusion AI
- Generate Realistic Images using StyleGAN3 and Bacalhau
- Object Detection with YOLOv5 on Bacalhau
- Running Genomics on Bacalhau
- Training PyTorch Model with Bacalhau
For more tutorials, visit our examples page.
Initially, the Bacalhau project will focus on serving data processing and analytics use cases. Over time, Bacalhau will expand to other Compute workloads. You can find Bacalhau's Public Roadmap here!
Bacalhau has a very friendly community and we are always happy to help you get started:
- GitHub Discussions – ask anything about the project, give feedback or answer questions that will help other users.
- Join the Slack Community and go to the #bacalhau channel – it is the easiest way to engage with other members of the community and get help.
- Contributing – learn how to contribute to the Bacalhau project.
👉 Continue with the Getting Started guide to learn how to install and run a job with the Bacalhau client.
👉 Or jump directly to the different Examples that showcase Bacalhau's abilities.