Bacalhau

What is Bacalhau?

Bacalhau is a platform for fast, cost-efficient, and secure computation that runs jobs where the data is generated and stored. With Bacalhau, you can streamline your existing workflows without extensive rewriting by running arbitrary Docker containers and WebAssembly (WASM) images as tasks. This architecture is also referred to as Compute Over Data (or CoD). The name Bacalhau comes from the Portuguese word for salted codfish.
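
For example, a simple Docker job can be submitted with the Bacalhau CLI. The sketch below assumes a working installation and a network to submit to; exact subcommand names (for example, describe vs. job describe) differ between Bacalhau versions, so check bacalhau --help on your installation.

    # Submit a container job; Bacalhau prints a job ID on success
    bacalhau docker run ubuntu:latest -- echo "Hello from Bacalhau"

    # Inspect the job and download its results (subcommand names vary by version)
    bacalhau job describe <job-id>
    bacalhau job get <job-id>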

Bacalhau seeks to transform data processing for large-scale datasets to improve cost and efficiency, and to open up data processing to larger audiences. Our goal is to create an open, collaborative compute ecosystem that enables unparalleled collaboration. We (Expanso.io) offer a demo network so you can try out jobs without even installing Bacalhau. Give it a shot!

Why Bacalhau?

⚡️ Bacalhau simplifies the process of managing compute jobs by providing a unified platform for managing jobs across different regions, clouds, and edge devices.

🤝 Bacalhau provides reliable and network-partition resistant orchestration, ensuring that your jobs will complete even if there are network disruptions.

🚨 Bacalhau provides a complete and permanent audit log of exactly what happened, so you can be confident that your jobs are being executed securely.

🔐 You can run private workloads to reduce the chance of leaking private information or inadvertently sharing your data outside of your organization.

💸 Bacalhau reduces ingress/egress costs since jobs are processed closer to the source.

🤓 You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data.

💥 You can integrate with services running on nodes to run jobs, such as DuckDB.

📚 Bacalhau operates at scale over parallel jobs. You can batch process petabytes (quadrillion bytes) of data.

How it works

Bacalhau is built on a peer-to-peer network of nodes that enables decentralized communication between computers. The network consists of two types of nodes:

Requester Node: responsible for handling user requests, discovering and ranking compute nodes, forwarding jobs to compute nodes, and monitoring the job lifecycle.

Compute Node: responsible for executing jobs and producing results. Different compute nodes can be used for different types of jobs, depending on their capabilities and resources.
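
To see both node types on one machine, you can start a small local test network. This is a sketch that assumes your Bacalhau version includes the devstack helper; flags for running long-lived requester and compute nodes vary between releases, so consult bacalhau serve --help.

    # Spin up a local development network (requester + compute nodes in one process)
    bacalhau devstack

    # In another terminal, list the nodes that have joined the network
    bacalhau node list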

info

For a more detailed tutorial, check out our Getting Started Tutorial.

The goal of the Bacalhau project is to make it easy to perform distributed computation next to where the data resides. In order to do this, first you need to ingest some data.

Data ingestion

Data is identified by its content identifier (CID) and can be accessed by anyone who knows the CID. Here are some options that can help you mount your data:

info

The options are not limited to those mentioned above. You can mount your data anywhere on your machine, and Bacalhau will be able to run against that data.
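
As a rough sketch of what mounting data looks like at submission time, a job can take an input by CID or URL. The flag syntax below is an assumption based on common Bacalhau usage and may differ in your version; see bacalhau docker run --help.

    # Mount content-addressed data into the job (typically under /inputs), then process it
    bacalhau docker run --input ipfs://<CID> ubuntu:latest -- ls /inputs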

Security in Bacalhau

All workloads run under restricted Docker or WASM permissions on the node. Additionally, you can use existing (locked down) binaries that are pre-installed through Pluggable Executors.

A best practice from 12-factor apps is to use environment variables to store sensitive data such as access tokens, API keys, or passwords. These variables can be accessed by Bacalhau at runtime and are not visible to anyone who has access to the code or the server.
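
For instance, a workload can read a secret from its environment instead of a hard-coded value. The --env flag and the image and script names below are assumptions for illustration; check whether your Bacalhau version supports passing environment variables on bacalhau docker run.

    # Hypothetical example: inject an API token into the job environment
    # instead of baking it into the image or the job spec
    bacalhau docker run --env API_TOKEN="$API_TOKEN" \
      my-registry/my-image:latest -- ./process_data.sh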

Alternatively, you can pre-provision credentials to the nodes and access those on a node by node basis.

Finally, endpoints (such as vaults) can also be used to provide secure access to Bacalhau. This way, the client can authenticate with Bacalhau using a token without exposing their credentials.

Workloads Bacalhau is best suited for

Bacalhau can be used for a variety of data processing workloads, including machine learning, data analytics, and scientific computing. It is well-suited for workloads that require processing large amounts of data in a distributed and parallelized manner.

Use Cases

Once you have more than 10 devices generating or storing around 100GB of data, you're likely to face challenges with processing that data efficiently. Traditional computing approaches may struggle to handle such large volumes, and that's where distributed computing solutions like Bacalhau can be extremely useful. Bacalhau can be used in various industries, including security, web serving, financial services, IoT, Edge, Fog, and multi-cloud. Bacalhau shines when it comes to data-intensive applications like data engineering, model training, model inference, molecular dynamics, etc.

Here are some example tutorials on how you can process your data with Bacalhau:

info

For more tutorials, visit our examples page.

Community

Bacalhau has a very friendly community and we are always happy to help you get started:

  • GitHub Discussions – ask anything about the project, give feedback or answer questions that will help other users.
  • Join the Slack Community and go to the #bacalhau channel – it is the easiest way to engage with other members of the community and get help.
  • Contributing – learn how to contribute to the Bacalhau project.

Next Steps

👉 Continue with the Bacalhau Getting Started guide to learn how to install and run a job with the Bacalhau client.

👉 Or jump directly to try out the different Examples that showcase Bacalhau's abilities.