Key Concepts

Bacalhau is built around a few core ideas and terminologies. If you're new to Bacalhau, here's what you need to know:

Distributed Compute Orchestration

Bacalhau coordinates computing workloads across a network of machines, intelligently matching jobs to resources.

  • Bacalhau acts as a dispatcher: You submit jobs (e.g., container workloads), and it finds the best node to run them based on available resources, data location, and constraints.

Bring the Compute to the Data

Instead of moving data to compute, Bacalhau moves compute to where data lives, reducing network overhead and improving efficiency.

  • Traditionally, big data solutions shuffle large datasets across networks to a central compute cluster.

  • Bacalhau inverts this approach: it places compute tasks where the data already resides—whether in local storage, an S3 bucket, or other storage providers—reducing unnecessary data movement.

Jobs & Executions

Bacalhau organizes work in a hierarchy that enables efficient resource allocation and parallelization.

  • A Job defines the overall workflow (e.g., "run a Docker image with these arguments").

  • A job can be broken into multiple Executions that run in parallel across different compute nodes.

  • Bacalhau optimizes these executions based on data locality and available resources.

Job Types

Bacalhau supports various execution patterns to accommodate different workload requirements:

  • Batch Jobs: One-time execution of a workload, typically for data processing tasks that run to completion.

  • Ops Jobs: Administrative or operational tasks, often for system maintenance or monitoring.

  • Daemon Jobs: Long-running background processes that perform ongoing work.

  • Service Jobs: Web services or APIs that need to remain available and respond to requests.

Node Types

The Bacalhau network consists of specialized components, each with specific responsibilities:

  • Orchestrator Node: Receives job submissions, schedules executions, and monitors state. Started with bacalhau serve --orchestrator.

  • Compute Node: Executes workloads locally, typically requiring Docker or another runtime. Started with bacalhau serve --compute.

  • Hybrid Node: Serves both roles at once—often used for local dev or small setups. Started with bacalhau serve --orchestrator --compute.

Execution Engines

Bacalhau runs your code through pluggable runtime environments:

  • Bacalhau supports multiple execution engines through its modular architecture:

    • Docker: For container-based workloads

    • WebAssembly (WASM): For lightweight, sandboxed execution

  • The framework is designed to accommodate additional engines as needed.

Storage Providers

Bacalhau can access data from various sources through a clean, extensible interface:

  • Bacalhau can mount data from various sources through its flexible storage provider interface:

    • S3-compatible storage

    • HTTP/HTTPS URLs

    • Local filesystems

    • IPFS

    • And more via storage provider plugins

Publisher

After execution, Bacalhau ensures your results are accessible where you need them:

  • After a job finishes, its results can be published to a specific backend—like local disk, S3 or IPFS—so they're easy to retrieve.

Communication Layer

A reliable messaging system allows Bacalhau components to coordinate effectively:

  • Bacalhau uses NATS.io as its communication backbone:

    • Orchestrators act as NATS servers

    • Compute nodes connect as NATS clients

    • This provides reliable, scalable messaging between components

Last updated

Was this helpful?