Connect Storage

Bacalhau has two ways to make use of external storage providers: Sources and Publishers. Sources storage resources consumed as inputs to jobs. And Publishers storage resources created with the results of jobs.

Sources

Bacalhau allows you to use S3 or any S3-compatible storage service as an input source. Users can specify files or entire prefixes stored in S3 buckets to be fetched and mounted directly into the job execution environment. This capability ensures that your jobs have immediate access to the necessary data. See the S3 source specification for more details.

To use the S3 source, you will have to to specify the mandatory name of the S3 bucket and the optional parameters Key, Filter, Region, Endpoint, VersionID and ChechsumSHA256.

Below is an example of how to define an S3 input source in YAML format:

InputSources:
  - Source:
      Type: "s3"
      Params:
        Bucket: "my-bucket"
        Key: "data/"
        Endpoint: "https://s3.us-west-2.amazonaws.com"
        ChecksumSHA256: "e3b0c44b542b..."
  - Target: "/data"

Publishers

Bacalhau's S3 Publisher provides users with a secure and efficient method to publish job results to any S3-compatible storage service. To use an S3 publisher you will have to specify required parameters Bucket and Key and optional parameters Region, Endpoint, VersionID, ChecksumSHA256. See the S3 publisher specification for more details.

Here’s an example of the part of the declarative job description that outlines the process of using the S3 Publisher with Bacalhau:

Publisher:
  Type: "s3"
  Params:
    Bucket: "my-task-results"
    Key: "task123/result.tar.gz"
    Endpoint: "https://s3.us-west-2.amazonaws.com"

Last updated