Publishing & Retrieving Results
This guide explains how to configure output publishing and retrieve results from Bacalhau jobs across different storage systems. Proper output handling is essential for building effective data pipelines and workflows.
What You'll Learn
How Bacalhau's Publishers mechanism works
How to configure different output destination types
How to retrieve outputs from various storage systems
How to choose the right publisher for your use case
Understanding Publishers and Result Paths
In Bacalhau, you need to configure two key components for handling outputs:
A Publisher defines where your job's output files are stored after execution.
Result Paths specify which directories inside the job's execution environment should be captured as results.
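To make this concrete, here is a minimal sketch of a declarative job spec that wires the two together. The image, bucket name, and paths are illustrative placeholders, not required values:

```yaml
Name: output-demo
Type: batch
Count: 1
Tasks:
  - Name: main
    Engine:
      Type: docker
      Params:
        Image: ubuntu:latest
        Entrypoint:
          - /bin/sh
        Parameters:
          - -c
          - echo "hello" > /outputs/result.txt
    # Where the captured results are stored after execution
    Publisher:
      Type: s3
      Params:
        Bucket: my-results-bucket
        Key: outputs/
    # Which directories inside the container are captured as results
    ResultPaths:
      - Name: outputs
        Path: /outputs
```

Assuming the spec is saved as `job.yaml`, you would submit it with `bacalhau job run job.yaml`.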
Retrieving Local Outputs
After your job completes, retrieve outputs using the `bacalhau job get` command:
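```bash
# Replace <jobID> with the ID returned when you submitted the job
bacalhau job get <jobID>
```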
This will download all published outputs to your current directory.
Important Notes:
If you define a publisher without specifying result paths, only stdout and stderr will be uploaded to the chosen publisher
If you define result paths without a publisher, the job will fail
You can have multiple result paths, each capturing different directories
Publisher Types
Bacalhau supports multiple publisher types to accommodate different needs and infrastructure requirements.
S3 Publisher
The S3 Publisher uploads outputs to an Amazon S3 bucket or any S3-compatible storage service, such as MinIO. The compute node must have permission to write to the bucket, and the orchestrator must have permission to provide pre-signed URLs to download the results.
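As a sketch, an S3 publisher block in a task spec looks like the following. The bucket, key prefix, and region are illustrative, and `Endpoint` is only needed for S3-compatible services such as MinIO:

```yaml
Publisher:
  Type: s3
  Params:
    Bucket: my-results-bucket   # the compute node needs write access here
    Key: outputs/{jobID}/       # key prefix; {jobID} is expanded per job
    Region: us-east-1
    # Endpoint: http://minio.local:9000   # for S3-compatible stores like MinIO
```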
IPFS Publisher
The IPFS Publisher uploads outputs to the InterPlanetary File System. Both the client (downloading the result) and the compute node must be connected to an IPFS daemon.
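The IPFS publisher takes no per-job parameters; the daemon connection is part of each node's own configuration. A minimal sketch:

```yaml
Publisher:
  Type: ipfs
```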
Local Publisher
The Local Publisher saves outputs to the local filesystem of the compute node that ran your job. This is intended for local testing only, since the client downloading the results must be on the same network as the compute node.
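In its simplest form, the local publisher also needs no per-job parameters:

```yaml
Publisher:
  Type: local
```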
Troubleshooting
No Outputs Found
If you don't see expected outputs:
Check that your job wrote to the directories specified in your `ResultPaths`
Verify the job completed successfully with `bacalhau job describe <jobID>`
Check for errors in the logs with `bacalhau job logs <jobID>`
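For example, with `<jobID>` replaced by your job's ID:

```bash
# Confirm the job reached a completed state and surface any execution errors
bacalhau job describe <jobID>

# Inspect the job's stdout/stderr for runtime failures
bacalhau job logs <jobID>
```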
S3 Publishing Issues
For S3 publisher problems:
Ensure compute nodes have proper IAM roles or credentials to write to the bucket
Check that the orchestrator has permissions to generate pre-signed URLs
IPFS Publishing Issues
For IPFS publisher issues:
Ensure an IPFS daemon is running on both the compute node and the client
Check for network connectivity between nodes
Verify you have enough disk space for pinning