Bacalhau Amplify is a tool for automatically explaining, enriching and enhancing your data. This document explains how it works and how to use it.
What is Bacalhau Amplify?
Amplify is both a service and a tool that leverages the power of Bacalhau to automatically run a wide range of data engineering tasks on your data.
It works by running a separate service, the Amplify daemon, that hosts a bundled UI and API, and all of the logic to manage jobs and communicate with executors.
It is designed to be easily extended and used in a variety of deployment scenarios.
Who is Bacalhau Amplify For?
Amplify is for anyone who wants to automatically run data engineering tasks on their data. You can choose to use the hosted service or deploy it yourself.
There are four ways you can leverage Amplify, depending on your needs.
Beginners -- Use the UI and the Hosted Service
If you're just getting started with Bacalhau, or you don't want to manage your own infrastructure, you can use the hosted Amplify UI.
To get started, visit amplify.bacalhau.org and click on the
Submit a Job button on the dashboard.
This currently only accepts a CID as an input.
Developers -- As a Service -- Use the Amplify API
If you're a developer and you want to integrate Amplify into your own application, you can use the Amplify API.
On-Prem Developers -- Use the Amplify Container
If you're a developer and you want to run Amplify on your own infrastructure, you can use the Amplify container.
To start Amplify as a service, like the hosted version, run:
docker run -p 8080:8080 ghcr.io/bacalhau-project/amplify:latest serve
To run a single job without starting the service, run:
docker run ghcr.io/bacalhau-project/amplify:latest run QmS8ioZzB8foNEwFmJZTVJT1se5ycgRuc1Ey5fjHfZi5wb
You can replace that CID with your own!
Advanced Users -- Use the Amplify Binary
If you're an advanced user and you want to bundle amplify, then you can use the Amplify binary, or indeed the raw Go code.
You can find the most recent binary builds on the releases page.
- Download the latest version for your platform.
- Untar the file.
- Make the binary executable and place it in a location that is on the PATH.
Now you can run the binary:
amplify serve # for the service
amplify run QmS8ioZzB8foNEwFmJZTVJT1se5ycgRuc1Ey5fjHfZi5wb # for a single job
Amplify can be configured using parameters or environment variables.
Get the most recent configuration options by passing
-h to the subcommand of your choice:
docker run ghcr.io/bacalhau-project/amplify:latest serve -h
By default, Amplify runs with an in-memory database. But that implementation is very bare-bones and obviously, you will lose historical information when it restarts. We recommend running Amplify with a PostgreSQL database.
Start a PostgreSQL database and then point Amplify to it using the
AMPLIFY_DB_URI environment variable:
docker network create anet
docker run -p 5432:5432 --network=anet --name amplify-postgres -e POSTGRES_DB=amplify -e POSTGRES_PASSWORD=mysecretpassword -d postgres
docker run -p 8080:8080 --network=anet --env AMPLIFY_DB_URI="postgres://postgres:email@example.com/amplify?sslmode=disable" ghcr.io/bacalhau-project/amplify:latest serve
Amplify workflows are executed via a trigger. As of May 2023, Amplify supports the following triggers:
- API -- A trigger that accepts a POST request with a CID as the body.
- IPFS-Search.com -- A trigger that watches an IPFS-Search.com index for new IPFS CIDs. This must be enabled in the configuration.
Creating New Workflows and Jobs
It's really easy to add new workflows and jobs to Amplify. You can see the existing workflows and jobs in the
config.yaml file in the repository.
Jobs are simply Docker containers that are executed in Bacalhau. Workflows connect jobs into an execution graph. To find out more please read the developer documentation.
All of the documentation intended for a developer audience is located in the developer documentation of the repository.