Good news everyone! You can now run your Bacalhau-IPFS stack in Docker.
This page describes several ways in which to operate Bacalhau, so you can choose the method that best suits your needs: providing compute to the public network, running an insecure private stack for testing, or running a secure private stack.
This guide works best on a Linux machine. If you're trying to run this on a Mac, you may encounter issues; remember that Docker's host network mode doesn't work on macOS.
You need to have Docker installed. If you don't have it, you can install it by following the instructions in the Docker documentation.
This method is appropriate for those who want to:

- Provide compute resources to the public Bacalhau network

This is not appropriate for:

- Testing and development
- Running a private network
This will start a local IPFS node and connect it to the public DHT. If you already have an IPFS node running, then you can skip this step.
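A sketch of the command, assuming a Docker network named `bacalhau-network` (see the networking notes at the end of this guide) and the `ipfs/kubo` image; adjust names and versions to suit your environment:

```bash
# Wipe any previous repo so we start from a clean slate
rm -rf $(pwd)/ipfs && mkdir -p $(pwd)/ipfs

# Run IPFS in the shared Docker network; publish the swarm port to the
# world on 4002 and the admin RPC API to localhost only on 5001
docker run --rm -d --name ipfs \
  --network bacalhau-network \
  -v $(pwd)/ipfs:/data/ipfs \
  -p 4002:4001 \
  -p 127.0.0.1:5001:5001 \
  ipfs/kubo:latest
```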
Some notes about this command:

- It wipes the `$(pwd)/ipfs` directory to make sure you have a clean slate
- It runs the IPFS container in the specified Docker network
- It exposes the IPFS swarm port to the world on port 4002, to avoid clashes with Bacalhau
- It exposes the admin RPC API to the local host only, on port 5001
- We are not specifying or removing the bootstrap nodes, so it will default to connecting to public machines
You can now test that the IPFS node is working.
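To confirm it responds, you can query its identity (assuming the container name `ipfs` from the sketch above):

```bash
docker exec ipfs ipfs id
```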
Bacalhau consists of two parts: a "requester" that is responsible for operating the API and managing jobs, and a "compute" element that is responsible for executing jobs. In a public context, you'd typically just run a compute node, and allow the public requesters to handle the traffic.
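A sketch of that command, assuming the `ghcr.io/bacalhau-project/bacalhau` image; the label value is a hypothetical example, and you should pin the image tag to the current release:

```bash
docker run --rm -d --name bacalhau \
  --net host \
  -u root \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp:/tmp \
  -e BACALHAU_ENVIRONMENT=production \
  ghcr.io/bacalhau-project/bacalhau:latest \
  serve \
    --node-type compute \
    --labels "owner=docs-example" \
    --peer env \
    --ipfs-connect /ip4/127.0.0.1/tcp/5001
```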
Notes about the command:

- It runs the Bacalhau container in "host" mode. This means that the container will use the same network as the host.
- It uses the `root` user, which is the default system user that has access to the Docker socket on a Mac. You may need to change this to suit your environment.
- It mounts the Docker socket
- It mounts the `/tmp` directory
- It exposes the Bacalhau API ports to the world
- The container version should match that of the current release
- The IPFS connect string points to the RPC port of the IPFS node in Docker. In host network mode, that is the RPC port the IPFS container publishes on localhost; if the containers instead share a Docker network, Bacalhau can use DNS to find the IPFS container. If you're running your own node, replace it.
- The `--node-type` flag is set to `compute` because we only want to run a compute node
- The `--labels` flag sets a human-readable label for the node, so we can run jobs on our machine later
- We specify the `--peer env` flag so that it uses the environment specified by `BACALHAU_ENVIRONMENT=production` and therefore connects to the public network peers
There are several ways to ensure that the Bacalhau compute node is connected to the network.
First, check that the Bacalhau libp2p port is open and connected. On Linux you can check with `lsof`:
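For example (1235 is the assumed default Bacalhau libp2p port):

```bash
sudo lsof -i :1235
```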
In the output, note the established connections: these are the production bootstrap nodes that Bacalhau is now connected to.
You can also check that the node is connected by listing the current network peers and grepping for your IP address or node ID. The node ID can be obtained from the Bacalhau logs:
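For example, assuming the container name `bacalhau`:

```bash
docker logs bacalhau
```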
Finally, submit a job with the label you specified when you ran the compute node. If this label is unique, there should be only one node with this label. The job should succeed. Run the following:
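A sketch, assuming the hypothetical `owner=docs-example` label from the earlier command and Bacalhau's `--selector` flag for targeting nodes by label:

```bash
# Target only nodes carrying our label; the image and command are arbitrary
bacalhau docker run \
  --selector owner=docs-example \
  ubuntu echo hello
```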
If, instead, your job fails with the following error, it means that the compute node is not connected to the network:
This method is insecure. It does not lock down the IPFS node. Anyone connected to your network can access the IPFS node and read/write data. This is not recommended for production use.
This method is appropriate for:

- Testing and development
- Evaluating the Bacalhau platform before scaling jobs via the public network

This method is easier to use because it doesn't require a secret IPFS swarm key -- this is essentially an authentication token that allows you to connect to the node.

This method is not appropriate for:

- Secure, private use
- Production use
To run an insecure, private node, you need to initialize your IPFS configuration by removing all of the default public bootstrap nodes. Then we run the node in the normal way, without the special `LIBP2P_FORCE_PNET` environment variable that enforces a secure private connection.
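A sketch of that sequence, again assuming the `bacalhau-network` Docker network and the `ipfs/kubo` image:

```bash
# Start from a clean repo
rm -rf $(pwd)/ipfs && mkdir -p $(pwd)/ipfs

# Initialize the repo, then remove the default public bootstrap nodes
docker run --rm --entrypoint ipfs -v $(pwd)/ipfs:/data/ipfs ipfs/kubo:latest init
docker run --rm --entrypoint ipfs -v $(pwd)/ipfs:/data/ipfs ipfs/kubo:latest bootstrap rm --all

# Run the node; publish the swarm port (4002) and RPC API (5001) to localhost only
docker run --rm -d --name ipfs \
  --network bacalhau-network \
  -v $(pwd)/ipfs:/data/ipfs \
  -p 127.0.0.1:4002:4001 \
  -p 127.0.0.1:5001:5001 \
  ipfs/kubo:latest
```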
Some notes about this command:

- It wipes the `$(pwd)/ipfs` directory to make sure you have a clean slate
- It removes the default bootstrap nodes
- It runs the IPFS container in the specified Docker network
- It exposes the IPFS swarm port to the local host only on port 4002 (to avoid clashes with Bacalhau), preventing accidental exposure of the IPFS node
- It exposes the admin RPC API to the local host only, on port 5001
You can now test that the IPFS node is working.
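One quick check is to call the RPC API on the locally published port (Kubo's RPC API only accepts POST requests):

```bash
curl -s -X POST http://127.0.0.1:5001/api/v0/id
```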
Bacalhau consists of two parts: a "requester" that is responsible for operating the API and managing jobs, and a "compute" element that is responsible for executing jobs. In a public context, you'd typically just run a compute node, and allow the public requesters to handle the traffic. But in a private context, you'll want to run both.
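A sketch of the combined command, assuming the same image and Docker network as before; ports 1234 (API) and 1235 (libp2p) are assumed defaults, so adjust them to your release:

```bash
docker run --rm -d --name bacalhau \
  --network bacalhau-network \
  -u root \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp:/tmp \
  -e BACALHAU_NODE_COMPUTESTORAGEPATH=/tmp \
  -p 127.0.0.1:1234:1234 \
  -p 127.0.0.1:1235:1235 \
  ghcr.io/bacalhau-project/bacalhau:latest \
  serve \
    --node-type requester,compute \
    --ipfs-connect /dns4/ipfs/tcp/5001
```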
Notes about the command:

- It runs the Bacalhau container in the specified Docker network
- It uses the `root` user, which is the default system user that has access to the Docker socket on a Mac. You may need to change this to suit your environment
- It mounts the Docker socket
- It mounts the `/tmp` directory and specifies this as the location where Bacalhau will write temporary execution data (`BACALHAU_NODE_COMPUTESTORAGEPATH`)
- It exposes the Bacalhau API ports to the local host only, to prevent accidentally exposing the API to the public internet
- The container version should match that of the Bacalhau installed on your system
- The IPFS connect string points to the RPC port of the IPFS node. Because Bacalhau is running in the same network, it can use DNS to find the IPFS container IP.
- The `--node-type` flag is set to `requester,compute` because we want to run both a requester and a compute node
You can now test that Bacalhau is working.
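For example, listing jobs against the locally published API (1234 is the assumed default port):

```bash
bacalhau list --api-host 127.0.0.1 --api-port 1234
```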
Now it's time to run a job. Recall that you exposed the Bacalhau API on the default ports to the local host only, so you'll need to use the `--api-host` flag to tell Bacalhau where to find the API. Everything else is a standard part of the Bacalhau CLI.
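For example (the image and command are arbitrary):

```bash
bacalhau docker run --api-host 127.0.0.1 ubuntu echo hello
```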
The job should succeed. Run it again but this time capture the job ID to make it easier to retrieve the results.
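One way to do that is with the `--id-only` flag:

```bash
# Capture just the job ID so we can reference it in later commands
JOB_ID=$(bacalhau docker run --api-host 127.0.0.1 --wait --id-only ubuntu echo hello)
echo $JOB_ID
```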
To retrieve the results using the Bacalhau CLI, you need to know the p2p swarm multiaddress of the IPFS node, because you don't want to connect to the public global IPFS network. To get it, you can run the `ipfs id` command (and parse the output down to just the address you need):
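A sketch that keeps only the loopback multiaddress and rewrites the port to the one published on the host:

```bash
# <addrs> prints the node's multiaddresses, one per line
export SWARM_ADDR=$(docker exec ipfs ipfs id --format="<addrs>" | grep 127.0.0.1 | sed 's/4001/4002/')
echo $SWARM_ADDR
```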
Note that the command above rewrites the reported port from 4001 to 4002. This is because IPFS listens on port 4001 inside the container, but that port is published to the host as 4002.
Now get the results:
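Assuming the `$SWARM_ADDR` and `$JOB_ID` variables from the previous steps:

```bash
bacalhau get --api-host 127.0.0.1 --ipfs-swarm-addrs $SWARM_ADDR $JOB_ID
```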
Alternatively, you can use the Docker container, mount the results volume, change the `--api-host` to the name of the Bacalhau container, and switch the `--ipfs-swarm-addrs` back to port 4001:
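A sketch; `<ipfs-peer-id>` is a placeholder for the peer ID reported by `ipfs id`:

```bash
docker run --rm \
  --network bacalhau-network \
  -v $(pwd)/results:/results \
  ghcr.io/bacalhau-project/bacalhau:latest \
  get \
    --api-host bacalhau \
    --ipfs-swarm-addrs /dns4/ipfs/tcp/4001/p2p/<ipfs-peer-id> \
    --output-dir /results \
    $JOB_ID
```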
Running a private secure network is useful in a range of scenarios, including:

- Running a private network for a private project
You need two things: a private IPFS node to store data, and a Bacalhau node to execute over that data. To keep the nodes private you need to tell them to shush and use a secret key. This is a bit harder to use, and a bit more involved, than the insecure version.
Private IPFS nodes are experimental. See the IPFS documentation for more information.
First, you need to bootstrap a new IPFS cluster for your own private use. This consists of a process of generating a swarm key, removing any bootstrap nodes, and then starting the IPFS node.
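A sketch of that bootstrap sequence; the swarm key format (a `/key/swarm/psk/1.0.0/` header plus 32 hex-encoded random bytes) is the standard go-ipfs private-network key layout:

```bash
# Start from a clean repo
rm -rf $(pwd)/ipfs && mkdir -p $(pwd)/ipfs

# Initialize the repo and remove the default public bootstrap nodes
docker run --rm --entrypoint ipfs -v $(pwd)/ipfs:/data/ipfs ipfs/kubo:latest init
docker run --rm --entrypoint ipfs -v $(pwd)/ipfs:/data/ipfs ipfs/kubo:latest bootstrap rm --all

# Generate the secret swarm key inside the repo
printf '/key/swarm/psk/1.0.0/\n/base16/\n%s\n' "$(openssl rand -hex 32)" > $(pwd)/ipfs/swarm.key

# Run the node; LIBP2P_FORCE_PNET=1 refuses to start without the private key
docker run --rm -d --name ipfs \
  --network bacalhau-network \
  -e LIBP2P_FORCE_PNET=1 \
  -v $(pwd)/ipfs:/data/ipfs \
  -p 127.0.0.1:4002:4001 \
  -p 127.0.0.1:5001:5001 \
  ipfs/kubo:latest
```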
Some notes about this command:

- It wipes the `$(pwd)/ipfs` directory to make sure you have a clean slate
- It generates a new swarm key -- this is the token that is required to connect to this node
- It removes the default bootstrap nodes
- It runs the IPFS container in the specified Docker network
- It exposes the IPFS swarm port to the local host only on port 4002 (to avoid clashes with Bacalhau), preventing accidental exposure of the IPFS node
- It exposes the admin RPC API to the local host only, on port 5001
The instructions for running a secure private Bacalhau network are the same as for the insecure version; please follow those instructions.
The instructions for running a job are also the same as for the insecure version; please follow those instructions.
The same process as above can be used to retrieve results from the IPFS node, as long as the `bacalhau get` command has access to the IPFS swarm key.
The retrieval commands are the same as in the insecure section: run the Bacalhau binary from outside of Docker, or use the Docker container, mount the results volume, change the `--api-host` to the name of the Bacalhau container, and switch the `--ipfs-swarm-addrs` back to port 4001.
The containers need to share a dedicated Docker network. Without this, inter-container DNS will not work, and internet access may not work either.
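You can create one like this (the name `bacalhau-network` is the assumption used throughout this guide):

```bash
docker network create bacalhau-network
```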
Double check that this network can access the internet (so Bacalhau can call external URLs).
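One quick check is to ping a public address from inside the network:

```bash
docker run --rm --network bacalhau-network alpine ping -c 2 8.8.8.8
```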
This should be successful. If it is not, then please troubleshoot your Docker networking. For example, on my Mac, I had to totally uninstall Docker, restart the computer, and then reinstall Docker. Then it worked. Also check https://docs.docker.com/desktop/troubleshoot/known-issues/ -- apparently "ping from inside a container to the Internet does not work as expected". No idea what that means. How do you break ping?
You can now browse the IPFS web UI at http://127.0.0.1:5001/webui.
Read more about the IPFS Docker image in the IPFS documentation.
As described in their documentation, never expose the RPC API port (port 5001) to the public internet.
Ensure that the Bacalhau logs (`docker logs bacalhau`) have no errors.
Check that your Bacalhau installation is the same version:
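For example (`bacalhau version` reports both client and server versions when pointed at a running API):

```bash
bacalhau version --api-host 127.0.0.1
```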
The versions should match. Alternatively, you can use the Docker container:
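A container-based equivalent, assuming the shared network and container names from earlier:

```bash
docker run --rm --network bacalhau-network \
  ghcr.io/bacalhau-project/bacalhau:latest \
  version --api-host bacalhau
```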
Perform a list command to ensure you can connect to the Bacalhau API.
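For example:

```bash
bacalhau list --api-host 127.0.0.1
```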
It should return empty.
If you are retrieving and running images from Docker Hub, you may encounter issues with rate-limiting. Docker provides higher limits when authenticated; the size of the limit is based on the type of your account.
Should you wish to authenticate with Docker Hub when pulling images, you can do so by specifying credentials as environment variables wherever your compute node is running.
Currently, this authentication is only available (and required) for Docker Hub.
| Environment variable | Description |
|---|---|
| `DOCKER_USERNAME` | The username with which you are registered at https://hub.docker.com/ |
| `DOCKER_PASSWORD` | A read-only access token, generated from the page at https://hub.docker.com/settings/security |
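For example, when the compute node itself runs in Docker, the credentials can be passed as container environment variables (the values below are placeholders):

```bash
# Placeholder credentials -- substitute your own username and a
# read-only access token generated on Docker Hub
docker run --rm -d --name bacalhau \
  -e DOCKER_USERNAME=myuser \
  -e DOCKER_PASSWORD=dckr_pat_0000000000 \
  ghcr.io/bacalhau-project/bacalhau:latest \
  serve --node-type compute
```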