Running a Compute Node Using Docker
Good news everyone! You can now run your Bacalhau-IPFS stack in Docker.
This page describes several ways to operate Bacalhau with Docker: connecting to the public Bacalhau network, running an insecure private network, and running a secure private network. Choose the method that best suits your needs.
Pre-Prerequisites
This guide works best on a Linux machine. If you're trying to run this on a Mac, you may encounter issues; in particular, Docker's host network mode doesn't work there.
You need to have Docker installed. If you don't have it, install it by following the official Docker installation instructions.
Connect to the Public Bacalhau Network Using Docker
This method is appropriate for those who:
Provide compute resources to the public Bacalhau network
This is not appropriate for:
Testing and development
Running a private network
Prerequisites
(Optional) Start a Public IPFS Node
This will start a local IPFS node and connect it to the public DHT. If you already have an IPFS node running, then you can skip this step.
Some notes about this command:
It wipes the $(pwd)/ipfs directory to make sure you have a clean slate
It runs the IPFS container in the specified Docker network
It exposes the IPFS swarm port to the world on port 4002, to avoid clashes with Bacalhau
It exposes the admin RPC API to the local host only, on port 5001
We are not specifying or removing the bootstrap nodes, so it will default to connecting to public machines
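The command these notes describe might look like the following sketch. The container name ipfs, the network name bacalhau-network, the ipfs/kubo:latest image tag, and the helper name start_public_ipfs are assumptions made for illustration, not values confirmed by this page.

```shell
# Start a public IPFS node in Docker (illustrative sketch; names are assumptions).
start_public_ipfs() {
  data_dir="${1:?usage: start_public_ipfs <absolute-data-dir>}"
  # Wipe the data directory so the node starts from a clean slate.
  rm -rf "$data_dir" && mkdir -p "$data_dir"
  # Swarm port 4001 is published to the world on 4002;
  # the RPC API (5001) is bound to the local host only.
  docker run -d --network "${NETWORK_NAME:-bacalhau-network}" --name ipfs \
    -v "$data_dir":/data/ipfs \
    -p 4002:4001 -p 4002:4001/udp \
    -p 127.0.0.1:5001:5001 \
    ipfs/kubo:latest
}
```

Call it with an absolute path, for example start_public_ipfs "$(pwd)/ipfs".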
You can now test that the IPFS node is working.
Start a Public Bacalhau Node
Bacalhau consists of two parts: a "requester" that is responsible for operating the API and managing jobs, and a "compute" element that is responsible for executing jobs. In a public context, you'd typically just run a compute node, and allow the public requesters to handle the traffic.
Notes about the command:
It runs the Bacalhau container in "host" mode. This means that the container will use the same network as the host.
It uses the root user, which is the default system user that has access to the Docker socket on a Mac. You may need to change this to suit your environment.
It mounts the Docker socket
It mounts the /tmp directory
It exposes the Bacalhau API ports to the world
The container version should match that of the current release
The IPFS connect string points to the RPC port of the IPFS node in Docker. Because Bacalhau is running in the same network, it can use DNS to find the IPFS container IP. If you're running your own node, replace it with that node's address.
The --node-type flag is set to compute because we only want to run a compute node
The --labels flag is used to set a human-readable label for the node, and so we can run jobs on our machine later
We specify the --peer env flag so that it uses the environment specified by BACALHAU_ENVIRONMENT=production and therefore connects to the public network peers
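The notes above could correspond to a command along these lines. This is a sketch under several assumptions: the image path ghcr.io/bacalhau-project/bacalhau, the --ipfs-connect flag name (inferred from the "IPFS connect string" mentioned above), the label value, and passing BACALHAU_ENVIRONMENT with -e are not confirmed by this page, so check bacalhau serve --help for your release. Because the container uses host networking in this sketch, the connect string targets the RPC port published on localhost; adjust it to match how your IPFS node is reachable.

```shell
# Start a compute-only Bacalhau node against the public network (illustrative sketch).
start_public_compute() {
  # The release tag is deliberately not guessed here; set it yourself.
  version="${BACALHAU_VERSION:?set BACALHAU_VERSION to the current release tag}"
  docker run -d --network host --name bacalhau \
    -u root \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp:/tmp \
    -e BACALHAU_ENVIRONMENT=production \
    "ghcr.io/bacalhau-project/bacalhau:${version}" \
    serve \
      --ipfs-connect /dns4/localhost/tcp/5001 \
      --node-type compute \
      --labels "owner=docs-quick-start" \
      --peer env
}
```

Choose a label that is unique to your machine so that jobs can be targeted at it later.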
There are several ways to ensure that the Bacalhau compute node is connected to the network.
First, check that the Bacalhau libp2p port is open and connected. On Linux you can run lsof, and the output should look something like this:
Note the three established connections at the bottom. These are the production bootstrap nodes that Bacalhau is now connected to.
You can also check that the node is connected by listing the current network peers and grepping for your IP address or node ID. The node ID can be obtained from the Bacalhau logs. It will look something like this:
Finally, submit a job with the label you specified when you ran the compute node. If this label is unique, there should be only one node with this label. The job should succeed. Run the following:
If instead, your job fails with the following error, it means that the compute node is not connected to the network:
Run a Private Bacalhau Network Using Docker (Insecure)
This method is insecure. It does not lock down the IPFS node. Anyone connected to your network can access the IPFS node and read/write data. This is not recommended for production use.
This method is appropriate for:
Testing and development
Evaluating the Bacalhau platform before scaling jobs via the public network
This method is useful for testing and development. It's easier to use because it doesn't require a secret IPFS swarm key, which is essentially an authentication token that allows you to connect to the node.
This method is not appropriate for:
Secure, private use
Production use
Prerequisites
Start a Local IPFS Node (Insecure)
To run an insecure, private node, you need to initialize your IPFS configuration by removing all of the default public bootstrap nodes. Then we run the node in the normal way, without the special LIBP2P_FORCE_PNET flag that checks for a secure private connection.
Some notes about this command:
It wipes the $(pwd)/ipfs directory to make sure you have a clean slate
It removes the default bootstrap nodes
It runs the IPFS container in the specified Docker network
It exposes the IPFS swarm port to the local host only, on port 4002, to prevent accidentally exposing the IPFS node and to avoid clashes with Bacalhau
It exposes the admin RPC API to the local host only, on port 5001
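A possible shape for the command described by these notes is sketched below. It assumes the ipfs/kubo image initializes an empty repository on first run before executing the requested subcommand, and it uses illustrative names (ipfs, bacalhau-network, start_private_ipfs) that do not come from this page.

```shell
# Start a private, insecure IPFS node in Docker (illustrative sketch).
start_private_ipfs() {
  data_dir="${1:?usage: start_private_ipfs <absolute-data-dir>}"
  # Wipe the data directory so the node starts from a clean slate.
  rm -rf "$data_dir" && mkdir -p "$data_dir"
  # Remove the default public bootstrap nodes before the daemon ever starts.
  docker run --rm -v "$data_dir":/data/ipfs ipfs/kubo:latest bootstrap rm --all || return 1
  # Run the daemon with every port bound to the local host only.
  docker run -d --network "${NETWORK_NAME:-bacalhau-network}" --name ipfs \
    -v "$data_dir":/data/ipfs \
    -p 127.0.0.1:4002:4001 -p 127.0.0.1:4002:4001/udp \
    -p 127.0.0.1:5001:5001 \
    ipfs/kubo:latest
}
```

Call it with an absolute path, for example start_private_ipfs "$(pwd)/ipfs".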
You can now test that the IPFS node is working.
Start a Private Bacalhau Node
Bacalhau consists of two parts: a "requester" that is responsible for operating the API and managing jobs, and a "compute" element that is responsible for executing jobs. In a public context, you'd typically just run a compute node, and allow the public requesters to handle the traffic. But in a private context, you'll want to run both.
Notes about the command:
It runs the Bacalhau container in the specified Docker network
It uses the root user, which is the default system user that has access to the Docker socket on a Mac. You may need to change this to suit your environment
It mounts the Docker socket
It mounts the /tmp directory and specifies this as the location where Bacalhau will write temporary execution data (BACALHAU_NODE_COMPUTESTORAGEPATH)
It exposes the Bacalhau API ports to the local host only, to prevent accidentally exposing the API to the public internet
The container version should match that of the Bacalhau installed on your system
The IPFS connect string points to the RPC port of the IPFS node. Because Bacalhau is running in the same network, it can use DNS to find the IPFS container IP.
The --node-type flag is set to requester,compute because we want to run both a requester and a compute node
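As a rough sketch, the command described by these notes might be assembled as follows. The image path, the --ipfs-connect flag name, the container names ipfs and bacalhau, and the ports 1234 and 1235 (assumed here to be the Bacalhau defaults) are assumptions rather than values taken from this page.

```shell
# Start a combined requester and compute node on the private network (illustrative sketch).
start_private_node() {
  # The tag should match the Bacalhau version installed on your system.
  version="${BACALHAU_VERSION:?set BACALHAU_VERSION to match your installed Bacalhau}"
  docker run -d --network "${NETWORK_NAME:-bacalhau-network}" --name bacalhau \
    -u root \
    -e BACALHAU_NODE_COMPUTESTORAGEPATH=/tmp \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp:/tmp \
    -p 127.0.0.1:1234:1234 -p 127.0.0.1:1235:1235 \
    "ghcr.io/bacalhau-project/bacalhau:${version}" \
    serve \
      --ipfs-connect /dns4/ipfs/tcp/5001 \
      --node-type requester,compute
}
```

Here the connect string uses the container name ipfs, which resolves through Docker's DNS because both containers share the same user-defined network.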
You can now test that Bacalhau is working.
Run a Job on the Private Network
Now it's time to run a job. Recall that you exposed the Bacalhau API on the default ports to the local host only. So you'll need to use the --api-host flag to tell Bacalhau where to find the API. Everything else is a standard part of the Bacalhau CLI.
The job should succeed. Run it again but this time capture the job ID to make it easier to retrieve the results.
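One way to capture the job ID is sketched below. The ubuntu image and the echo command are placeholders, and the --id-only and --wait flags are assumed to be available in your version of the Bacalhau CLI; confirm with bacalhau docker run --help.

```shell
# Submit a trivial job to the local API and print only its job ID (illustrative sketch).
submit_job() {
  bacalhau docker run --api-host="${1:-localhost}" --id-only --wait ubuntu echo hello
}
```

Capture the ID with JOB_ID="$(submit_job localhost)" and reuse "$JOB_ID" when retrieving the results.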
Retrieve the Results on the Private Network (Insecure)
To retrieve the results using the Bacalhau CLI, you need to know the p2p swarm multiaddress of the IPFS node because you don't want to connect to the public global IPFS network. To do that you can run the IPFS id command (and parse the output to keep only the address you need):
Note that the command above changes the reported port from 4001 to 4002. This is because the IPFS swarm port is published on host port 4002, but the IPFS id command reports the container's internal port, 4001.
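The port rewrite can be isolated in a small helper, sketched here. The function name rewrite_swarm_port and the example peer ID are hypothetical; the input is expected to be a single multiaddress taken from the IPFS id output.

```shell
# Rewrite the container-internal swarm port (4001) to the published host port (4002).
# Only an exact /tcp/4001 or /udp/4001 component is changed.
rewrite_swarm_port() {
  printf '%s\n' "$1" | sed -E 's#/(tcp|udp)/4001(/|$)#/\1/4002\2#'
}
```

For example, rewrite_swarm_port /ip4/127.0.0.1/tcp/4001/p2p/QmExamplePeerID prints /ip4/127.0.0.1/tcp/4002/p2p/QmExamplePeerID.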
Now get the results:
Alternatively, you can use the Docker container, mount the results volume, and change the --api-host to the name of the Bacalhau container and the --ipfs-swarm-addrs back to port 4001:
Run a Private Bacalhau Network Using Docker (Secure)
Running a private secure network is useful in a range of scenarios, including:
Running a private network for a private project
You need two things: a private IPFS node to store data and a Bacalhau node to execute over that data. To keep the nodes private, you configure them to use a secret swarm key. This is a bit more involved than the insecure version.
Prerequisites
Start a Private IPFS Node (Secure)
Private IPFS nodes are experimental. See the IPFS documentation for more information.
First, you need to bootstrap a new IPFS cluster for your own private use. This consists of generating a swarm key, removing any bootstrap nodes, and then starting the IPFS node.
Some notes about this command:
It wipes the $(pwd)/ipfs directory to make sure you have a clean slate
It generates a new swarm key, which is the token that is required to connect to this node
It removes the default bootstrap nodes
It runs the IPFS container in the specified Docker network
It exposes the IPFS swarm port to the local host only, on port 4002, to prevent accidentally exposing the IPFS node and to avoid clashes with Bacalhau
It exposes the admin RPC API to the local host only, on port 5001
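The swarm key step mentioned in these notes can be sketched as below. It assumes the standard pre-shared key layout used by IPFS private networks (a version header, an encoding line, and 32 random bytes in hexadecimal); the function name generate_swarm_key is hypothetical.

```shell
# Print a new IPFS swarm key: header, encoding line, then 32 random bytes as hex.
generate_swarm_key() {
  printf '/key/swarm/psk/1.0.0/\n/base16/\n%s\n' \
    "$(od -v -N 32 -A n -t x1 /dev/urandom | tr -d ' \n')"
}
```

Write the output to a file named swarm.key in the IPFS data directory, for example generate_swarm_key > "$(pwd)/ipfs/swarm.key", before starting the node with LIBP2P_FORCE_PNET set. Treat the file as a secret, since anyone holding it can join the private network.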
Start a Private Bacalhau Node (Secure)
The instructions to run a secure private Bacalhau network are the same as for the insecure version; please follow those instructions.
Run a Job on the Private Network (Secure)
The instructions to run a job are the same as for the insecure version; please follow those instructions.
Retrieve the Results on the Private Network (Secure)
The same process as above can be used to retrieve results from the IPFS node as long as the Bacalhau get command has access to the IPFS swarm key.
Running the Bacalhau binary from outside of Docker:
Alternatively, you can use the Docker container, mount the results volume, and change the --api-host to the name of the Bacalhau container and the --ipfs-swarm-addrs back to port 4001:
Common Prerequisites
Create a New Docker Network
Without this, inter-container DNS will not work, and internet access may not work either.
Double check that this network can access the internet (so Bacalhau can call external URLs).
This should be successful. If it is not, then please troubleshoot your Docker networking. For example, on my Mac, I had to totally uninstall Docker, restart the computer, and then reinstall Docker. Then it worked. Also check https://docs.docker.com/desktop/troubleshoot/known-issues/. Apparently "ping from inside a container to the Internet does not work as expected." No idea what that means. How do you break ping?
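The two steps in this section, creating the network and checking that it can reach the internet, can be sketched as shell helpers. The network name bacalhau-network, the alpine image, and the default ping target bacalhau.org are assumptions chosen for illustration.

```shell
# Assumed network name; override NETWORK_NAME to suit your environment.
NETWORK_NAME="${NETWORK_NAME:-bacalhau-network}"

# Create a user-defined bridge network so containers can resolve each other by name.
create_network() {
  docker network create --driver bridge "$NETWORK_NAME"
}

# Ping an external host from a throwaway container attached to that network.
check_network_egress() {
  docker run --rm --network "$NETWORK_NAME" alpine ping -c 2 "${1:-bacalhau.org}"
}
```

Run create_network once, then check_network_egress. A non-zero exit status from the second call points to a Docker networking problem.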
Test that the IPFS Node is Working
You can now browse the IPFS web UI at http://127.0.0.1:5001/webui.
Read more about the IPFS Docker image in the IPFS documentation.
As described in their documentation, never expose the RPC API port (port 5001) to the public internet.
Test that the Bacalhau Node is Working
Ensure that the Bacalhau logs (docker logs bacalhau) have no errors.
Check that your Bacalhau installation is the same version:
The versions should match. Alternatively, you can use the Docker container:
Perform a list command to ensure you can connect to the Bacalhau API.
It should return empty.
Authenticate with Docker Hub
If you are retrieving and running images from Docker Hub, you may encounter issues with rate-limiting. Docker provides higher limits when you are authenticated; the size of the limit depends on your account type.
Should you wish to authenticate with Docker Hub when pulling images, you can do so by specifying credentials as environment variables wherever your compute node is running.
Environment variable | Description |
---|---|
DOCKER_USERNAME | The username with which you are registered at https://hub.docker.com/ |
DOCKER_PASSWORD | A read-only access token, generated from the page at https://hub.docker.com/settings/security |
Currently, this authentication is only available for (and only required by) Docker Hub.
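When the compute node itself runs in a container, one way to supply these variables is to forward them from the host shell. The helper below is a sketch; the function name docker_hub_env_flags is hypothetical. It relies on Docker's behavior of copying a variable's value from the calling environment when -e is given a name without a value.

```shell
# Print the docker run flags that forward Docker Hub credentials from the host,
# or fail if either variable is missing.
docker_hub_env_flags() {
  if [ -z "${DOCKER_USERNAME:-}" ] || [ -z "${DOCKER_PASSWORD:-}" ]; then
    echo "DOCKER_USERNAME and DOCKER_PASSWORD must both be set" >&2
    return 1
  fi
  printf '%s\n' "-e DOCKER_USERNAME -e DOCKER_PASSWORD"
}
```

Forwarding by name keeps the token out of the command line, so it does not end up in shell history.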