It is possible to run Bacalhau completely disconnected from the main Bacalhau network, so that you can run private workloads without risking execution on public nodes or inadvertently sharing your data outside of your organization. The isolated network will not connect to the public Bacalhau network or any other public network. To do this, we will run our network in-process rather than externally.
:::info A private network and storage are easier to set up, but a separate public server is better for production. The private network and storage use a temporary directory for their repository, so the contents will be lost on shutdown. :::
The first step is to start up the initial node, which we will use as the requester node. This node will connect to nothing but will listen for connections.
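As a sketch, starting the requester looks like the following (the `--node-type requester` flag is quoted later in this guide; any additional flags depend on your setup):

```shell
# Start the first node as the requester. It dials nothing,
# but listens for incoming connections from compute nodes.
bacalhau serve --node-type requester
```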
This will produce output similar to this:
To connect another node to this private one, run the following command in your shell:
:::tip The exact command will be different on each computer and is output by the `bacalhau serve --node-type requester ...` command. :::
The `bacalhau serve --private-internal-ipfs --peer ...` command starts up a compute node and adds it to the cluster.
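A hedged sketch of joining a second machine; the `--peer` multiaddress below is a placeholder, and the real value is printed by the requester node at startup:

```shell
# Join the private cluster as a compute node.
# The --peer value is a placeholder; copy the exact command the
# requester printed under "To connect another node to this private one...".
bacalhau serve --node-type compute --private-internal-ipfs \
  --peer /ip4/192.0.2.10/tcp/1235/p2p/QmPlaceholderPeerID
```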
To use this cluster from the client, run the following commands in your shell:
:::tip The exact command will be different on each computer and is output by the `bacalhau serve --node-type requester ...` command. :::
The `export BACALHAU_IPFS_SWARM_ADDRESSES=...` command configures your client to send jobs into the cluster from the command line.
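A sketch of the client-side setup, assuming placeholder values; the real values are printed by the requester at startup (`BACALHAU_API_HOST` and `BACALHAU_API_PORT` tell the client which requester API to talk to):

```shell
# Placeholder values -- use the ones printed by your requester node.
export BACALHAU_IPFS_SWARM_ADDRESSES=/ip4/192.0.2.10/tcp/1235/p2p/QmPlaceholderPeerID
export BACALHAU_API_HOST=192.0.2.10
export BACALHAU_API_PORT=1234
```

With these set, any `bacalhau docker run ...` invocation from this shell targets the private cluster instead of the public network.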
Instructions for connecting to the public IPFS network via the private Bacalhau cluster:
On all nodes, start ipfs:
Then run the following command in your shell:
On the first node execute the following:
Monitor the output log for: 11:16:03.827 | DBG pkg/transport/bprotocol/compute_handler.go:39 > ComputeHandler started on host QmWXAaSHbbP7mU4GrqDhkgUkX9EscfAHPMCHbrBSUi4A35
On all other nodes execute the following:
Replace the values in the command above with your own values.
Here is our example:
Then from any client set the following before invoking your Bacalhau job:
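For example (the multiaddress is a placeholder; use your node's IPFS swarm address from the steps above):

```shell
# Point the client at the cluster's IPFS swarm (placeholder address).
export BACALHAU_IPFS_SWARM_ADDRESSES=/ip4/203.0.113.5/tcp/4001/p2p/QmPlaceholderPeerID
```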
A private cluster is a network of Bacalhau nodes completely isolated from any public node. That means you can safely process private jobs and data on your cloud or on-premise hosts!
1. Install Bacalhau with `curl -sL https://get.bacalhau.org/install.sh | bash` on every host
2. Run `bacalhau serve` on one host only; this will be our "bootstrap" machine
3. Copy and paste the command it outputs under the "To connect another node to this private one, run the following command in your shell..." line to the other hosts
4. Copy and paste the env vars it outputs under the "To use this requester node from the client, run the following commands in your shell..." line to a client machine
5. Run `bacalhau docker run ubuntu echo hello` on the client machine
6. Optionally, set up systemd units to make the Bacalhau daemons permanent; here's an example systemd service file.
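A minimal sketch of such a unit, assuming Bacalhau is installed at `/usr/local/bin/bacalhau` (paths and flags are illustrative; adjust them for your hosts):

```ini
[Unit]
Description=Bacalhau daemon (sketch; adjust flags per host role)
After=network-online.target

[Service]
# On the bootstrap host this is plain `bacalhau serve`; on other hosts,
# use the join command the bootstrap machine printed.
ExecStart=/usr/local/bin/bacalhau serve
Restart=on-failure

[Install]
WantedBy=multi-user.target
```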
Please contact us on the #bacalhau Slack channel for questions and feedback!
Good news: spinning up a private cluster is really a piece of cake!
That's all folks!
Bacalhau uses libp2p under the hood to communicate with other nodes on the network.
Because Bacalhau is built using libp2p, the concept of peer identity is used to identify nodes on the network.
When you start a Bacalhau node using `bacalhau serve`, it will look for an RSA private key in the `~/.bacalhau` directory. If it doesn't find one, it will generate a new one and save it there.
You can override the directory where the private key is stored using the `BACALHAU_PATH` environment variable.
Private keys are named after the port used for the libp2p connection, which defaults to 1235. By default, when first starting a node, the private key will be stored in `~/.bacalhau/private_key.1235`.
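For example, to keep the repository (and therefore the private key) in a custom location (the path below is illustrative):

```shell
# Subsequent `bacalhau serve` invocations will read and write the
# repository, including private_key.1235, under this directory.
export BACALHAU_PATH=/var/lib/bacalhau
```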
The peer identity is derived from the private key and is used to identify the node on the network. You can get the peer identity of a node by running `bacalhau id`:
By default, running `bacalhau serve` will connect to the following nodes (which are the default bootstrap nodes run by Protocol Labs):
Bacalhau uses libp2p multiaddresses to identify nodes on the network.
If you want to connect to other nodes, and you know their Peer IDs, you can use the `--peer` flag to specify additional peers to connect to (comma-separated list).
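A sketch of connecting to two known peers at startup (the multiaddresses are placeholders):

```shell
# Dial two known peers on startup; values are illustrative.
bacalhau serve \
  --peer /ip4/192.0.2.10/tcp/1235/p2p/QmPeerOne,/ip4/192.0.2.11/tcp/1235/p2p/QmPeerTwo
```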
If you want to connect to a requester node, and you know its IP but not its Peer ID, you can use the following, which will contact the requester API directly and ask for the current Peer ID instead.
The default port the libp2p swarm listens on is 1235.
You can configure the swarm port using the `--port` flag:
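For example, to listen for libp2p connections on a non-default swarm port:

```shell
# Use swarm port 1236 instead of the default 1235. The private key
# will then be stored as private_key.1236.
bacalhau serve --port 1236
```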
To ensure that the node can communicate with other nodes on the network, make sure the swarm port is open and accessible by other nodes.
The Bacalhau node exposes a REST API that can be used to query the node for information.
The default port the REST API listens on is 1234.
The default network interface the REST API listens on is 0.0.0.0.
You can configure the REST API port using the `--api-port` flag:
You can also configure which network interface the REST API will bind to using the `--host` flag:
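A combined sketch of both flags (values are illustrative):

```shell
# Serve the REST API on port 8080, bound to the loopback interface only,
# so the API is not reachable from other machines.
bacalhau serve --api-port 8080 --host 127.0.0.1
```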
:::tip You can use the `--host` flag to restrict network access to the REST API. :::
You can call `http://dashboard.bacalhau.org:1000/api/v1/run` with the POST body as a JSON serialized spec.
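A hedged sketch of such a call with `curl`; the payload file name is a placeholder, and the actual job spec schema is not reproduced here:

```shell
# POST a JSON-serialized job spec to the endpoint above.
# job-spec.json is a placeholder for your serialized spec.
curl -X POST http://dashboard.bacalhau.org:1000/api/v1/run \
  -H 'Content-Type: application/json' \
  -d @job-spec.json
```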
Once you run the command above, you'll get a CID output:
By default, Bacalhau jobs do not have any access to the internet. This is to keep both compute providers and users safe from malicious activities.
However, by using data volumes you can read and access your data from within jobs and write back results.
When you submit a Bacalhau job, you'll need to specify the internet locations to download data from and write results to. Both Docker and WebAssembly jobs support these features.
When submitting a Bacalhau job, you can specify the CID (Content IDentifier) or HTTP(S) URL to download data from. The data will be retrieved before the job starts and made available to the job as a directory on the filesystem. When running Bacalhau jobs, you can specify as many CIDs or URLs as needed using `--input`, which is accepted by both `bacalhau docker run` and `bacalhau wasm run`. See command line flags for more information.
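A sketch of mounting two inputs (the CID, URL, and mount paths are placeholders; check `bacalhau docker run --help` for the exact `--input` syntax on your version):

```shell
# Mount an IPFS CID and a URL as read-only inputs before the job starts.
bacalhau docker run \
  --input ipfs://QmExampleCID \
  --input https://example.com/dataset.csv \
  ubuntu ls /inputs
```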
You can write back results from your Bacalhau jobs to your public storage location. By default, jobs will write results to the storage provider using the `--publisher` command line flag. See command line flags on how to configure this.
To use these features, the data to be downloaded has to be known before the job starts. For some workloads the required data is computed as part of the job, for example when the purpose of the job is to process web results. In these cases, networking may be possible during job execution.
To run Docker jobs on Bacalhau that access the internet, you'll need to specify one of the following:
- `full`: unfiltered networking for any protocol (`--network=full`)
- `http`: HTTP(S)-only networking to a specified list of domains (`--network=http`)
- `none`: no networking at all, the default (`--network=none`)
:::tip Specifying `none` will still allow Bacalhau to download and upload data before and after the job. :::
Jobs using `http` must specify the domains they want to access when the job is submitted. When the job runs, only HTTP requests to those domains will be possible, and data transfer will be rate limited to 10 Mbit/s in either direction to prevent DDoS.
Jobs will be provided with `http_proxy` and `https_proxy` environment variables, which contain a TCP address of an HTTP proxy to connect through. Most tools and libraries will use these environment variables by default. If not, they must be used by user code to configure HTTP proxy usage.
The required networking can be specified using the `--network` flag. For `http` networking, the required domains can be specified using the `--domain` flag, multiple times for as many domains as required. Specifying a domain starting with a `.` means that all sub-domains will be included. For example, specifying `.example.com` will cover `some.thing.example.com` as well as `example.com`.
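Putting those flags together, a hedged sketch of an HTTP-restricted job (the image and URL are illustrative):

```shell
# HTTP-only networking, limited to example.com and all its sub-domains.
# Requests to any other domain will be blocked by the proxy.
bacalhau docker run \
  --network=http \
  --domain=.example.com \
  curlimages/curl curl -s https://api.example.com/
```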
:::caution Bacalhau jobs are explicitly prevented from starting other Bacalhau jobs, even if a Bacalhau requester node is specified on the HTTP allowlist. :::
Bacalhau has support for describing jobs that can access the internet during job execution. The ability for compute nodes to run jobs that require internet access depends on what compute nodes are currently part of the network.
Compute nodes that join the Bacalhau network do not accept networked jobs by default (i.e. they only accept jobs that specify `--network=none`, which is also the default).
The public compute nodes provided by the Bacalhau network will accept jobs that require HTTP networking as long as the domains are from this allowlist.
If you need to access a domain that isn't on the allowlist, you can make a request to the Bacalhau Project team to include your required domains. You can also set up your own compute node that implements the allowlist you need.
This directory contains instructions on how to set up the networking in Bacalhau.