Accessing the Internet from Jobs

By default, Bacalhau jobs do not have any access to the internet. This is to keep both compute providers and users safe from malicious activities.

However, using data volumes, you can read your data from within jobs and write results back.

Using Data Volumes

When you submit a Bacalhau job, you must specify the internet locations from which to download data and write results. Both Docker and WebAssembly jobs support these features.

When submitting a Bacalhau job, you can specify the CID (Content Identifier) or HTTP(S) URL from which to download data. The data will be retrieved before the job starts and made available to the job as a directory on the filesystem. When running Bacalhau jobs, you can specify as many CIDs or URLs as needed using the --input flag, which is accepted by both bacalhau docker run and bacalhau wasm run. See the command line flags documentation for more information.
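
As a minimal sketch, the following submission mounts one CID and one URL into a job. The CID, URL, mount paths, and image are placeholder values, and the exact mount-path syntax may vary by Bacalhau version:

```
# Hypothetical example: make one IPFS CID and one URL available to the job.
# Both inputs are fetched before the job starts and appear under /inputs.
bacalhau docker run \
  --input ipfs://QmExampleCid:/inputs/dataset \
  --input https://example.com/data.csv \
  ubuntu:latest \
  -- ls -lR /inputs
```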

You can write results from your Bacalhau jobs back to your public storage location. Jobs write results to the storage provider specified with the --publisher command line flag. See the command line flags documentation for how to configure this.
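
For example, a sketch of selecting a publisher at submission time; the publisher value and the command are illustrative, and the publishers available depend on your network:

```
# Hypothetical example: publish whatever the job writes to /outputs via IPFS.
bacalhau docker run \
  --publisher ipfs \
  ubuntu:latest \
  -- sh -c 'echo "job finished" > /outputs/result.txt'
```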

To use these features, the data to be downloaded has to be known before the job starts. For some workloads the required data is computed as part of the job itself, for example when the job's purpose is to process web results; in those cases, networking during job execution may be required.

Specifying Jobs to Access the Internet

To run Docker jobs on Bacalhau that access the internet, you'll need to specify one of the following networking modes:

  1. full: unfiltered networking for any protocol (--network=full)

  2. http: HTTP(S)-only networking to a specified list of domains (--network=http)

  3. none: no networking at all, the default (--network=none)
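
For instance, a job that needs unrestricted outbound access could be submitted as follows (a sketch; the image and URL are placeholders, and the job will only be scheduled on compute nodes that accept networked jobs):

```
# Hypothetical example: request unfiltered networking for the job.
bacalhau docker run \
  --network=full \
  alpine:latest \
  -- wget -qO- https://example.com
```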

Specifying none will still allow Bacalhau to download and upload data before and after the job.

Jobs using http must specify the domains they want to access when the job is submitted. When the job runs, only HTTP requests to those domains will be possible, and data transfer will be rate-limited to 10Mbit/sec in either direction to prevent DDoS attacks.

Jobs will be provided with http_proxy and https_proxy environment variables, which contain the TCP address of an HTTP proxy to connect through. Most tools and libraries use these environment variables by default. If yours does not, your code must read them to configure HTTP proxy usage.
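
For example, inside the job container (a sketch; curl is just one example of a tool that honors these variables, and the URL is a placeholder):

```
# curl reads the http_proxy/https_proxy environment variables automatically:
curl -s https://api.example.com/data

# For a tool that ignores them, point it at the proxy explicitly:
curl -s --proxy "$https_proxy" https://api.example.com/data
```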

The required networking can be specified using the --network flag. For http networking, the required domains can be specified using the --domain flag, passed multiple times for as many domains as required. Specifying a domain starting with a . means that all sub-domains will be included. For example, specifying .example.com covers some.thing.example.com as well as example.com.
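
Putting these flags together (a sketch; the domains, image, and URL are placeholders):

```
# Hypothetical example: HTTP-only networking limited to two domain patterns.
# ".example.com" also matches any sub-domain, such as some.thing.example.com.
bacalhau docker run \
  --network=http \
  --domain=.example.com \
  --domain=api.other.org \
  alpine:latest \
  -- wget -qO- https://some.thing.example.com/data
```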

Bacalhau jobs are explicitly prevented from starting other Bacalhau jobs, even if a Bacalhau requester node is specified on the HTTP allowlist.

Support for networked jobs on the public network

Bacalhau supports describing jobs that can access the internet during execution. Whether such jobs can actually run depends on which compute nodes are currently part of the network.

Compute nodes that join the Bacalhau network do not accept networked jobs by default (i.e. they only accept jobs that specify --network=none, which is also the default).

The public compute nodes provided by the Bacalhau network will accept jobs that require HTTP networking as long as the requested domains are on the project's allowlist.

If you need to access a domain that isn't on the allowlist, you can make a request to the Bacalhau Project team to include your required domains. You can also set up your own compute node that implements the allowlist you need.
