Bacalhau Docs
GithubSlackBlogEnterprise
v1.5.x
  • Documentation
  • Use Cases
  • CLI & API
  • References
  • Community
v1.5.x
  • Welcome
  • Getting Started
    • How Bacalhau Works
    • Installation
    • Create Network
    • Hardware Setup
    • Container Onboarding
      • Docker Workloads
      • WebAssembly (Wasm) Workloads
  • Setting Up
    • Running Nodes
      • Node Onboarding
      • GPU Installation
      • Job selection policy
      • Access Management
      • Node persistence
      • Connect Storage
      • Configuring Transport Level Security
      • Limits and Timeouts
      • Test Network Locally
      • Bacalhau WebUI
      • Private IPFS Network Setup
    • Workload Onboarding
      • Container
        • Docker Workload Onboarding
        • WebAssembly (Wasm) Workloads
        • Bacalhau Docker Image
        • How To Work With Custom Containers in Bacalhau
      • Python
        • Building and Running Custom Python Container
        • Running Pandas on Bacalhau
        • Running a Python Script
        • Running Jupyter Notebooks on Bacalhau
        • Scripting Bacalhau with Python
      • R (language)
        • Building and Running your Custom R Containers on Bacalhau
        • Running a Simple R Script on Bacalhau
      • Run CUDA programs on Bacalhau
      • Running a Prolog Script
      • Reading Data from Multiple S3 Buckets using Bacalhau
      • Running Rust programs as WebAssembly (WASM)
      • Generate Synthetic Data using Sparkov Data Generation technique
    • Data Ingestion
      • Copy Data from URL to Public Storage
      • Pinning Data
      • Running a Job over S3 data
    • Networking Instructions
      • Accessing the Internet from Jobs
      • Utilizing NATS.io within Bacalhau
    • GPU Workloads Setup
    • Automatic Update Checking
    • Marketplace Deployments
      • Google Cloud Marketplace
  • Guides
    • (Updated) Configuration Management
    • Write a config.yaml
    • Write a SpecConfig
  • Examples
    • Data Engineering
      • Using Bacalhau with DuckDB
      • Ethereum Blockchain Analysis with Ethereum-ETL and Bacalhau
      • Convert CSV To Parquet Or Avro
      • Simple Image Processing
      • Oceanography - Data Conversion
      • Video Processing
    • Model Inference
      • EasyOCR (Optical Character Recognition) on Bacalhau
      • Running Inference on Dolly 2.0 Model with Hugging Face
      • Speech Recognition using Whisper
      • Stable Diffusion on a GPU
      • Stable Diffusion on a CPU
      • Object Detection with YOLOv5 on Bacalhau
      • Generate Realistic Images using StyleGAN3 and Bacalhau
      • Stable Diffusion Checkpoint Inference
      • Running Inference on a Model stored on S3
    • Model Training
      • Training Pytorch Model with Bacalhau
      • Training Tensorflow Model
      • Stable Diffusion Dreambooth (Finetuning)
    • Molecular Dynamics
      • Running BIDS Apps on Bacalhau
      • Coresets On Bacalhau
      • Genomics Data Generation
      • Gromacs for Analysis
      • Molecular Simulation with OpenMM and Bacalhau
  • References
    • Jobs Guide
      • Job Specification
        • Job Types
        • Task Specification
          • Engines
            • Docker Engine Specification
            • WebAssembly (WASM) Engine Specification
          • Publishers
            • IPFS Publisher Specification
            • Local Publisher Specification
            • S3 Publisher Specification
          • Sources
            • IPFS Source Specification
            • Local Source Specification
            • S3 Source Specification
            • URL Source Specification
          • Network Specification
          • Input Source Specification
          • Resources Specification
          • ResultPath Specification
        • Constraint Specification
        • Labels Specification
        • Meta Specification
      • Job Templates
      • Queuing & Timeouts
        • Job Queuing
        • Timeouts Specification
      • Job Results
        • State
    • CLI Guide
      • Single CLI commands
        • Agent
          • Agent Overview
          • Agent Alive
          • Agent Node
          • Agent Version
        • Config
          • Config Overview
          • Config Auto-Resources
          • Config Default
          • Config List
          • Config Set
        • Job
          • Job Overview
          • Job Describe
          • Job Exec
          • Job Executions
          • Job History
          • Job List
          • Job Logs
          • Job Run
          • Job Stop
        • Node
          • Node Overview
          • Node Approve
          • Node Delete
          • Node List
          • Node Describe
          • Node Reject
      • Command Migration
    • API Guide
      • Bacalhau API overview
      • Best Practices
      • Agent Endpoint
      • Orchestrator Endpoint
      • Migration API
    • Node Management
    • Authentication & Authorization
    • Database Integration
    • Debugging
      • Debugging Failed Jobs
      • Debugging Locally
    • Running Locally In Devstack
    • Setting up Dev Environment
  • Help & FAQ
    • Bacalhau FAQs
    • Glossary
    • Release Notes
      • v1.5.0 Release Notes
      • v1.4.0 Release Notes
  • Integrations
    • Apache Airflow Provider for Bacalhau
    • Lilypad
    • Bacalhau Python SDK
    • Observability for WebAssembly Workloads
  • Community
    • Social Media
    • Style Guide
    • Ways to Contribute
Powered by GitBook
LogoLogo

Use Cases

  • Distributed ETL
  • Edge ML
  • Distributed Data Warehousing
  • Fleet Management

About Us

  • Who we are
  • What we value

News & Blog

  • Blog

Get Support

  • Request Enterprise Solutions

Expanso (2025). All Rights Reserved.

On this page
  • Introduction​
  • TL;DR​
  • Prerequisite​
  • Test Run with Sample Data​
  • Structure of the command​
  • Declarative job description​
  • Checking the State of your Jobs​
  • Viewing Output​
  • Using Custom Images as an Input​
  • Support​

Was this helpful?

Export as PDF
  1. Examples
  2. Model Inference

Object Detection with YOLOv5 on Bacalhau

PreviousStable Diffusion on a CPUNextGenerate Realistic Images using StyleGAN3 and Bacalhau

Was this helpful?

Introduction​

The identification and localization of objects in images and videos is a computer vision task called object detection. Several algorithms have emerged in the past few years to tackle the problem. One of the most popular algorithms to date for real-time object detection is , initially proposed

Traditionally, models like YOLO required enormous amounts of training data to yield reasonable results. People might not have access to such high-quality labeled data. Thankfully, open-source communities and researchers have made it possible to utilize pre-trained models to perform inference. In other words, you can use models that have already been trained on large datasets to perform object detection on your own data.

Bacalhau is a highly scalable decentralized computing platform and is well suited to running massive object detection jobs. In this example, you can take advantage of the GPUs available on the Bacalhau Network and perform an end-to-end object detection inference, using the

TL;DR​

Load your dataset into S3/IPFS, specify it and pre-trained weights via the --input flags, choose a suitable container, specify the command and path to save the results - done!

Prerequisite​

To get started, you need to install the Bacalhau client, see more information

Test Run with Sample Data​

To get started, let's run a test job with a small sample dataset that is included in the YOLOv5 Docker Image. This will give you a chance to familiarise yourself with the process of running a job on Bacalhau.

In addition to the usual Bacalhau flags, you will also see example of using the --gpu 1 flag in order to specify the use of a GPU.

Remember that by default Bacalhau does not provide any network connectivity when running a job. So you need to either provide all assets at job submission time, or use the --network=full or --network=http flags to access the data at task time. See the page for more details

The model requires pre-trained weights to run and by default downloads them from within the container. Bacalhau jobs don't have network access so we will pass in the weights at submission time, saving them to /usr/src/app/yolov5s.pt. You may also provide your own weights here.

The container has its own options that we must specify:

  1. --project specifies the output volume that the model will save its results to. Bacalhau defaults to using /outputs as the output directory, so we save it there.

One final additional hack that we have to do is move the weights file to a location with the standard name. As of writing this, Bacalhau downloads the file to a UUID-named file, which the model is not expecting. This is because GitHub 302 redirects the request to a random file in its backend.

Structure of the command​

  1. export JOB_ID=$( ... ) exports the job ID as environment variable

  2. The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job

  3. The --timeout flag is set to make sure that if the job is not completed in the specified time, it will be terminated

  4. The --wait flag is set to wait for the job to complete before return

  5. The --wait-timeout-secs flag is set together with --wait to define how long should app wait for the job to complete

  6. The --id-only flag is set to print only job id

  7. The --input flags are used to specify the sources of input data

  8. -- /bin/bash -c 'find /inputs -type f -exec cp {} /outputs/yolov5s.pt \; ; python detect.py --weights /outputs/yolov5s.pt --source $(pwd)/data/images --project /outputs' tells the model where to find input data and where to write output

export JOB_ID=$(bacalhau docker run \
--gpu 1 \
--timeout 3600 \
--wait \
--wait-timeout-secs 3600 \
--id-only \
--input https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt \
ultralytics/yolov5:v6.2 \
-- /bin/bash -c 'find /inputs -type f -exec cp {} /outputs/yolov5s.pt \; ; python detect.py --weights /outputs/yolov5s.pt --source $(pwd)/data/images --project /outputs')

This should output a UUID (like 59c59bfb-4ef8-45ac-9f4b-f0e9afd26e70), which will be stored in the environment variable JOB_ID. This is the ID of the job that was created. You can check the status of the job using the commands below.

Declarative job description​

name: Object Detection with YOLOv5
type: batch
count: 1
tasks:
  - name: My main task
    Engine:
      type: docker
      params:
        Image: ultralytics/yolov5:v6.2
        Entrypoint:
          - /bin/bash
        Parameters:
          - -c
          - "find /inputs -type f -exec cp {} /outputs/yolov5s.pt \\; ; python detect.py --weights /outputs/yolov5s.pt --source $(pwd)/data/images --project /outputs"
    Publisher:
      Type: ipfs
    ResultPaths:
      - Name: outputs
        Path: /outputs
    InputSources:
      - Source:
          Type: "urlDownload"
          Params:
            URL: "https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt"
        Target: "/inputs"

Checking the State of your Jobs​

Job status: You can check the status of the job using bacalhau job list:

bacalhau job list --id-filter ${JOB_ID}

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau job describe:

bacalhau job describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

rm -rf results && mkdir results
bacalhau job get ${JOB_ID} --output-dir results

Viewing Output​

After the download has finished we can see the results in the results/outputs/exp folder.

Using Custom Images as an Input​

Let's run a the same job again, but this time use the images above.

export JOB_ID=$(bacalhau docker run \
--gpu 1 \
--timeout 3600 \
--wait \
--wait-timeout-secs 3600 \
--id-only \
--input https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt \
--input ipfs://bafybeicyuddgg4iliqzkx57twgshjluo2jtmlovovlx5lmgp5uoh3zrvpm:/datasets \
ultralytics/yolov5:v6.2 \
-- /bin/bash -c 'find /inputs -type f -exec cp {} /outputs/yolov5s.pt \; ; python detect.py --weights /outputs/yolov5s.pt --source /datasets --project /outputs')

Just as in the example above, this should output a UUID, which will be stored in the environment variable JOB_ID. You can check the status of the job using the commands below.

Support​

--input to select which pre-trained weights you desire with details on the

For more container flags refer to the yolov5/detect.py file in the .

Some of the jobs presented in the Examples section may require more resources than are currently available on the demo network. Consider or running less resource-intensive jobs on the demo network

The same job can be presented in the format. In this case, the description will look like this:

Now let's use some custom images. First, you will need to ingest your images onto IPFS or S3 storage. For more information about how to do that see the section.

This example will use the dataset.

We have already uploaded this dataset to the IPFS storage under the CID: bafybeicyuddgg4iliqzkx57twgshjluo2jtmlovovlx5lmgp5uoh3zrvpm. You can browse to this dataset via .

To check the state of the job and view job output refer to the .

If you have questions or need support or guidance, please reach out to the (#general channel).

YOLO (You Only Look Once)
by Redmond et al.
YOLOv5 Docker Image developed by Ultralytics.
here
Internet Access
yolov5 release page
YOLO repository
starting your own network
data ingestion
Cyclist Dataset for Object Detection | Kaggle
a HTTP IPFS proxy
Bacalhau team via Slack
declarative
instructions above