Bacalhau Docs
v1.6.x

Expanso (2025). All Rights Reserved.

Using Labels and Constraints

This guide provides a comprehensive overview of Bacalhau's label and constraint system, which enables fine-grained control over job scheduling and resource allocation.

Understanding Labels and Constraints

Labels in Bacalhau are key-value pairs attached to nodes that describe their characteristics, capabilities, and properties. Constraints are rules you define when submitting jobs to ensure they run on nodes with specific labels.

Label Configuration

Command Line Configuration

Labels are defined when starting a Bacalhau node using the -c Labels flag:

bacalhau serve -c Labels="env=prod,gpu=true,arch=x64"

Configuration File

You can also define labels in a YAML configuration file:

# config.yaml
labels:
  env: prod
  gpu: true
  arch: x64
  region: us-west

Then start the node with:

bacalhau serve --config config.yaml
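
When nodes are provisioned by script, the configuration file can be generated instead of written by hand. The sketch below shows one way to do that. `write_label_config` is a hypothetical helper, not a Bacalhau command. It assumes the `labels` map shown in this guide, and it assumes label values should stay strings, which is why every value is quoted.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Bacalhau CLI): write a YAML file
# containing a labels map built from key=value arguments.
# Assumption: the top-level "labels" key mirrors the example in this guide.
write_label_config() (
  file="$1"; shift
  {
    echo "labels:"
    for pair in "$@"; do
      # Split on the first "=" only; quote the value so YAML does not
      # reinterpret values such as true or 64 as non-string types.
      printf '  %s: "%s"\n' "${pair%%=*}" "${pair#*=}"
    done
  } > "$file"
)

config_file="$(mktemp)"
write_label_config "$config_file" env=prod gpu=true arch=x64
cat "$config_file"
```

The generated file can then be passed to bacalhau serve in the same way as a hand-written one.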

Verifying Labels

Check node labels using:

bacalhau node list

Constraint Operators

Bacalhau supports various operators for precise node selection:

| Operator | Example | Description |
|----------|---------|-------------|
| = | region=us-east | Exact match |
| != | env!=staging | Not equal |
| exists | gpu | Key exists |
| ! | !temporary | Key doesn't exist |
| in | zone in (a,b,c) | Value in set |
| gt | mem-gb gt 32 | Greater than |
| lt | cpu-cores lt 16 | Less than |
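
To make the comparison operators concrete, here is a small shell model of how a single label value could be tested against a constraint. It is an illustration only, not Bacalhau's implementation: `constraint_matches` is a hypothetical function, the set for `in` is passed as a plain comma-separated string, and the key-presence operators (`exists` and `!`) are left out because they test for a key, not a value.

```bash
#!/usr/bin/env bash
# Illustrative model only; the scheduler's real matching logic may differ.
# $1 = label value on the node, $2 = operator, $3 = value from the constraint
constraint_matches() {
  case "$2" in
    "=")  [ "$1" = "$3" ] ;;
    "!=") [ "$1" != "$3" ] ;;
    gt)   [ "$1" -gt "$3" ] 2>/dev/null ;;   # integer comparison
    lt)   [ "$1" -lt "$3" ] 2>/dev/null ;;   # integer comparison
    "in") # $3 is a comma-separated set such as a,b,c
          case ",$3," in *",$1,"*) true ;; *) false ;; esac ;;
    *)    echo "unsupported operator: $2" >&2; return 2 ;;
  esac
}

constraint_matches 64 gt 32          && echo "mem-gb gt 32: match"
constraint_matches us-east = us-east && echo "region=us-east: match"
constraint_matches d in a,b,c        || echo "zone in (a,b,c): no match"
```

Non-numeric values passed to `gt` or `lt` are treated as a non-match in this model.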

Job Submission Patterns

Basic Constraint Usage

Here are common patterns for submitting jobs with constraints:

# Single constraint
bacalhau docker run --constraints "env=prod" alpine

# Multiple constraints
bacalhau docker run \
  --constraints "env=prod" \
  --constraints "gpu=true" \
  nvidia/cuda:11.0-base nvidia-smi

# Data processing with specific architecture requirements
bacalhau docker run \
  --constraints "arch in (x64,arm64)" \
  --constraints "mem-gb gt 16" \
  --constraints "storage-tier!=hdd" \
  my-data-processing-job

# Common failure scenarios
bacalhau docker run --constraints "disk=ssd" alpine echo "failed"  # Fails when no node is labeled disk=ssd
bacalhau docker run --constraints "cpu-cores gt 64" alpine echo "failed"  # Fails when no node has a cpu-cores label above 64
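
When constraints come from a script or a CI variable, each expression needs its own --constraints flag, as in the multi-constraint example above. The sketch below assembles such a command line. `print_run_command` is a hypothetical helper that only prints the command and does not contact a cluster. Because it prints with echo, expressions that contain spaces lose their quoting in the output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a "bacalhau docker run" command with one
# --constraints flag per expression. Nothing is executed.
# Usage: print_run_command IMAGE EXPRESSION...
print_run_command() (
  image="$1"; shift
  n=$#
  # Rotate through the arguments, replacing each expression with
  # the pair: --constraints EXPRESSION
  while [ "$n" -gt 0 ]; do
    c="$1"; shift
    set -- "$@" --constraints "$c"
    n=$((n - 1))
  done
  echo bacalhau docker run "$@" "$image"
)

print_run_command alpine env=prod gpu=true
```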

Environment-Specific Patterns

# Production workloads
bacalhau run --constraints "env=prod,data-tier=hot" spark-job

# Development/testing
bacalhau run --constraints "env=dev" test-runner

# Geographic requirements
bacalhau run --constraints "region=eu,compliance=gdpr" data-processor

# Multi-zone deployments
bacalhau run --constraints "zone in (us-east-1a,us-east-1b)" ha-service

Hardware-Specific Patterns

# GPU workloads
bacalhau run \
  --constraints "gpu-model=a100" \
  --constraints "gpu-count gt 1" \
  llm-training

# High-memory workloads
bacalhau run --constraints "mem-gb gt 64" in-memory-db

Advanced Resource Patterns

# Resource requirements
bacalhau docker run \
  --constraints "mem-gb gt 16" \
  --constraints "cpu-cores gt 4" \
  --constraints "gpu-count gt 1" \
  heavy-workload

# Geographic constraints
bacalhau docker run \
  --constraints "region=eu-west" \
  --constraints "zone in (a,b,c)" \
  geo-specific-job

Label Management Best Practices

Naming Conventions

Follow these patterns for consistent label naming:

  • Use lowercase alphanumeric characters

  • Separate words with hyphens

  • Use descriptive prefixes for categorization

Examples:

team-ml-gpu
env-prod-tier1
storage-ssd-nvme
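
These conventions are easy to check mechanically before a node is started. The following sketch is a hypothetical lint function, not something the Bacalhau CLI provides. It accepts only lowercase alphanumeric words joined by single hyphens.

```bash
#!/usr/bin/env bash
# Hypothetical lint check for the naming conventions above.
# Succeeds for names like team-ml-gpu; fails for uppercase letters,
# underscores, doubled hyphens, or leading/trailing hyphens.
valid_label_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

for name in team-ml-gpu env-prod-tier1 Storage_SSD gpu--count; do
  if valid_label_name "$name"; then
    echo "ok:      $name"
  else
    echo "invalid: $name"
  fi
done
```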

Label Hierarchies

Organize labels hierarchically for better management:

# Parent node
bacalhau serve -c Labels="tier=core,env=prod"

# Specialized child node
bacalhau serve -c Labels="tier=edge,env=prod,gpu=true"

Label Inheritance and Templates

Dynamic Label Assignment

# Timestamped labels for rotation
bacalhau serve -c Labels="deploy-group=$(date +%Y-%m)"

# Environment-based inheritance
bacalhau serve -c Labels="infra-tier=core,env=prod"
bacalhau serve -c Labels="infra-tier=edge,env=prod,gpu=true"
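
The same idea extends to labels derived from the host at startup. The sketch below builds a Labels value from the current month, the operating system name, and an environment variable. The label keys and the `LABEL_ENV` variable are illustrative assumptions, not names that Bacalhau defines. The script prints the resulting command and does not start a node.

```bash
#!/usr/bin/env bash
# Sketch: assemble a Labels value from facts gathered at startup.
# The keys deploy-group and os, and the LABEL_ENV variable, are hypothetical.
node_labels() (
  os="$(uname -s | tr '[:upper:]' '[:lower:]')"
  printf 'deploy-group=%s,os=%s,env=%s\n' \
    "$(date +%Y-%m)" "$os" "${LABEL_ENV:-dev}"
)

# Print the command that would start the node with these labels.
echo "bacalhau serve -c Labels=\"$(node_labels)\""
```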

Constraint Composition

# AND logic (all must match)
bacalhau run \
  --constraints "storage=ssd" \
  --constraints "cpu-arch=x64" \
  high-performance-job

# OR logic with value lists
bacalhau run \
  --constraints "zone in (us-east1,us-west2)" \
  multi-region-job

# Exclusion patterns
bacalhau run \
  --constraints "maintenance!=true" \
  time-sensitive-job
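
The AND behavior of repeated constraints can be modeled in a few lines of shell. This is an illustration of the logic only, limited to key=value equality, and `node_satisfies_all` is a hypothetical name. A node qualifies only when every constraint appears in its label list.

```bash
#!/usr/bin/env bash
# Illustrative model of AND composition (equality constraints only).
# $1 = comma-separated node labels, remaining arguments = key=value constraints
node_satisfies_all() {
  labels=",$1,"; shift
  for c in "$@"; do
    case "$labels" in
      *",$c,"*) ;;        # this constraint matches, check the next one
      *) return 1 ;;      # a single miss rejects the node
    esac
  done
  return 0
}

node_satisfies_all "storage=ssd,cpu-arch=x64,zone=us-east1" \
  "storage=ssd" "cpu-arch=x64" && echo "eligible"
node_satisfies_all "storage=hdd,cpu-arch=x64" \
  "storage=ssd" "cpu-arch=x64" || echo "not eligible"
```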

Maintenance and Operations

Node Updates

# Exclude maintenance nodes
bacalhau run --constraints "!maintenance" critical-job

Monitoring and Troubleshooting

# List all node labels
bacalhau node list --output json | jq '.[] | .Labels'

# Check job constraint matches
bacalhau job describe JOB_ID --include-events

Security and Compliance

Secure Workload Placement

# Ensure compliance requirements
bacalhau run \
  --constraints "security=hipaa" \
  --constraints "encryption=enabled" \
  sensitive-data-job

# Network isolation
bacalhau run \
  --constraints "network=private" \
  --constraints "public-access=false" \
  internal-job

Advanced Use Cases

Multi-Dimensional Constraints

# Complex GPU requirements
bacalhau run \
  --constraints "gpu-brand=nvidia" \
  --constraints "gpu-mem-gb gt 24" \
  --constraints "gpu-count gt 1" \
  rendering-job

# Security-hardened environments
bacalhau run \
  --constraints "security-profile=hipaa" \
  --constraints "encryption=on" \
  sensitive-data-job

Resource Optimization

# Cost-optimized scheduling
bacalhau run \
  --constraints "instance-type=spot" \
  --constraints "cost-tier=low" \
  batch-job

# Performance optimization
bacalhau run \
  --constraints "storage-type=nvme" \
  --constraints "network-speed gt 10" \
  latency-sensitive-job

Multi-team Coordination

# Label deprecation management
bacalhau serve -c Labels="legacy-system=phase-out,retirement-date=2025-Q1"

# Team resource allocation
bacalhau run \
  --constraints "team in (data,research)" \
  --constraints "project=genomics-2024" \
  shared-resource-job

# Validation workflows
bacalhau job validate --constraints "gpu-type=a100" job-spec.yaml

Capacity Planning

# Shared resource access
bacalhau run \
  --constraints "team in (research,engineering)" \
  --constraints "project=genomics-2024" \
  shared-resource-job

Troubleshooting Common Issues

No Matching Nodes

If your job fails with no matching nodes:

  1. Check available nodes and their labels:

    bacalhau node list --output json
  2. Verify your constraints aren't too restrictive:

    # Instead of
    --constraints "mem-gb gt 128"
    # Try
    --constraints "mem-gb gt 64"
  3. Ensure required nodes are online:

    bacalhau node list --labels "required-label=value"
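
When several constraints are combined, it helps to find out which one eliminates every node. The sketch below counts matches per constraint against a sample inventory. The inventory is made-up data, `count_matching` is a hypothetical helper, and only key=value equality is handled. In practice the label lists would come from the output of bacalhau node list.

```bash
#!/usr/bin/env bash
# Hypothetical sample inventory: one node per line as "name labels".
inventory='node-a env=prod,gpu=true,disk=ssd
node-b env=prod,gpu=false,disk=hdd
node-c env=dev,gpu=true,disk=ssd'

# Count nodes whose label list contains the given key=value pair.
count_matching() {
  printf '%s\n' "$inventory" | while read -r _name labels; do
    case ",$labels," in *",$1,"*) echo hit ;; esac
  done | wc -l | tr -d ' '
}

# A constraint that reports 0 nodes is the one blocking the job.
for c in env=prod gpu=true disk=nvme; do
  echo "$c -> $(count_matching "$c") node(s)"
done
```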

Label Updates Not Taking Effect

Labels are read when a node starts, so changes take effect only after the node restarts. After updating labels:

  1. Gracefully stop the node

  2. Apply new configuration

  3. Restart the node

  4. Verify labels with bacalhau node list

Conclusion

Effective use of Bacalhau's label and constraint system enables precise control over workload placement and resource utilization. Follow these best practices:

  1. Use consistent naming conventions

  2. Document your label taxonomy

  3. Regularly audit and clean up unused labels

  4. Test constraints before production deployment

  5. Monitor constraint patterns for optimization opportunities

For additional support, consult the Bacalhau documentation or community resources.
