This guide provides a comprehensive overview of Bacalhau's label and constraint system, which enables fine-grained control over job scheduling and resource allocation.
Labels in Bacalhau are key-value pairs attached to nodes that describe their characteristics, capabilities, and properties. Constraints are rules you define when submitting jobs to ensure they run on nodes with specific labels.
Labels are defined when starting a Bacalhau node using the -c Labels flag:
You can also define labels in a YAML configuration file:
Then start the node with:
Check node labels using:
Bacalhau supports various operators for precise node selection:
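| Operator | Example | Description |
| --- | --- | --- |
| = | region=us-east | Exact match |
| != | env!=staging | Not equal |
| exists | gpu | Key exists |
| ! | !temporary | Key doesn't exist |
| in | zone in (a,b,c) | Value in set |
| gt | mem-gb gt 32 | Greater than |
| lt | cpu-cores lt 16 | Less than |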
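To make the operator semantics concrete, here is a minimal Python sketch that evaluates each example expression against a dictionary of node labels. It is an illustration only, not Bacalhau's parser: the function name `matches`, the treatment of a missing key under `!=` (it matches), and the numeric handling for `gt`/`lt` are all assumptions.

```python
import re
from typing import Mapping


def matches(constraint: str, labels: Mapping[str, str]) -> bool:
    """Evaluate one constraint expression against a node's labels (hypothetical helper)."""
    expr = constraint.strip()

    # "key in (a,b,c)": the label value must be one of the listed options.
    m = re.fullmatch(r"(\S+)\s+in\s+\((.*)\)", expr)
    if m:
        options = {v.strip() for v in m.group(2).split(",")}
        return labels.get(m.group(1)) in options

    # "key gt N" / "key lt N": numeric comparison; missing or non-numeric never matches.
    m = re.fullmatch(r"(\S+)\s+(gt|lt)\s+(\S+)", expr)
    if m:
        key, op, bound = m.groups()
        try:
            value, limit = float(labels[key]), float(bound)
        except (KeyError, ValueError):
            return False
        return value > limit if op == "gt" else value < limit

    # "key!=value" has to be checked before "key=value".
    if "!=" in expr:
        key, value = (part.strip() for part in expr.split("!=", 1))
        return labels.get(key) != value  # assumption: a missing key counts as "not equal"
    if "=" in expr:
        key, value = (part.strip() for part in expr.split("=", 1))
        return labels.get(key) == value

    # "!key": the key must be absent. Bare "key": the key must be present.
    if expr.startswith("!"):
        return expr[1:].strip() not in labels
    return expr in labels
```

A job with several constraints would match a node only if every expression holds, for example `all(matches(c, node_labels) for c in constraints)`.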
Here are common patterns for submitting jobs with constraints:
Follow these patterns for consistent label naming:
Use lowercase alphanumeric characters
Separate words with hyphens
Use descriptive prefixes for categorization
Examples:
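The naming conventions above can also be checked mechanically before labels reach a node. The following Python sketch is one way to do that; the regular expression encodes only the rules listed here (lowercase alphanumerics, hyphen-separated words), and the label keys used in the usage note are hypothetical, not taken from Bacalhau.

```python
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "gpu-type".
LABEL_KEY = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_conventional(key):
    """Return True if the key follows the naming conventions listed above."""
    return LABEL_KEY.fullmatch(key) is not None


def group_by_prefix(keys):
    """Group keys by their first hyphen-separated word (the category prefix)."""
    groups = {}
    for key in keys:
        prefix = key.split("-", 1)[0]
        groups.setdefault(prefix, []).append(key)
    return groups
```

For instance, `group_by_prefix(["hw-gpu", "hw-cpu", "env-tier"])` groups the two `hw-` keys together, which makes it easy to spot categories that have drifted apart.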
Organize labels hierarchically for better management:
If your job fails with no matching nodes:
Check available nodes and their labels:
Verify your constraints aren't too restrictive:
Ensure required nodes are online:
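When a set of constraints matches nothing, it helps to see which single requirement eliminates the nodes. The Python sketch below does this offline for simple key=value requirements. It assumes you have already collected each node's labels into a dictionary (for example by copying them from the node listing); the node names and the `diagnose` helper are hypothetical.

```python
def diagnose(nodes, required):
    """Report, for each required label, which nodes satisfy it.

    nodes:    {node_name: {label_key: label_value}}
    required: {label_key: label_value}
    """
    report = {}
    for key, value in required.items():
        report[f"{key}={value}"] = [
            name for name, labels in nodes.items() if labels.get(key) == value
        ]
    # Nodes that satisfy every requirement at once.
    report["all"] = [
        name
        for name, labels in nodes.items()
        if all(labels.get(k) == v for k, v in required.items())
    ]
    return report
```

A requirement whose list is empty is the one to relax first; if every individual list is non-empty but "all" is empty, the constraints are individually satisfiable but no single node satisfies them together.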
Remember that label changes require node restarts. After updating labels:
Gracefully stop the node
Apply new configuration
Restart the node
Verify labels with bacalhau node list
Effective use of Bacalhau's label and constraint system enables precise control over workload placement and resource utilization. Follow these best practices:
Use consistent naming conventions
Document your label taxonomy
Regularly audit and clean up unused labels
Test constraints before production deployment
Monitor constraint patterns for optimization opportunities
For additional support, consult the Bacalhau documentation or community resources.
Efficient job management and resource optimization are significant considerations. In our continued effort to support scalable distributed computing and data processing, we are excited to introduce job queuing in Bacalhau v1.4.0.
The Job Queuing feature was added to Bacalhau in version 1.4 and is not supported in earlier versions. Consider upgrading to the latest version to optimize resource usage with Job Queuing.
Job Queuing handles the situation where no suitable nodes are available on the network to execute a job. In this case, you can configure a period of time during which the job waits for suitable nodes to become available or free up. This gives you more flexibility and reliability in managing your distributed workloads.
Job queuing is not enabled automatically; it must be set explicitly in your job specification or requester node using the QueueTimeout parameter. This parameter activates queuing and defines how long your job should wait for available nodes in the network.
Node availability in your network is determined by capacity as well as job constraints such as label selectors, engines or publishers. For example, jobs will be queued if all nodes are currently busy, as well as if idle nodes do not match parameters in your job specification.
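The two reasons for queuing described above (no node matches, or matching nodes are busy) can be sketched as a small decision function. This Python example is a simplified model, not Bacalhau's scheduler: it considers only exact-match labels and a single CPU figure, and the `Node` class, the `placement` function, and the returned strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    labels: dict      # label key -> value
    free_cpu: float   # CPU currently unallocated on the node


def placement(nodes, required_labels, cpu):
    """Classify a job as READY, BUSY (queued for capacity) or NO_MATCH (queued for a match)."""
    matching = [
        n for n in nodes
        if all(n.labels.get(k) == v for k, v in required_labels.items())
    ]
    if not matching:
        return "NO_MATCH"  # idle nodes exist, but none satisfies the constraints
    if not any(n.free_cpu >= cpu for n in matching):
        return "BUSY"      # suitable nodes exist, but none has free capacity
    return "READY"
```

In both queued cases the job would wait until the situation changes or its queue timeout expires.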
Bacalhau compute nodes send their node, resource and health information to the requester nodes in the network every 30 seconds. Between updates, multiple jobs may be allocated to the same node, oversubscribing it and potentially exceeding its immediately available capacity. In that case, a local job queue is created at the compute node, which works through the backlog as resources become available.
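A local queue of this kind can be modeled in a few lines. The Python sketch below is an assumption-laden illustration: it tracks CPU only, starts jobs strictly in arrival order, and uses a hypothetical `ComputeNode` class. The source does not specify the ordering Bacalhau actually uses.

```python
from collections import deque


class ComputeNode:
    """Toy model of a compute node with a local FIFO job queue."""

    def __init__(self, cpu_capacity):
        self.free = cpu_capacity
        self.running = {}      # job_id -> CPU held by the job
        self.queue = deque()   # (job_id, cpu) waiting for capacity

    def submit(self, job_id, cpu):
        """Accept a job even if it does not fit right now."""
        self.queue.append((job_id, cpu))
        self._drain()

    def finish(self, job_id):
        """Release a finished job's CPU and start whatever now fits."""
        self.free += self.running.pop(job_id)
        self._drain()

    def _drain(self):
        # Start queued jobs in arrival order while the head of the queue fits.
        while self.queue and self.queue[0][1] <= self.free:
            job_id, cpu = self.queue.popleft()
            self.running[job_id] = cpu
            self.free -= cpu
```

With a capacity of 4 CPUs, submitting jobs that need 3, 2 and 1 CPUs runs only the first; the other two wait locally and start once the first job finishes.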
At the requester node level, you can set default queuing behavior for all jobs by defining the QueueTimeout parameter in the node's configuration file. Alternatively, within the job specification, you can include the QueueTimeout parameter directly in the configuration YAML. This flexibility allows you to tailor the queuing behavior to meet the specific needs of your distributed computing environment, ensuring that jobs are efficiently managed and resources are optimally utilized.
Here’s an example requester node configuration that sets the default job queuing time to one hour:
The QueueBackoff parameter determines the interval between retry attempts by the requester node to assign queued jobs.
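To see how a queue timeout and a backoff interval interact, the Python sketch below lists the moments at which a queued job would be reconsidered. It assumes a fixed interval and that an attempt falling exactly on the deadline still counts; both are assumptions for illustration, and `retry_offsets` is a hypothetical helper rather than part of Bacalhau.

```python
from datetime import timedelta


def retry_offsets(queue_timeout, queue_backoff):
    """Offsets from submission at which a queued job would be re-evaluated."""
    if queue_backoff <= timedelta(0):
        raise ValueError("queue_backoff must be positive")
    offsets = []
    t = queue_backoff
    while t <= queue_timeout:
        offsets.append(t)
        t += queue_backoff
    return offsets
```

Under these assumptions, a one-hour timeout with a one-minute backoff gives the job up to 60 further chances to be placed, so a shorter backoff trades extra scheduling work for faster pickup when capacity frees up.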
Here’s a sample job specification setting the QueueTimeout for this specific job, overriding any node defaults.
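The precedence between the two settings can be stated as a one-line rule: a value in the job specification wins over the requester node's default. The Python sketch below expresses that rule; it uses `None` to mean "not set", which is a modeling choice here and not a statement about how Bacalhau represents an unset timeout.

```python
from datetime import timedelta
from typing import Optional


def effective_queue_timeout(
    job_value: Optional[timedelta],
    node_default: Optional[timedelta],
) -> Optional[timedelta]:
    """Return the timeout that applies: the job-level value if set, else the node default."""
    if job_value is not None:
        return job_value
    return node_default
```

If neither level sets a value, the function returns `None`, corresponding to a job that is not queued at all.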
You can also define timeouts for your jobs directly through the CLI using the --queue-timeout flag. This method provides a convenient way to specify queuing behavior on a per-job basis, allowing you to manage job execution dynamically without modifying configuration files.
For example, here is how you can submit a job with a specified queue timeout using the CLI:
Timeouts in Bacalhau are generally governed by the TotalTimeout value in your YAML specifications and the --timeout flag in your CLI commands. The default total timeout is 30 minutes. Declaring a queue timeout larger than the total timeout, without raising the total timeout as well, results in a validation error.
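The relationship between the two timeouts can be checked before a job is submitted. The Python sketch below mirrors the rule as described here (a queue timeout larger than the total timeout is rejected, with 30 minutes as the default total). The function name and error message are hypothetical and do not reproduce Bacalhau's own validation output.

```python
from datetime import timedelta

DEFAULT_TOTAL_TIMEOUT = timedelta(minutes=30)  # default stated in the text above


def validate_timeouts(queue_timeout, total_timeout=DEFAULT_TOTAL_TIMEOUT):
    """Raise ValueError if the queue timeout exceeds the total timeout."""
    if queue_timeout > total_timeout:
        raise ValueError(
            f"queue timeout {queue_timeout} exceeds total timeout {total_timeout}"
        )
```

For example, a one-hour queue timeout fails against the default total timeout but passes once the total timeout is raised to two hours.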
Jobs are queued when all available nodes are busy or when no node matches your job specification. Let’s take a look at how queuing plays out within your network.
Queued jobs initially display the Queued status. The bacalhau job describe command shows both the state of the job and the reason it was queued.
For busy nodes:
For no matching nodes in the network:
Once appropriate node resources become available, these jobs will transition to either a Running or Completed status, allowing more jobs to be assigned to matching nodes.
As Bacalhau continues to evolve, our commitment to making distributed computing and data processing more accessible and efficient remains strong. We want to hear what you think about this feature so that we can make Bacalhau better and meet all the diverse needs and requirements of you, our users.
For questions or feedback, please reach out in our Slack.