Note that in version v1.5.0 the configuration management approach was completely changed and certain limits were deprecated.
Check out the release notes to learn about all the changes in configuration management, including CLI command syntax and configuration file management.
These are the configuration keys that control the capacity of the Bacalhau node, and the limits for jobs that might be run.
Compute.AllocatedCapacity.CPU
Specifies the amount of CPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string.
Compute.AllocatedCapacity.Disk
Specifies the amount of Disk space a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 10Gi).
Compute.AllocatedCapacity.GPU
Specifies the amount of GPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 1). Note: When using percentages, the result is always rounded up to the nearest whole GPU.
Compute.AllocatedCapacity.Memory
Specifies the amount of Memory a compute node allocates for running jobs. It can be expressed as a percentage (e.g., 85%) or a Kubernetes resource string (e.g., 1Gi).
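To illustrate the two accepted forms, the following Python sketch resolves a capacity setting against a node's total amount of a resource. It is not Bacalhau's implementation: the function names `parse_quantity` and `resolve_capacity` are hypothetical, and only a small subset of the Kubernetes quantity suffixes is handled.

```python
import math

# Binary and decimal suffixes from the Kubernetes quantity format
# (subset only; exponent notation such as "1e3" is not handled).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(text: str) -> float:
    """Convert a Kubernetes-style resource string such as '10Gi' to a number."""
    text = text.strip()
    if text.endswith("m"):  # milli-units, e.g. '500m' CPU = 0.5 cores
        return float(text[:-1]) / 1000
    # Check two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return float(text)


def resolve_capacity(setting: str, total: float, whole_units: bool = False) -> float:
    """Resolve a capacity setting against the node's total for that resource.

    whole_units=True models the GPU rule described above: a percentage
    is rounded up to the nearest whole unit.
    """
    setting = setting.strip()
    if setting.endswith("%"):
        amount = total * float(setting[:-1]) / 100
        return float(math.ceil(amount)) if whole_units else amount
    # Absolute settings do not depend on the node's total.
    return parse_quantity(setting)
```

For example, `resolve_capacity("85%", 4, whole_units=True)` returns `4.0`, because 85% of 4 GPUs is 3.4 and the result is rounded up, while `resolve_capacity("10Gi", total)` ignores the total and returns 10 × 2^30 bytes.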
It is also possible to specify the amount of each resource allocated to a job by default when the job itself does not specify its requirements. The JobDefaults.<Job type>.Task.Resources.<Resource Type> configuration keys are used for this purpose. For example, to provide each Ops job with 2 GB of RAM by default, set the JobDefaults.Ops.Task.Resources.Memory key.
See the complete configuration keys list for more details.
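The fallback from a job's own request to a configured default can be sketched as follows. This is a minimal illustration, not Bacalhau code: `NODE_CONFIG`, `effective_resource`, and the `2Gi` value are hypothetical, and only the key naming pattern comes from the text above.

```python
from typing import Dict, Optional

# Hypothetical in-memory view of node configuration, keyed by the
# JobDefaults.<Job type>.Task.Resources.<Resource Type> pattern.
NODE_CONFIG: Dict[str, str] = {
    "JobDefaults.Ops.Task.Resources.Memory": "2Gi",  # illustrative value
}


def effective_resource(
    job_type: str,
    resource: str,
    requested: Optional[str],
    config: Dict[str, str] = NODE_CONFIG,
) -> Optional[str]:
    """Return the job's own request if present, otherwise the configured default."""
    if requested:  # the job specified this resource itself
        return requested
    key = f"JobDefaults.{job_type}.Task.Resources.{resource}"
    return config.get(key)  # None when no default is configured
```

With this configuration, an Ops job that requests no memory resolves to `"2Gi"`, an Ops job that requests `"512Mi"` keeps its own value, and a Batch job with no request resolves to `None` because no Batch default is set.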
Resource limits are not supported for Docker jobs running on Windows. Limits are still applied at the job bid stage, based on the job's reported requirements, but they are silently left unenforced at runtime, so a running job can access as many resources as it requests.
Running a Windows-based node is not officially supported, so your mileage may vary. Some features (like resource limits) are not present in Windows-based nodes.
Bacalhau currently assumes that all containers are Linux-based. Users of the Docker executor need to manually ensure that their Docker engine is running and configured to support Linux containers, e.g. by using the WSL-based backend.
Bacalhau can limit the total time a job spends executing. A job that spends too long executing will be cancelled, and no results will be published.
By default, a Bacalhau node does not enforce any limit on job execution time. Both node operators and job submitters can supply a maximum execution time limit. If a job submitter asks for a longer execution time than permitted by a node operator, their job will be rejected.
Applying job timeouts allows node operators to more fairly distribute the work submitted to their nodes. It also protects users from transient errors that result in their jobs waiting indefinitely.
Job submitters can pass the --timeout flag to any Bacalhau job submission CLI command to set a maximum job execution time. The supplied value should be a whole number of seconds with no unit.
The timeout can also be added to an existing job spec by adding the Timeout property to the Spec.
Node operators can use configuration keys to specify default and maximum job execution time limits. The supplied values should be a numeric value followed by a time unit (one of s for seconds, m for minutes, or h for hours).
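The two value formats differ, which is easy to get wrong: submitters give a bare number of seconds, while operators give a number plus a unit. The Python sketch below validates each form and converts it to seconds. Both function names are hypothetical, and the sketch accepts only the single value-plus-unit form described above; it does not claim to match Bacalhau's own parser.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
# One numeric value followed by exactly one of the units s, m or h.
_OPERATOR_RE = re.compile(r"(\d+(?:\.\d+)?)([smh])")
# A whole number of seconds with no unit.
_SUBMITTER_RE = re.compile(r"[0-9]+")


def parse_submitter_timeout(text: str) -> int:
    """Parse a --timeout style value: whole seconds, no unit (e.g. '300')."""
    text = text.strip()
    if _SUBMITTER_RE.fullmatch(text) is None:
        raise ValueError(f"expected a whole number of seconds, got {text!r}")
    return int(text)


def parse_operator_timeout(text: str) -> float:
    """Parse a configuration style value: number plus unit (e.g. '30m')."""
    match = _OPERATOR_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected a number followed by s, m or h, got {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]
```

Here `parse_operator_timeout("30m")` returns `1800.0` and `parse_submitter_timeout("300")` returns `300`, while passing `"5m"` to the submitter parser or `"300"` to the operator parser raises `ValueError`.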
Here is a list of the relevant properties:
JobDefaults.Batch.Task.Timeouts.ExecutionTimeout
Default value for batch job execution timeouts on your current compute node. It will be assigned to batch jobs with no timeout requirement defined.
JobDefaults.Ops.Task.Timeouts.ExecutionTimeout
Default value for ops job execution timeouts on your current compute node. It will be assigned to ops jobs with no timeout requirement defined.
JobDefaults.Batch.Task.Timeouts.TotalTimeout
Default value for the maximum execution timeout this compute node supports for batch jobs. Jobs with higher timeout requirements will not be bid on.
JobDefaults.Ops.Task.Timeouts.TotalTimeout
Default value for the maximum execution timeout this compute node supports for ops jobs. Jobs with higher timeout requirements will not be bid on.
Note that timeouts cannot be configured for Daemon and Service jobs.
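The interaction between a job's requested timeout, the node's default, and the node's maximum can be sketched as below. This is a simplified model of the behaviour described above, not Bacalhau's bidding code: `decide_bid` is a hypothetical name, all values are in seconds, and `None` stands for "not set". How a job with neither a requested nor a default timeout is treated when a maximum exists is an assumption of this sketch.

```python
from typing import Optional, Tuple


def decide_bid(
    requested: Optional[float],
    default: Optional[float],
    maximum: Optional[float],
) -> Tuple[bool, Optional[float]]:
    """Return (should_bid, effective_timeout_seconds) for one job."""
    # A job with no timeout requirement is assigned the node's default.
    effective = requested if requested is not None else default
    # A job asking for more than the node's maximum is not bid on.
    if maximum is not None and effective is not None and effective > maximum:
        return False, effective
    # Assumption: with no effective timeout the job is accepted without
    # a limit, matching the "no limit by default" behaviour.
    return True, effective
```

With a 600-second default and a 3600-second maximum, a job with no timeout is accepted and runs with a 600-second limit, a job requesting 1800 seconds is accepted as is, and a job requesting 7200 seconds is not bid on.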