To view the configuration that bacalhau will receive when a command is executed against it, users can run the command. Users who wish to see Bacalhau’s config represented as YAML may run bacalhau config list --output=yaml
.
In Bacalhau v1.5.0, there have been changes to how Bacalhau handles configuration:
The bacalhau repo ~/.bacalhau
is not the default for the Bacalhau config file.
Bacalhau searches for a default config file. The location is OS-dependent:
Linux: ~/.config/bacalhau/config.yaml
OSX: ~/.config/Application\ Support/bacalhau/config.yaml
Windows: $AppData\bacalhau\config.yaml
. Usually, this is something like C:\Users\username\bacalhau\config.yaml
As described above, bacalhau still has the concept of a default config file, which, for the sake of simplicity, we’ll say lives in ~/.config/bacalhau/config.yaml
. There are two ways this file can be modified:
A text editor vim ~/.config/bacalhau/config.yaml
.
The command.
The --config
(or -c
) flag allows flexible configuration of bacalhau through various methods. You can use this flag multiple times to combine different configuration sources. To specify a config file to bacalhau, users may use the --config
flag, passing a path to a config file for bacalhau to use. When this flag is provided bacalhau will not search for a default config, and will instead use the configuration provided to it by the --config
flag.
In Bacalhau, configuration keys are structured identifiers used to configure and customize the behavior of the application. They represent specific settings that control various aspects of Bacalhau's functionality, such as network parameters, API endpoints, node operations, and user interface options. The configuration file is organized in a tree-like structure using nested mappings (dictionaries) in YAML format. Each level of indentation represents a deeper level in the hierarchy.
Example: part of the config file
In this YAML configuration file:
Top-Level Keys (Categories): API
, Orchestrator
Sub-Level Keys (Subcategories): Under API
, we have Host
and Port
; Under Orchestrator
we have Host
, Port
and NodeManager
Leaf Nodes (Settings): Host
, Port
, NameProvider
, DataDir
, DisconnectTimeout
— these contain the actual configuration values.
Config keys use dot notation to represent the path from the root of the configuration hierarchy down to a specific leaf node. Each segment in the key corresponds to a level in the hierarchy. Syntax is Category.Subcategory(s)...LeafNode
config set
, config list
and --config
The bacalhau config list
returns all keys and their corresponding value. The bacalhau config set
command accepts a key and a value to set it to. The --config
flag accepts a key and a value that will be applied to Bacalhau when it runs.
API Host
Using bacalhau config set
in the Default Config File:Run bacalhau config list
to find the appropriate key
Run the bacalhau config set
command
Observe how bacalhau config list
reflects the new setting
Observe the change has been reflected in the default config file
How to Modify the API Host Using bacalhau config set
a Custom Config File
Run the config set command with the flag
Observe the created config file
Observe the default config and output of bacalhau config list
does not reflect this change.
How to Start Bacalhau With a Custom Config File
--config
Flag The --config
(or -c
) flag allows flexible configuration of bacalhau through various methods. You can use this flag multiple times to combine different configuration sources.
or using the short form:
YAML Config Files: Specify paths to YAML configuration files. Example:
Key-Value Pairs: Set specific configuration values using dot notation. Example:
Boolean Flags: Enable boolean options by specifying the key alone. Example:
When multiple configuration options are provided, they are applied in the following order of precedence (highest to lowest):
Command-line key-value pairs and boolean flags
YAML configuration files
Default values
Within each category, options specified later override earlier ones.
Using a single config file:
Merging multiple config files:
Overriding specific values:
Combining file and multiple overrides:
In the last example, WebUI.Enabled
will be set to true
, API.Host
will be 192.168.1.5
, and other values will be loaded from config.yaml
if present.
Remember, later options override earlier ones, allowing for flexible configuration management.
bacalhau completion
CommandThe bacalhau completion
command will generate shell completion for your shell. You can use the command like:
After running the above command, commands like bacalhau config set
and bacalhau --config
will have auto-completion for all possible configuration values along with their descriptions
If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#general channel).
How to write the config.yaml file to configure your nodes
On installation, Bacalhau creates a .bacalhau
directory that includes a config.yaml
file tailored for your specific settings. This configuration file is the central repository for custom settings for your Bacalhau nodes.
When initializing a Bacalhau node, the system determines its configuration by following a specific hierarchy. First, it checks the default settings, then the config.yaml
file, followed by environment variables, and finally, any command line flags specified during execution. Configurations are set and overridden in that sequence. This layered approach allows the default Bacalhau settings to provide a baseline, while environment variables and command-line flags offer added flexibility. However, the config.yaml
file offers a reliable way to predefine all necessary settings before node creation across environments, ensuring consistency and ease of management.
Modifications to the config.yaml
file are not dynamically applied to existing nodes. A restart of the Bacalhau node is required for any changes to take effect.
Your config.yaml
file starts off empty. However, you can see all available settings using the following command
This command showcases over a hundred configuration parameters related to users, security, metrics, updates, and node configuration, providing a comprehensive overview of the customization options available for your Bacalhau setup.
Let’s go through the different options and how your configuration file is structured.
The bacalhau config list
command displays your configuration paths, segmented with periods to indicate each part you are configuring.
Consider these configuration settings: NameProvider
and Labels
. These settings help set name and labels for your Bacalhau node.
In your config.yaml
, these settings will be formatted like this:
Here are your Bacalhau configuration options in alphabetical order:
API.Auth.AccessPolicyPath
AccessPolicyPath is the path to a file or directory that will be loaded as the policy to apply to all inbound API requests. If unspecified, a policy that permits access to all API endpoints to both authenticated and unauthenticated users (the default as of v1.2.0) will be used.
API.Auth.Methods
Methods maps "method names" to authenticator implementations. A method name is a human-readable string chosen by the person configuring the system that is shown to users to help them pick the authentication method they want to use. There can be multiple usages of the same Authenticator type but with different configs and parameters, each identified with a unique method name. For example, if an implementation wants to allow users to log in with Github or Bitbucket, they might both use an authenticator implementation of type "oidc", and each would appear once on this provider with key / method name "github" and "bitbucket". By default, only a single authentication method that accepts authentication via client keys will be enabled.
API.Host
Host specifies the hostname or IP address on which the API server listens or the client connects.
API.Port
Port specifies the port number on which the API server listens or the client connects.
API.TLS.AutoCert
AutoCert specifies the domain for automatic certificate generation.
API.TLS.AutoCertCachePath
AutoCertCachePath specifies the directory to cache auto-generated certificates.
API.TLS.CAFile
CAFile specifies the path to the Certificate Authority file.
API.TLS.CertFile
CertFile specifies the path to the TLS certificate file.
API.TLS.Insecure
Insecure allows insecure TLS connections (e.g., self-signed certificates).
API.TLS.KeyFile
KeyFile specifies the path to the TLS private key file.
API.TLS.SelfSigned
SelfSigned indicates whether to use a self-signed certificate.
API.TLS.UseTLS
UseTLS indicates whether to use TLS for client connections.
Compute.AllocatedCapacity.CPU
CPU specifies the amount of CPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "100m").
Compute.AllocatedCapacity.Disk
Disk specifies the amount of Disk space a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "10Gi").
Compute.AllocatedCapacity.GPU
GPU specifies the amount of GPU a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "1"). Note: When using percentages, the result is always rounded up to the nearest whole GPU.
Compute.AllocatedCapacity.Memory
Memory specifies the amount of Memory a compute node allocates for running jobs. It can be expressed as a percentage (e.g., "85%") or a Kubernetes resource string (e.g., "1Gi").
Compute.AllowListedLocalPaths
AllowListedLocalPaths specifies a list of local file system paths that the compute node is allowed to access.
Compute.Auth.Token
Token specifies the key for compute nodes to be able to access the orchestrator.
Compute.Enabled
Enabled indicates whether the compute node is active and available for job execution.
Compute.Heartbeat.InfoUpdateInterval
InfoUpdateInterval specifies the time between updates of non-resource information to the orchestrator.
Compute.Heartbeat.Interval
Interval specifies the time between heartbeat signals sent to the orchestrator.
Compute.Heartbeat.ResourceUpdateInterval
ResourceUpdateInterval specifies the time between updates of resource information to the orchestrator.
Compute.Orchestrators
Orchestrators specifies a list of orchestrator endpoints that this compute node connects to.
Compute.TLS.CACert
CACert specifies the CA file path that the compute node trusts when connecting to orchestrator.
Compute.TLS.RequireTLS
RequireTLS specifies if the compute node enforces encrypted communication with orchestrator.
DataDir
DataDir specifies a location on disk where the bacalhau node will maintain state.
DisableAnalytics
DisableAnalytics, when true, disables sharing anonymous analytics data with the Bacalhau development team
Engines.Disabled
Disabled specifies a list of engines that are disabled.
Engines.Types.Docker.ManifestCache.Refresh
Refresh specifies the refresh interval for cache entries.
Engines.Types.Docker.ManifestCache.Size
Size specifies the size of the Docker manifest cache.
Engines.Types.Docker.ManifestCache.TTL
TTL specifies the time-to-live duration for cache entries.
InputSources.Disabled
Disabled specifies a list of storages that are disabled.
InputSources.MaxRetryCount
ReadTimeout specifies the maximum number of attempts for reading from a storage.
InputSources.ReadTimeout
ReadTimeout specifies the maximum time allowed for reading from a storage.
InputSources.Types.IPFS.Endpoint
Endpoint specifies the multi-address to connect to for IPFS. e.g /ip4/127.0.0.1/tcp/5001
JobAdmissionControl.AcceptNetworkedJobs
AcceptNetworkedJobs indicates whether to accept jobs that require network access.
JobAdmissionControl.Locality
Locality specifies the locality of the job input data.
JobAdmissionControl.ProbeExec
ProbeExec specifies the command to execute for probing job submission.
JobAdmissionControl.ProbeHTTP
ProbeHTTP specifies the HTTP endpoint for probing job submission.
JobAdmissionControl.RejectStatelessJobs
RejectStatelessJobs indicates whether to reject stateless jobs, i.e. jobs without inputs.
JobDefaults.Batch.Priority
Priority specifies the default priority allocated to a batch or ops job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Batch.Task.Publisher.Params
Params specifies the publisher configuration data.
JobDefaults.Batch.Task.Publisher.Type
Type specifies the publisher type. e.g. "s3", "local", "ipfs", etc.
JobDefaults.Batch.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Batch.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Batch.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Batch.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
JobDefaults.Batch.Task.Timeouts.ExecutionTimeout
ExecutionTimeout is the maximum time allowed for task execution
JobDefaults.Batch.Task.Timeouts.TotalTimeout
TotalTimeout is the maximum total time allowed for a task
JobDefaults.Daemon.Priority
Priority specifies the default priority allocated to a service or daemon job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Daemon.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Daemon.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Daemon.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Daemon.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
JobDefaults.Ops.Priority
Priority specifies the default priority allocated to a batch or ops job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Ops.Task.Publisher.Params
Params specifies the publisher configuration data.
JobDefaults.Ops.Task.Publisher.Type
Type specifies the publisher type. e.g. "s3", "local", "ipfs", etc.
JobDefaults.Ops.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Ops.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Ops.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Ops.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
JobDefaults.Ops.Task.Timeouts.ExecutionTimeout
ExecutionTimeout is the maximum time allowed for task execution
JobDefaults.Ops.Task.Timeouts.TotalTimeout
TotalTimeout is the maximum total time allowed for a task
JobDefaults.Service.Priority
Priority specifies the default priority allocated to a service or daemon job. This value is used when the job hasn't explicitly set its priority requirement.
JobDefaults.Service.Task.Resources.CPU
CPU specifies the default amount of CPU allocated to a task. It uses Kubernetes resource string format (e.g., "100m" for 0.1 CPU cores). This value is used when the task hasn't explicitly set its CPU requirement.
JobDefaults.Service.Task.Resources.Disk
Disk specifies the default amount of disk space allocated to a task. It uses Kubernetes resource string format (e.g., "1Gi" for 1 gibibyte). This value is used when the task hasn't explicitly set its disk space requirement.
JobDefaults.Service.Task.Resources.GPU
GPU specifies the default number of GPUs allocated to a task. It uses Kubernetes resource string format (e.g., "1" for 1 GPU). This value is used when the task hasn't explicitly set its GPU requirement.
JobDefaults.Service.Task.Resources.Memory
Memory specifies the default amount of memory allocated to a task. It uses Kubernetes resource string format (e.g., "256Mi" for 256 mebibytes). This value is used when the task hasn't explicitly set its memory requirement.
Labels
Labels are key-value pairs used to describe and categorize the nodes.
Logging.Level
Level sets the logging level. One of: trace, debug, info, warn, error, fatal, panic.
Logging.LogDebugInfoInterval
LogDebugInfoInterval specifies the interval for logging debug information.
Logging.Mode
Mode specifies the logging mode. One of: default, json.
NameProvider
NameProvider specifies the method used to generate names for the node. One of: hostname, aws, gcp, uuid, puuid.
Orchestrator.Advertise
Advertise specifies URL to advertise to other servers.
Orchestrator.Auth.Token
Token specifies the key for compute nodes to be able to access the orchestrator
Orchestrator.Cluster.Advertise
Advertise specifies the address to advertise to other cluster members.
Orchestrator.Cluster.Host
Host specifies the hostname or IP address for cluster communication.
Orchestrator.Cluster.Name
Name specifies the unique identifier for this orchestrator cluster.
Orchestrator.Cluster.Peers
Peers is a list of other cluster members to connect to on startup.
Orchestrator.Cluster.Port
Port specifies the port number for cluster communication.
Orchestrator.Enabled
Enabled indicates whether the orchestrator node is active and available for job submission.
Orchestrator.EvaluationBroker.MaxRetryCount
MaxRetryCount specifies the maximum number of times an evaluation can be retried before being marked as failed.
Orchestrator.EvaluationBroker.VisibilityTimeout
VisibilityTimeout specifies how long an evaluation can be claimed before it's returned to the queue.
Orchestrator.Host
Host specifies the hostname or IP address on which the Orchestrator server listens for compute node connections.
Orchestrator.NodeManager.DisconnectTimeout
DisconnectTimeout specifies how long to wait before considering a node disconnected.
Orchestrator.NodeManager.ManualApproval
ManualApproval, if true, requires manual approval for new compute nodes joining the cluster.
Orchestrator.Port
Host specifies the port number on which the Orchestrator server listens for compute node connections.
Orchestrator.Scheduler.HousekeepingInterval
HousekeepingInterval specifies how often to run housekeeping tasks.
Orchestrator.Scheduler.HousekeepingTimeout
HousekeepingTimeout specifies the maximum time allowed for a single housekeeping run.
Orchestrator.Scheduler.QueueBackoff
QueueBackoff specifies the time to wait before retrying a failed job.
Orchestrator.Scheduler.WorkerCount
WorkerCount specifies the number of concurrent workers for job scheduling.
Orchestrator.SupportReverseProxy
SupportReverseProxy configures the orchestrator node to run behind a reverse proxy
Orchestrator.TLS.CACert
CACert specifies the CA file path that the orchestrator node trusts when connecting to NATS server.
Orchestrator.TLS.ServerCert
ServerCert specifies the certificate file path given to NATS server to serve TLS connections.
Orchestrator.TLS.ServerKey
ServerKey specifies the private key file path given to NATS server to serve TLS connections.
Orchestrator.TLS.ServerTimeout
ServerTimeout specifies the TLS timeout, in seconds, set on the NATS server.
Publishers.Disabled
Disabled specifies a list of publishers that are disabled.
Publishers.Types.IPFS.Endpoint
Endpoint specifies the multi-address to connect to for IPFS. e.g /ip4/127.0.0.1/tcp/5001
Publishers.Types.Local.Address
Address specifies the endpoint the publisher serves on.
Publishers.Types.Local.Port
Port specifies the port the publisher serves on.
Publishers.Types.S3.PreSignedURLDisabled
PreSignedURLDisabled specifies whether pre-signed URLs are enabled for the S3 provider.
Publishers.Types.S3.PreSignedURLExpiration
PreSignedURLExpiration specifies the duration before a pre-signed URL expires.
ResultDownloaders.Disabled
Disabled is a list of downloaders that are disabled.
ResultDownloaders.Timeout
Timeout specifies the maximum time allowed for a download operation.
ResultDownloaders.Types.IPFS.Endpoint
Endpoint specifies the multi-address to connect to for IPFS. e.g /ip4/127.0.0.1/tcp/5001
StrictVersionMatch
StrictVersionMatch indicates whether to enforce strict version matching.
UpdateConfig.Interval
Interval specifies the time between update checks, when set to 0 update checks are not performed.
WebUI.Backend
Backend specifies the address and port of the backend API server. If empty, the Web UI will use the same address and port as the API server.
WebUI.Enabled
Enabled indicates whether the Web UI is enabled.
WebUI.Listen
Listen specifies the address and port on which the Web UI listens.