How to write the config.yaml file to configure your nodes
On installation, Bacalhau creates a `.bacalhau` directory that includes a `config.yaml` file tailored for your specific settings. This configuration file is the central repository for custom settings for your Bacalhau nodes.
When initializing a Bacalhau node, the system determines its configuration by following a specific hierarchy. First, it checks the default settings, then the `config.yaml` file, followed by environment variables, and finally, any command-line flags specified during execution. Configurations are set and overridden in that sequence, so a value from a later source replaces the same setting from an earlier one. This layered approach allows the default Bacalhau settings to provide a baseline, while environment variables and command-line flags offer added flexibility. The `config.yaml` file, in turn, offers a reliable way to predefine all necessary settings before node creation across environments, ensuring consistency and ease of management.
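As a small illustration of this layering, the fragment below sets the client API host and port in `config.yaml`. The values are placeholders chosen for this sketch, not Bacalhau defaults. According to the option descriptions further down this page, both keys are ignored when the matching `BACALHAU_API_HOST` or `BACALHAU_API_PORT` environment variable is set.

```yaml
# config.yaml: replaces the built-in defaults for these two keys.
# Host and port values are illustrative placeholders (assumptions).
node:
  clientapi:
    host: 127.0.0.1   # ignored if BACALHAU_API_HOST is set
    port: 1234        # ignored if BACALHAU_API_PORT is set
```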
Modifications to the `config.yaml` file are not dynamically applied to existing nodes. A restart of the Bacalhau node is required for any changes to take effect.
Your `config.yaml` file starts off empty. However, you can see all available settings by running the `bacalhau config list` command.
This command showcases over a hundred configuration parameters related to users, security, metrics, updates, and node configuration, providing a comprehensive overview of the customization options available for your Bacalhau setup.
Let’s go through the different options and how your configuration file is structured.
The `bacalhau config list` command displays your configuration paths, segmented with periods to indicate each part you are configuring.
Consider these configuration settings: `user.installationid`, `node.name`, `node.compute.executionstore.path`, `node.compute.executionstore.type`, `node.requester.jobstore.type`, and `node.requester.jobstore.path`. These settings set an identifier tag for your Bacalhau user and node, then establish storage options for your jobs and execution results.
In your `config.yaml`, these settings will be formatted like this:
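A minimal sketch of that layout follows, with each period-separated segment becoming one level of nesting. Every value is a placeholder assumed for illustration: the installation ID, node name, and file paths will differ on your machine, and `BoltDB` as the execution store type is an assumption based on the job store description on this page.

```yaml
# Placeholder values only; your generated ID, name, and paths will differ.
user:
  installationid: "example-installation-id"
node:
  name: "example-node-name"
  compute:
    executionstore:
      type: BoltDB                    # assumed store type
      path: /path/to/executions.db    # hypothetical path
  requester:
    jobstore:
      type: BoltDB
      path: /path/to/jobs.db          # hypothetical path
```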
Your YAML file hierarchy follows the period delineation: node > compute > executionstore > type for `node.compute.executionstore.type`.
Here are your Bacalhau configuration options in alphabetical order
auth.accesspolicypath
String path to where your security policy is stored
auth.methods
Set authentication method for your Bacalhau network
auth.tokenspath
String path to where your security token is stored
metrics.eventtracerpath
For observability, the path to your event trace records
node.allowlistedlocalpaths
A list of the local paths that should be allowed to be mounted into jobs within your network
node.clientapi.clienttls.cacert
The location of your node client’s chosen Certificate Authority certificate file when self-signed certificates are used
node.clientapi.clienttls.insecure
Boolean indicating whether the client TLS is insecure. When true, the client uses HTTPS (TLS) but does not attempt to verify the certificate
node.clientapi.clienttls.usetls
Boolean indicating if TLS should be used for client connections
node.clientapi.host
The host for the client and server to communicate on (via REST). Ignored if BACALHAU_API_HOST environment variable is set
node.clientapi.port
The port for the client and server to communicate on (via REST). Ignored if BACALHAU_API_PORT environment variable is set
node.clientapi.tls.autocert
Hostname for a certificate to be automatically obtained via ACME
node.clientapi.tls.autocertcachepath
The directory where the autocert process will cache certificates to avoid rate limits
node.clientapi.tls.selfsigned
Boolean indicating if a self-signed security certificate is being used
node.clientapi.tls.servercertificate
The location of a TLS certificate to be used by the requester to serve TLS requests
node.clientapi.tls.serverkey
The TLS server key to match the certificate to allow the requester to serve TLS
node.compute.capacity.defaultjobresourcelimits.cpu
Sets default CPU resource limits for jobs on your Compute node
node.compute.capacity.defaultjobresourcelimits.disk
Sets default disk resource limits for jobs on your Compute node
node.compute.capacity.defaultjobresourcelimits.gpu
Sets default GPU resource limits for jobs on your Compute node
node.compute.capacity.defaultjobresourcelimits.memory
Sets default memory resource limits for jobs on your Compute node
node.compute.capacity.ignorephysicalresourcelimits
Boolean that tells the compute node to ignore its physical resource limits when true
node.compute.capacity.jobresourcelimits.cpu
Sets the specific per job amount of CPU the system can use at one time
node.compute.capacity.jobresourcelimits.disk
Sets the specific per job amount of disk the system can use at one time
node.compute.capacity.jobresourcelimits.gpu
Sets the specific per job amount of GPU the system can use at one time
node.compute.capacity.jobresourcelimits.memory
Sets the specific per job amount of memory the system can use at one time
node.compute.capacity.totalresourcelimits.cpu
Total amount of CPU the system can use at one time in aggregate for all jobs
node.compute.capacity.totalresourcelimits.disk
Total amount of disk the system can use at one time in aggregate for all jobs
node.compute.capacity.totalresourcelimits.gpu
Total amount of GPU the system can use at one time in aggregate for all jobs
node.compute.capacity.totalresourcelimits.memory
Total amount of memory the system can use at one time in aggregate for all jobs
node.compute.controlplanesettings.heartbeatfrequency
How often the compute node will send a heartbeat to the requester node to let it know that the compute node is still alive. This should be less than the requester's configured heartbeat timeout to avoid flapping.
node.compute.controlplanesettings.heartbeattopic
This is the pubsub topic that the compute node will use to send heartbeats to the requester node
node.compute.controlplanesettings.infoupdatefrequency
The frequency with which the compute node will send node info (including current labels) to the controlling requester node
node.compute.controlplanesettings.resourceupdatefrequency
How often the compute node will send current resource availability to the requester node
node.compute.executionstore.path
A metadata store of job executions handled by the current compute node
node.compute.executionstore.type
The type of store used by the compute node
node.compute.jobselection.acceptnetworkedjobs
Boolean signifying if jobs that specify networking should be accepted
node.compute.jobselection.locality
Sets job selection policy based on where the data for the job is located. ‘local’ or ‘anywhere’
node.compute.jobselection.probeexec
Use the result of an executed external program to decide if a job should be accepted. Overrides data locality settings
node.compute.jobselection.probehttp
Use the result of an HTTP POST to decide if a job should be accepted. Overrides data locality settings
node.compute.jobselection.rejectstatelessjobs
Boolean signifying if jobs that don’t specify any data should be rejected
node.compute.jobtimeouts.defaultjobexecutiontimeout
Default value for job execution timeouts on your current compute node. It will be assigned to jobs with no timeout requirement defined
node.compute.jobtimeouts.jobexecutiontimeoutclientidbypasslist
List of clients that are allowed to bypass the job execution timeout check
node.compute.jobtimeouts.jobnegotiationtimeout
Default timeout value to hold a bid for a job
node.compute.jobtimeouts.maxjobexecutiontimeout
Default value for the maximum execution timeout this compute node supports. Jobs with higher timeout requirements will not be bid on
node.compute.jobtimeouts.minjobexecutiontimeout
Default value for the minimum execution timeout this compute node supports. Jobs with lower timeout requirements will not be bid on
node.compute.localpublisher.address
The address for the local publisher's server to bind to
node.compute.localpublisher.directory
The directory where the local publisher will store content
node.compute.localpublisher.port
The port for the local publisher's server to bind to (default: 6001)
node.compute.logging.logrunningexecutionsinterval
The interval at which your compute node logs its running job executions
node.compute.logstreamconfig.channelbuffersize
How many messages to buffer in the log stream channel, per stream
node.compute.manifestcache.duration
The default time-to-live for each record in the manifest cache
node.compute.manifestcache.frequency
How often the check for stale records is performed
node.compute.manifestcache.size
Specifies the number of items that can be held in the manifest cache
node.computestoragepath
Path to the storage repository for your execution data within your compute node
node.disabledfeatures.engines
List of Engine types to disable
node.disabledfeatures.publishers
List of Publisher types to disable
node.disabledfeatures.storages
List of Storage types to disable
node.downloadurlrequestretries
Number of retries attempted for the download requests in your node
node.downloadurlrequesttimeout
Duration before a timeout when processing download requests
node.executorpluginpath
Path to the directory for your executor plugins
node.ipfs.connect
The IPFS host multiaddress to connect to. If not set, an in-process IPFS node will be created
node.labels
List of labels to apply to the node that can be used for node selection and filtering
node.loggingmode
Switch between available logging formats for your node: default, station, json, combined, event
node.name
The name of the node. If not set, the node name will be generated automatically based on the chosen name provider
node.nameprovider
The name provider to use to generate the node name, if no name is set
node.network.advertisedaddress
Address to advertise to compute nodes to connect to
node.network.authsecret
Authentication secret for network connections
node.network.cluster.advertisedaddress
Address to advertise to other orchestrators to connect to
node.network.cluster.name
Name of the cluster to join
node.network.cluster.peers
Comma-separated list of other orchestrators to connect to form a cluster
node.network.cluster.port
Port to listen for connections from other orchestrators to form a cluster
node.network.orchestrators
Comma-separated list of orchestrators to connect to. Applies to compute nodes
node.network.port
Port to listen for connections from other nodes. Applies to orchestrator nodes
node.network.storedir
Directory that the network can use for storage
node.requester.controlplanesettings.heartbeatcheckfrequency
This setting is the time period after which a compute node is considered to be unresponsive. If the compute node misses two of these intervals, it will be marked as unknown. The compute node's heartbeat frequency should be set to less than this value so that it does not keep switching between unknown and active too frequently
node.requester.controlplanesettings.heartbeattopic
This is the pubsub topic that the compute node will use to send heartbeats to the requester node
node.requester.controlplanesettings.nodedisconnectedafter
This is the time period after which a compute node is considered to be disconnected. If the compute node does not deliver a heartbeat every NodeDisconnectedAfter then it is considered disconnected
node.requester.defaultpublisher
A default publisher to apply to all jobs without a publisher
node.requester.evaluationbroker.evalbrokerinitialretrydelay
Initial retry delay for the evaluation broker
node.requester.evaluationbroker.evalbrokermaxretrycount
Maximum retry count for the evaluation broker
node.requester.evaluationbroker.evalbrokersubsequentretrydelay
Subsequent retry delay for the evaluation broker
node.requester.evaluationbroker.evalbrokervisibilitytimeout
Visibility timeout for the evaluation broker
node.requester.externalverifierhook
URL specifying where to send external verification requests to
node.requester.failureinjectionconfig.isbadactor
Boolean indicating if failure injection config is a bad actor
node.requester.housekeepingbackgroundtaskinterval
Duration between Bacalhau housekeeping runs
node.requester.jobdefaults.executiontimeout
The maximum amount of time a task is allowed to run in seconds. Zero means no timeout, such as for a daemon task
node.requester.jobselectionpolicy.acceptnetworkedjobs
Boolean signifying if jobs that specify networking should be accepted
node.requester.jobselectionpolicy.locality
Sets job selection policy based on where the data for the job is located. ‘local’ or ‘anywhere’
node.requester.jobselectionpolicy.probeexec
Use the result of an executed external program to decide if a job should be accepted. Overrides data locality settings
node.requester.jobselectionpolicy.probehttp
Use the result of an HTTP POST to decide if a job should be accepted. Overrides data locality settings
node.requester.jobselectionpolicy.rejectstatelessjobs
Boolean signifying if jobs that don’t specify any data should be rejected
node.requester.jobstore.path
The path used for the requester job store when using BoltDB
node.requester.jobstore.type
The type of job store used by the requester node (BoltDB)
node.requester.manualnodeapproval
Boolean signifying if new nodes should only be manually approved to your network. Default false
node.requester.nodeinfostorettl
Sets the duration for which node information is retained in the node info store after which it is automatically removed from the store
node.requester.noderankrandomnessrange
Description missing
node.requester.overaskforbidsfactor
Number of compute nodes the requester node should ask to bid for a job when deciding on scheduling
node.requester.scheduler.nodeoversubscriptionfactor
Numerical value representing the sum of a node’s total active capacity and queue capacity. With a default value of 1.5, your node can handle 50% more of its total capacity before being excluded from job queueing consideration.
node.requester.scheduler.queuebackoff
The interval between retry attempts by the requester node to assign queued jobs
node.requester.storageprovider.s3.presignedurldisabled
Boolean that disables the generation and use of secure (pre-signed) S3 URLs when true. Default: false
node.requester.storageprovider.s3.presignedurlexpiration
Defined expiration interval for your secure S3 URLs
node.requester.tagcache.duration
The default time-to-live for each record in the tag cache
node.requester.tagcache.frequency
How often the check for stale records is performed
node.requester.tagcache.size
Specifies the number of items that can be held in the tag cache
node.requester.translationenabled
Whether jobs should be translated at the requester node or not. Default: false
node.requester.worker.workercount
Number of workers that should be generated under your requester node
node.requester.worker.workerevaldequeuebasebackoff
Default time for your workers to be taken off the evaluation list for new tasks
node.requester.worker.workerevaldequeuemaxbackoff
Maximum time for your workers to be taken off the evaluation list for new tasks
node.requester.worker.workerevaldequeuetimeout
Time for your workers to be evaluated within the queue
node.serverapi.clienttls.cacert
The location of your server’s chosen Certificate Authority certificate file when self-signed certificates are used
node.serverapi.clienttls.insecure
Boolean indicating whether the server TLS is insecure. When true, the server uses HTTPS (TLS) but does not attempt to verify the certificate
node.serverapi.clienttls.usetls
Boolean indicating if TLS should be used for server connections
node.serverapi.host
The host to serve on
node.serverapi.port
The port to serve on
node.serverapi.tls.autocert
Specifies a host name for which ACME is used to obtain a TLS Certificate. Using this option results in the API serving over HTTPS
node.serverapi.tls.autocertcachepath
The directory where the autocert process will cache certificates to avoid rate limits
node.serverapi.tls.selfsigned
Boolean indicating if a self-signed security certificate is being used
node.serverapi.tls.servercertificate
Specifies a TLS certificate file to be used by the requester node
node.serverapi.tls.serverkey
Specifies a TLS key file matching the certificate to be used by the requester node
node.strictversionmatch
Description missing
node.type
Whether the node is a compute node, a requester node, or both
node.volumesizerequesttimeout
Duration before a timeout when parsing a node’s volume size.
node.webui.enabled
Whether to start the web UI alongside the bacalhau node
node.webui.port
The port number to listen on for web-ui connections
update.checkfrequency
The frequency with which your system checks for version updates
update.checkstatepath
Version state is stored in this directory
update.skipchecks
Boolean, checks are skipped on your system if true
user.installationid
String tag applied to your user on installation
user.keypath
Path to user authentication key. Client key will be used if a private key is not specified
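To tie a few of the options above together, here is a hedged sketch of a `config.yaml` fragment that pairs the compute node's heartbeat frequency with the requester's heartbeat check frequency and sets a job selection policy. All values are assumptions picked for illustration, not Bacalhau defaults, and the duration strings (`15s`, `30s`) assume a Go-style duration format that you should confirm against the output of `bacalhau config list` on your installation.

```yaml
# Illustrative values only (assumptions), not Bacalhau defaults.
node:
  compute:
    controlplanesettings:
      # Kept below the requester's heartbeatcheckfrequency to avoid flapping.
      heartbeatfrequency: 15s
    jobselection:
      locality: anywhere           # 'local' or 'anywhere'
      acceptnetworkedjobs: false   # do not accept jobs that specify networking
      rejectstatelessjobs: false   # still accept jobs that specify no data
  requester:
    controlplanesettings:
      heartbeatcheckfrequency: 30s
    manualnodeapproval: false
```

Because changes to `config.yaml` are not applied dynamically, restart the node after editing a fragment like this one.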