Write a config.yaml
How to write the config.yaml file to configure your nodes
On installation, Bacalhau creates a `.bacalhau` directory that includes a `config.yaml` file tailored for your specific settings. This configuration file is the central repository for custom settings for your Bacalhau nodes.
When initializing a Bacalhau node, the system determines its configuration by following a specific hierarchy. It starts from the default settings, then applies the `config.yaml` file, then environment variables, and finally any command-line flags specified during execution. Each source overrides the ones before it, so a command-line flag takes precedence over an environment variable, which in turn takes precedence over `config.yaml`. This layered approach lets the default Bacalhau settings provide a baseline, while environment variables and command-line flags offer added flexibility. The `config.yaml` file, however, offers a reliable way to predefine all necessary settings before node creation across environments, ensuring consistency and ease of management.
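As a small illustration of this layering, the fragment below sets the client API host and port in `config.yaml`. The host and port values are placeholders chosen for illustration, not documented defaults. According to the option descriptions in the table further down, these two file settings are ignored whenever the `BACALHAU_API_HOST` or `BACALHAU_API_PORT` environment variables are set.

```yaml
# config.yaml (illustrative fragment; host and port values are placeholders)
node:
  clientapi:
    host: 192.0.2.10   # ignored if BACALHAU_API_HOST is set in the environment
    port: 1234         # ignored if BACALHAU_API_PORT is set in the environment
```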
Modifications to the `config.yaml` file are not dynamically applied to existing nodes. A restart of the Bacalhau node is required for any changes to take effect.
Your `config.yaml` file starts off empty. However, you can see all available settings by running the `bacalhau config list` command.
This command showcases over a hundred configuration parameters related to users, security, metrics, updates, and node configuration, providing a comprehensive overview of the customization options available for your Bacalhau setup.
Let’s go through the different options and how your configuration file is structured.
Config.yaml Structure
The `bacalhau config list` command displays your configuration paths, segmented with periods to indicate each part you are configuring.
Consider these configuration settings: `user.installationid`, `node.name`, `node.compute.executionstore.path`, `node.compute.executionstore.type`, `node.requester.jobstore.type`, and `node.requester.jobstore.path`. These settings set an identifier tag for your Bacalhau user and node, then establish storage options for your jobs and execution results.
In your `config.yaml`, these settings will be formatted like this:
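A sketch of that layout is shown here. The key names come from the settings listed above. Every value (the installation ID, node name, store types, and paths) is a placeholder assumed for illustration, so substitute what `bacalhau config list` reports on your machine.

```yaml
# config.yaml (illustrative; all values are placeholders)
user:
  installationid: "<your-installation-id>"
node:
  name: "<your-node-name>"
  compute:
    executionstore:
      type: BoltDB   # assumed store type
      path: "<path-to-execution-store>"
  requester:
    jobstore:
      type: BoltDB   # the options table names BoltDB for the job store
      path: "<path-to-job-store>"
```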
Your YAML file hierarchy follows the period delineation: `node.compute.executionstore.type` becomes node > compute > executionstore > type.
Configuration Options
Here are your Bacalhau configuration options in alphabetical order:
Configuration Option | Description |
---|---|
auth.accesspolicypath | String path to where your security policy is stored |
auth.methods | Set authentication method for your Bacalhau network |
auth.tokenspath | String path to where your security token is stored |
metrics.eventtracerpath | For observability, the path to your event trace records |
node.allowlistedlocalpaths | A list of the local paths that should be allowed to be mounted into jobs within your network |
node.clientapi.clienttls.cacert | The location of your node client’s chosen Certificate Authority certificate file when self-signed certificates are used |
node.clientapi.clienttls.insecure | Boolean indicating if the client TLS is insecure. When true, the client uses HTTPS (TLS) but does not attempt to verify the certificate |
node.clientapi.clienttls.usetls | Boolean indicating if TLS should be used for client connections |
node.clientapi.host | The host for the client and server to communicate on (via REST). Ignored if BACALHAU_API_HOST environment variable is set |
node.clientapi.port | The port for the client and server to communicate on (via REST). Ignored if BACALHAU_API_PORT environment variable is set |
node.clientapi.tls.autocert | Hostname for a certificate to be automatically obtained via ACME |
node.clientapi.tls.autocertcachepath | The directory where the autocert process will cache certificates to avoid rate limits |
node.clientapi.tls.selfsigned | Boolean indicating if a self-signed security certificate is being used |
node.clientapi.tls.servercertificate | The location of a TLS certificate to be used by the requester to serve TLS requests |
node.clientapi.tls.serverkey | The TLS server key to match the certificate to allow the requester to serve TLS |
node.compute.capacity.defaultjobresourcelimits.cpu | Sets default CPU resource limits for jobs on your Compute node |
node.compute.capacity.defaultjobresourcelimits.disk | Sets default disk resource limits for jobs on your Compute node |
node.compute.capacity.defaultjobresourcelimits.gpu | Sets default GPU resource limits for jobs on your Compute node |
node.compute.capacity.defaultjobresourcelimits.memory | Sets default memory resource limits for jobs on your Compute node |
node.compute.capacity.ignorephysicalresourcelimits | Boolean that tells the compute node to ignore its physical resource limits when true |
node.compute.capacity.jobresourcelimits.cpu | Sets the specific per job amount of CPU the system can use at one time |
node.compute.capacity.jobresourcelimits.disk | Sets the specific per job amount of disk the system can use at one time |
node.compute.capacity.jobresourcelimits.gpu | Sets the specific per job amount of GPU the system can use at one time |
node.compute.capacity.jobresourcelimits.memory | Sets the specific per job amount of memory the system can use at one time |
node.compute.capacity.totalresourcelimits.cpu | Total amount of CPU the system can use at one time in aggregate for all jobs |
node.compute.capacity.totalresourcelimits.disk | Total amount of disk the system can use at one time in aggregate for all jobs |
node.compute.capacity.totalresourcelimits.gpu | Total amount of GPU the system can use at one time in aggregate for all jobs |
node.compute.capacity.totalresourcelimits.memory | Total amount of memory the system can use at one time in aggregate for all jobs |
node.compute.controlplanesettings.heartbeatfrequency | How often the compute node will send a heartbeat to the requester node to let it know that the compute node is still alive. This should be less than the requester's configured heartbeat timeout to avoid flapping. |
node.compute.controlplanesettings.heartbeattopic | This is the pubsub topic that the compute node will use to send heartbeats to the requester node |
node.compute.controlplanesettings.infoupdatefrequency | The frequency with which the compute node will send node info (including current labels) to the controlling requester node |
node.compute.controlplanesettings.resourceupdatefrequency | How often the compute node will send current resource availability to the requester node |
node.compute.executionstore.path | A metadata store of job executions handled by the current compute node |
node.compute.executionstore.type | The type of store used by the compute node |
node.compute.jobselection.acceptnetworkedjobs | Boolean signifying if jobs that specify networking should be accepted |
node.compute.jobselection.locality | Sets job selection policy based on where the data for the job is located. ‘local’ or ‘anywhere’ |
node.compute.jobselection.probeexec | Use the result of an executed external program to decide if a job should be accepted. Overrides data locality settings |
node.compute.jobselection.probehttp | Use the result of an HTTP POST to decide if a job should be accepted. Overrides data locality settings |
node.compute.jobselection.rejectstatelessjobs | Boolean signifying if jobs that don’t specify any data should be rejected |
node.compute.jobtimeouts.defaultjobexecutiontimeout | Default value for job execution timeouts on your current compute node. It will be assigned to jobs with no timeout requirement defined |
node.compute.jobtimeouts.jobexecutiontimeoutclientidbypasslist | List of clients that are allowed to bypass the job execution timeout check |
node.compute.jobtimeouts.jobnegotiationtimeout | Default timeout value to hold a bid for a job |
node.compute.jobtimeouts.maxjobexecutiontimeout | Default value for the maximum execution timeout this compute node supports. Jobs with higher timeout requirements will not be bid on |
node.compute.jobtimeouts.minjobexecutiontimeout | Default value for the minimum execution timeout this compute node supports. Jobs with lower timeout requirements will not be bid on |
node.compute.localpublisher.address | The address for the local publisher's server to bind to |
node.compute.localpublisher.directory | The directory where the local publisher will store content |
node.compute.localpublisher.port | The port for the local publisher's server to bind to (default: 6001) |
node.compute.logging.logrunningexecutionsinterval | The interval at which your compute node generates logs on its running job executions |
node.compute.logstreamconfig.channelbuffersize | How many messages to buffer in the log stream channel, per stream |
node.compute.manifestcache.duration | The default time-to-live for each record in the manifest cache |
node.compute.manifestcache.frequency | The frequency at which checks for stale records are performed |
node.compute.manifestcache.size | Specifies the number of items that can be held in the manifest cache |
node.computestoragepath | Path to the storage repository for your execution data within your compute node |
node.disabledfeatures.engines | List of Engine types to disable |
node.disabledfeatures.publishers | List of Publisher types to disable |
node.disabledfeatures.storages | List of Storage types to disable |
node.downloadurlrequestretries | Number of retries attempted for the download requests in your node |
node.downloadurlrequesttimeout | Duration before a timeout when processing download requests |
node.executorpluginpath | Path to the directory for your executor plugins |
node.ipfs.connect | The ipfs host multiaddress to connect to, otherwise an in-process IPFS node will be created if not set |
node.labels | List of labels to apply to the node that can be used for node selection and filtering |
node.loggingmode | Switch between available logging formats for your node - default, station, json, combined, event |
node.name | The name of the node. If not set, the node name will be generated automatically based on the chosen name provider |
node.nameprovider | The name provider to use to generate the node name, if no name is set |
node.network.advertisedaddress | Address to advertise to compute nodes to connect to |
node.network.authsecret | Authentication secret for network connections |
node.network.cluster.advertisedaddress | Address to advertise to other orchestrators to connect to |
node.network.cluster.name | Name of the cluster to join |
node.network.cluster.peers | Comma-separated list of other orchestrators to connect to form a cluster |
node.network.cluster.port | Port to listen for connections from other orchestrators to form a cluster |
node.network.orchestrators | Comma-separated list of orchestrators to connect to. Applies to compute nodes |
node.network.port | Port to listen for connections from other nodes. Applies to orchestrator nodes |
node.network.storedir | Directory that the network can use for storage |
node.requester.controlplanesettings.heartbeatcheckfrequency | This setting is the time period after which a compute node is considered to be unresponsive. If the compute node misses two of these frequencies, it will be marked as unknown. The compute node should have a frequency setting less than this one to ensure that it does not keep switching between unknown and active too frequently |
node.requester.controlplanesettings.heartbeattopic | This is the pubsub topic that the compute node will use to send heartbeats to the requester node |
node.requester.controlplanesettings.nodedisconnectedafter | This is the time period after which a compute node is considered to be disconnected. If the compute node does not deliver a heartbeat every |
node.requester.defaultpublisher | A default publisher to apply to all jobs without a publisher |
node.requester.evaluationbroker.evalbrokerinitialretrydelay | Initial retry delay for the evaluation broker |
node.requester.evaluationbroker.evalbrokermaxretrycount | Maximum retry count for the evaluation broker |
node.requester.evaluationbroker.evalbrokersubsequentretrydelay | Subsequent retry delay for the evaluation broker |
node.requester.evaluationbroker.evalbrokervisibilitytimeout | Visibility timeout for the evaluation broker |
node.requester.externalverifierhook | URL specifying where to send external verification requests to |
node.requester.failureinjectionconfig.isbadactor | Boolean indicating if failure injection config is a bad actor |
node.requester.housekeepingbackgroundtaskinterval | Duration between Bacalhau housekeeping runs |
node.requester.jobdefaults.executiontimeout | The maximum amount of time a task is allowed to run in seconds. Zero means no timeout, such as for a daemon task |
node.requester.jobselectionpolicy.acceptnetworkedjobs | Boolean signifying if jobs that specify networking should be accepted |
node.requester.jobselectionpolicy.locality | Sets job selection policy based on where the data for the job is located. ‘local’ or ‘anywhere’ |
node.requester.jobselectionpolicy.probeexec | Use the result of an executed external program to decide if a job should be accepted. Overrides data locality settings |
node.requester.jobselectionpolicy.probehttp | Use the result of an HTTP POST to decide if a job should be accepted. Overrides data locality settings |
node.requester.jobselectionpolicy.rejectstatelessjobs | Boolean signifying if jobs that don’t specify any data should be rejected |
node.requester.jobstore.path | The path used for the requester job store when using BoltDB |
node.requester.jobstore.type | The type of job store used by the requester node (BoltDB) |
node.requester.manualnodeapproval | Boolean signifying if new nodes should only be manually approved to your network. Default false |
node.requester.nodeinfostorettl | Sets the duration for which node information is retained in the node info store after which it is automatically removed from the store |
node.requester.noderankrandomnessrange | Description missing |
node.requester.overaskforbidsfactor | Number of compute nodes the requester node should ask to bid for a job when deciding on scheduling |
node.requester.scheduler.nodeoversubscriptionfactor | Numerical value representing the sum of a node’s total active capacity and queue capacity. With a default value of 1.5, your node can handle 50% more of its total capacity before being excluded from job queueing consideration. |
node.requester.scheduler.queuebackoff | The interval between retry attempts by the requester node to assign queued jobs |
node.requester.storageprovider.s3.presignedurldisabled | Boolean deciding if a secure S3 URL should be generated and used. Secure URLs are disabled if true. Default: false |
node.requester.storageprovider.s3.presignedurlexpiration | Defined expiration interval for your secure S3 urls |
node.requester.tagcache.duration | The default time-to-live for each record in the tag cache |
node.requester.tagcache.frequency | The frequency at which checks for stale records are performed |
node.requester.tagcache.size | Specifies the number of items that can be held in the tag cache |
node.requester.translationenabled | Whether jobs should be translated at the requester node or not. Default: false |
node.requester.worker.workercount | Number of workers that should be generated under your requester node |
node.requester.worker.workerevaldequeuebasebackoff | Default time for your workers to be taken off the evaluation list for new tasks |
node.requester.worker.workerevaldequeuemaxbackoff | Maximum time for your workers to be taken off the evaluation list for new tasks |
node.requester.worker.workerevaldequeuetimeout | Time for your workers to be evaluated within the queue |
node.serverapi.clienttls.cacert | The location of your server’s chosen Certificate Authority certificate file when self-signed certificates are used |
node.serverapi.clienttls.insecure | Boolean indicating if the server TLS is insecure. When true, the server uses HTTPS (TLS) but does not attempt to verify the certificate |
node.serverapi.clienttls.usetls | Boolean indicating if TLS should be used for server connections |
node.serverapi.host | The host to serve on |
node.serverapi.port | The port to serve on |
node.serverapi.tls.autocert | Specifies a host name for which ACME is used to obtain a TLS Certificate. Using this option results in the API serving over HTTPS |
node.serverapi.tls.autocertcachepath | The directory where the autocert process will cache certificates to avoid rate limits |
node.serverapi.tls.selfsigned | Boolean indicating if a self-signed security certificate is being used |
node.serverapi.tls.servercertificate | Specifies a TLS certificate file to be used by the requester node |
node.serverapi.tls.serverkey | Specifies a TLS key file matching the certificate to be used by the requester node |
node.strictversionmatch | Description missing |
node.type | Whether the node is a compute, requester or both |
node.volumesizerequesttimeout | Duration before a timeout when parsing a node’s volume size. |
node.webui.enabled | Whether to start the web UI alongside the bacalhau node |
node.webui.port | The port number to listen on for web-ui connections |
update.checkfrequency | The frequency with which your system checks for version updates |
update.checkstatepath | Version state is stored in this directory |
update.skipchecks | Boolean, checks are skipped on your system if true |
user.installationid | String tag applied to your user on installation |
user.keypath | Path to user authentication key. Client key will be used if a private key is not specified |
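
To show how a few of the options in the table combine, the fragment below pairs the compute node's heartbeat frequency with the requester's heartbeat check frequency, keeping the former shorter as the descriptions recommend. The duration values and their `15s`/`30s` string format are assumptions for illustration. Check `bacalhau config list` for the values and format your version reports.

```yaml
# config.yaml (illustrative fragment; durations are assumed values)
node:
  compute:
    controlplanesettings:
      heartbeatfrequency: 15s        # compute node sends a heartbeat this often
  requester:
    controlplanesettings:
      heartbeatcheckfrequency: 30s   # kept longer than the compute heartbeat frequency
```

As with any change to `config.yaml`, restart the node for these settings to take effect.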