1 of 11

Job Specification

A Job represents a discrete unit of work that can be scheduled and executed. It carries all the necessary information to define the nature of the work, how it should be executed, and the resources it requires.

Type: batch
Count: 1
Priority: 50
Meta:
  version: "1.2.5"
Labels:
  project: "my-project"
Constraints:
  - Key: Architecture
    Operator: '='
    Values:
      - arm64
  - Key: region
    Operator: '='
    Values:
      - us-west-2
Tasks:
  #...

`job` Parameters

Name (string : <optional>): A logical name to refer to the job. Defaults to job ID.
Namespace (string: "default"): The namespace in which the job is running. ClientID is used as a namespace in the public demo network.
Type (string: <required>): The type of the job, such as batch, ops, daemon or service. You can learn more about the supported jobs types in the Job Types guide.
Priority (int: 0): Determines the scheduling priority.
Count (int: <required): Number of replicas to be scheduled. This is only applicable for jobs of type batch and service.
Meta (Meta : nil): Arbitrary metadata associated with the job.
Labels (Label[] : nil): Arbitrary labels associated with the job for filtering purposes.
Constraints (Constraint[] : nil): These are selectors which must be true for a compute node to run this job.
Tasks (Task[] : <required>):: Task associated with the job, which defines a unit of work within the job. Today we are only supporting single task per job, but with future plans to extend this.

Server-Generated Parameters

The following parameters are generated by the server and should not be set directly.

ID (string): A unique identifier assigned to this job. It's auto-generated by the server and should not be set directly. Used for distinguishing between jobs with similar names.
State (State): Represents the current state of the job.
Version (int): A monotonically increasing version number incremented on job specification update.
Revision (int): A monotonically increasing revision number incremented on each update to the job's state or specification.
CreateTime (int): Timestamp of job creation.
ModifyTime (int): Timestamp of last job modification.

Constraint

A Constraint represents a condition that must be met for a compute node to be eligible to run a given job. Operators have the flexibility to manually define node labels when initiating a node using the bacalhau serve command. Additionally, Bacalhau boasts features like automatic resource detection and dynamic labeling, further enhancing its capability.

By defining constraints, you can ensure that jobs are scheduled on nodes that have the necessary requirements or conditions.

`Constraint` Parameters:

Key: The name of the attribute or property to check on the compute node. This could be anything from a specific hardware feature, operating system version, or any other node property.
Operator: Determines the kind of comparison to be made against the Key's value, which can be:
1. in: Checks if the Key's value exists within the provided list of values.
2. notin: Ensures the Key's value doesn't match any in the provided list of values.
3. exists: Verifies that a value for the specified Key is present, regardless of its actual value.
4. !: Confirms the absence of the specified Key. i.e DoesNotExist
5. gt: Assesses if the Key's value is greater than the provided value.
6. lt: Assesses if the Key's value is less than the provided value.
7. = & ==: Both are used to compare the Key's value for an exact match with the provided value.
8. !=: Ensures the Key's value is not the same as the provided value.
Values (optional): A list of values that the node attribute, specified by the Key, is compared against using the Operator. This is not needed for operators like exists or !.

Example:

Consider a scenario where a job should only run on nodes with a GPU and an operating system version greater than 2.0. The constraints for such a requirement might look like:

constraints:
  - key: "hardware.gpu"
    operator: "exists"
  - key: "Operating-System"
    operator: "="
    values: ["linux"]
  - key: "region"
    operator: "in"
    values: ["eu-west-1,eu-west-2"]

In this example, the first constraint checks if the node has a GPU, the second constraint ensures the OS is linux, and deployed in eu-west-1 or eu-west-2`.

Notes:

Constraints are evaluated as a logical AND, meaning all constraints must be satisfied for a node to be eligible.
Using too many specific constraints can lead to a job not being scheduled if no nodes satisfy all the conditions.
It's essential to balance the specificity of constraints with the broader needs and resources available in the cluster.

Labels

The Labels block within a Job specification plays a crucial role in Bacalhau, serving as a mechanism for filtering jobs. By attaching specific labels to jobs, users can quickly and effectively filter and manage jobs via both the Command Line Interface (CLI) and Application Programming Interface (API) based on various criteria.

`Labels` Parameters

Labels are essentially key-value pairs attached to jobs, allowing for detailed categorizations and filtrations. Each label consists of a Key and a Value. These labels can be filtered using operators to pinpoint specific jobs fitting certain criteria.

Filtering Operators

Jobs can be filtered using the following operators:

in: Checks if the key's value matches any within a specified list of values.
notin: Validates that the key's value isn’t within a provided list of values.
exists: Checks for the presence of a specified key, regardless of its value.
!: Validates the absence of a specified key. (i.e., DoesNotExist)
gt: Checks if the key's value is greater than a specified value.
lt: Checks if the key's value is less than a specified value.
= & ==: Used for exact match comparisons between the key’s value and a specified value.
!=: Validates that the key’s value doesn't match a specified value.

Example Usage

Filter jobs with a label whose key is "environment" and value is "development":

bacalhau job list --labels 'environment=development'

Filter jobs with a label whose key is "version" and value is greater than "2.0":

bacalhau job list --labels 'version gt 2.0'

Filter jobs with a label "project" existing:

bacalhau job list --labels 'project'

Filter jobs without a "project" label:

bacalhau job list --labels '!project'

Practical Applications

Job Management: Enables efficient management of jobs by categorizing them based on distinct attributes or criteria.
Automation: Facilitates the automation of job deployment and management processes by allowing scripts and tools to target specific categories of jobs.
Monitoring & Analytics: Enhances monitoring and analytics by grouping jobs into meaningful categories, allowing for detailed insights and analysis.

Conclusion

The Labels block is instrumental in the enhanced management, filtering, and operation of jobs within Bacalhau. By understanding and utilizing the available operators and label parameters effectively, users can optimize their workflow, automate processes, and achieve detailed insights into their jobs.

Network

The Network object offers a method to specify the networking requirements of a Task. It defines the scope and constraints of the network connectivity based on the demands of the task.

`Network` Parameters:

Type (string: "None"): Indicates the network configuration's nature. There are several network modes available:
- None: This mode implies that the task does not necessitate any networking capabilities.
- Full: Specifies that the task mandates unrestricted, raw IP networking without any imposed filters.
- HTTP: This mode constrains the task to only require HTTP networking with specific domains. In this model:
  - The job specifier puts forward a job, stipulating the domain(s) it intends to communicate with.
  - The compute provider assesses the inherent risk of the job based on these domains and bids accordingly.
  - At runtime, the network traffic remains strictly confined to the designated domain(s).

A typical command for this might resemble: bacalhau docker run —network=http —domain=crates.io —domain=github.com -i ipfs://Qmy1234myd4t4,dst=/code rust/compile

The primary risks for the compute provider center around possible violations of its terms, its hosting provider's terms, or even prevailing laws in its jurisdiction. This encompasses issues such as unauthorized access or distribution of illicit content and potential cyber-attacks.

Conversely, the job specifier's primary risk involves operating in a paid environment. External entities might seek to exploit this environment, for instance, through a compromised package download that initiates a crypto mining operation, depleting the allocated, prepaid job time. By limiting traffic strictly to the pre-specified domains, the potential for such cyber threats diminishes considerably.

While a compute provider might impose its limits through other means, having domains declared upfront allows it to selectively bid on jobs that it can execute without issues, improving the user experience for job specifiers.

Domains (string[]: <optional>): A list of domain strings, relevant primarily when the Type is set to HTTP. It dictates the specific domains the task can communicate with over HTTP.

Understanding and utilizing these configurations aptly can ensure that tasks are executed in an environment that aligns with their networking requirements, bolstering efficiency and security.

Input Source

An InputSource defines where and how to retrieve specific artifacts needed for a Task, such as files or data, and where to mount them within the task's context. This ensures the necessary data is present before the task's execution begins.

Bacalhau's InputSource natively supports fetching data from remote sources like S3 and IPFS and can also mount local directories. It is intended to be flexible for future expansion.

`InputSource` Parameters:

Source (SpecConfig : <required>): Specifies the origin of the artifact, which could be a URL, an S3 bucket, or other locations.
Alias (string: <optional>): An optional identifier for this input source. It's particularly useful for dynamic operations within a task, such as dynamically importing data in WebAssembly using an alias.
Target (string: <required>): Defines the path inside the task's environment where the retrieved artifact should be mounted or stored. This ensures that the task can access the data during its execution.

Usage Examples

InputSources:
  - Source:
      Type: s3
      Params:
        Bucket: my_bucket
        Region: us-west-1
    Target: /my_s3_data
  - Source:
      Type: localDirectory
      Params:
        SourcePath: /path/to/local/directory
        ReadWrite: true
    Target: /my_local_data

In this example, the first input source fetches data from an S3 bucket and mounts it at /my_s3_data within the task. The second input source mounts a local directory at /my_local_data and allows the task to read and write data to it.

Resources

The Resources provides a structured way to detail the computational resources a Task requires. By specifying these requirements, you ensure that the task is scheduled on a node with adequate resources, optimizing performance and avoiding potential issues linked to resource constraints.

`Resources` Parameters:

CPU (string: <optional>): Defines the CPU resources required for the task. Units can be specified in cores (e.g., 2 for 2 CPU cores) or in milliCPU units (e.g., 250m or 0.25 for 250 milliCPU units). For instance, if you have half a CPU core, you can represent it as 500m or 0.5.
Memory (string: <optional>): Highlights the amount of RAM needed for the task. You can specify the memory in various units such as:
- Kb for Kilobytes
- Mb for Megabytes
- Gb for Gigabytes
- Tb for Terabytes
Disk (string: <optional>): States the disk storage space needed for the task. Similarly, the disk space can be expressed in units like Gb for Gigabytes, Mb for Megabytes, and so on. As an example, 10Gb indicates 10 Gigabytes of storage space.
GPU (string: <optional>): Denotes the number of GPU units required. For example, 2 signifies the requirement of 2 GPU units. This is crucial for tasks involving heavy computational processes, machine learning models, or tasks that leverage GPU acceleration.

ResultPath

A ResultPath denotes a specific location within a Task that contains meaningful output or results. By specifying a ResultPath, you can pinpoint which files or directories are essential and should be retained or published after the task's execution.

`ResultPath` Parameters:

Name: A descriptive label or identifier for the result, allowing for easier referencing and understanding of the output's nature or significance.
Path: Specifies the exact location, either a file or a directory, within the task's environment where the result or output is stored. This ensures that after the task completes, the critical data at this path can be accessed, retained, or published as necessary.

Task

A Task signifies a distinct unit of work within the broader context of a Job. It defines the specifics of how the task should be executed, where the results should be published, what environment variables are needed, among other configurations

`Task` Parameters

Name (string : <required>): A unique identifier representing the name of the task.
Engine (SpecConfig : required): Configures the execution engine for the task, such as Docker or WebAssembly.
Publisher (SpecConfig : optional): Specifies where the results of the task should be published, such as S3 and IPFS publishers. Only applicable for tasks of type batch and ops.
Env (map[string]string : optional): A set of environment variables for the driver.
Meta (Meta : optional): Allows association of arbitrary metadata with this task.
InputSources (InputSource[] : optional): Lists remote artifacts that should be downloaded before task execution and mounted within the task, such as from S3 or HTTP/HTTPs.
ResultPaths (ResultPath[] : optional): Indicates volumes within the task that should be included in the published result. Only applicable for tasks of type batch and ops.
Resources (Resources : optional): Details the resources that this task requires.
Network (Network : optional): Configurations related to the networking aspects of the task.
Timeouts (Timeouts : optional): Configurations concerning any timeouts associated with the task.

Timeouts

The Timeouts object provides a mechanism to impose timing constraints on specific task operations, particularly execution. By setting these timeouts, users can ensure tasks don't run indefinitely and align them with intended durations.

`Timeouts` Parameters:

ExecutionTimeout (int: <optional>): Defines the maximum duration (in seconds) that a task is permitted to run. A value of zero indicates that there's no set timeout. This could be particularly useful for tasks that function as daemons and are designed to run indefinitely.

Utilizing the Timeouts judiciously helps in managing resource utilization and ensures tasks adhere to expected timelines, thereby enhancing the efficiency and predictability of job executions.

Type

The different job types available in Bacalhau

Bacalhau has recently introduced different job types in v1.1, providing more control and flexibility over the orchestration and scheduling of those jobs - depending on their type.

Despite the differences in job types, all jobs benefit from core functionalities provided by Bacalhau, including:

Node selection - the appropriate nodes are selected based on several criteria, including resource availability, priority and feedback from the nodes.
Job monitoring - jobs are monitored to ensure they complete, and that they stay in a healthy state.
Retries - within limits, Bacalhau will retry certain jobs a set number of times should it fail to complete successfully when requested.

Batch Jobs

Batch jobs are executed on demand, running on a specified number of Bacalhau nodes. These jobs either run until completion or until they reach a timeout. They are designed to carry out a single, discrete task before finishing. This is the only job type.

Ideal for intermittent yet intensive data dives, for instance performing computation over large datasets before publishing the response. This approach eliminates the continuous processing overhead, focusing on specific, in-depth investigations and computation.

Batch Job Example

This example shows a sample Batch job description with all available parameters.

The example demonstrates a job that:

Has a priority of 100
Will be executed on 2 nodes
Will be executed only on nodes with Linux OS
Uses the docker engine
Executes a python script with multiple arguments
Preloads and mounts IPFS data as a local directory
Publishes the results to the IPFS
Has network access type HTTP and 2 allowed domains

# This example shows a sample job file. 
# Parameters, marked as Optional can be skipped - the default values will be used


# Name of the job. Optional. Default value - job ID
Name: Batch Job Example


# Type of the job
Type: batch


# The namespace in which the job is running. Default value - “default”
Namespace: default


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Count - number of replicas to be scheduled. 
# This is only applicable for jobs of type batch and service.
Count: 2


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the batch job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Some option: Some text
  Some other option: Some other text


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
- Key: "Operating-System"
  Operator: "="
  Values: ["linux"]


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: Important Calculations


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: alek5eyk/batchjobexample:1.1


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - python
          - supercalc.py
          - "5"
          - /outputs/result.txt


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
        - DEFAULT_USER_NAME = root
        - API_KEY = none


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task
    # Optional
    InputSources:
      - Target: /data
        Source:
          Type: ipfs
          Params:
            CID: "QmSYE8dVx6RTdDFFhBu51JjFG1fwwPdUJoXZ4ZNXvfoK2V"



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 1Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Domains:
      - example.com
      - ghcr.io
      Type: HTTP


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Ops Jobs

Similar to batch jobs, ops jobs have a broader reach. They are executed on all nodes that align with the job specification, but otherwise behave like batch jobs.

Ops jobs are perfect for urgent investigations, granting direct access to logs on host machines, where previously you may have had to wait for the logs to arrive at a central location before being able to query them. They can also be used for delivering configuration files for other systems should you wish to deploy an update to many machines at once.

Ops Job Example

This example shows a sample Ops job description with all available parameters.

The example demonstrates a job that:

Has a priority of 100
Will be executed on all suitable nodes
Will be executed only on nodes with label = WebService
Uses the docker engine
Executes a query with manually specified parameters
Has access to a local directory
Publishes the results to the IPFS, if any
Has network access type HTTP and 2 allowed domains

# This example shows a sample ops job file. 
# Parameters, marked as Optional can be skipped - the default values will be used
# Example from the https://blog.bacalhau.org/p/real-time-log-analysis-with-bacalhau is used


# Name of the job. Optional. Default value - job ID
Name: Live logs processing


# Type of the job
Type: ops


# The namespace in which the job is running. Default value - “default”
Namespace: logging


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the ops job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Job type: ops job
  Ops job feature: To be executed on all suitable nodes


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
  - Key: service
    Operator: ==
    Values:
      - WebService


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: LiveLogProcessing


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: expanso/nginx-access-log-processor:1.0.0


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - --query
          - {{.query}}
          - --start-time
          - {{or (index . "start-time") ""}}
          - --end-time
          - {{or (index . "end-time") ""}}


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
        - DEFAULT_USER_NAME = root
        - API_KEY = none


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task.
    # Ensure that localDirectory source is enabled on the nodes
    # Optional
    InputSources:
      - Target: /logs
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/logs



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 1Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Domains:
      - example.com
      - ghcr.io
      Type: HTTP


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Daemon Jobs

Daemon jobs run continuously on all nodes that meet the criteria given in the job specification. Should any new compute nodes join the cluster after the job was started, and should they meet the criteria, the job will be scheduled to run on that node too.

A good application of daemon jobs is to handle continuously generated data on every compute node. This might be from edge devices like sensors, or cameras, or from logs where they are generated. The data can then be aggregated and compressed them before sending it onwards. For logs, the aggregated data can be relayed at regular intervals to platforms like Kafka or Kinesis, or directly to other logging services with edge devices potentially delivering results via MQTT.

Daemon Job Example

The example demonstrates a job that:

Has a priority of 100
Will be executed continuously on all suitable nodes
Will be executed only on nodes with label = WebService
Uses the docker engine
Executes a query with manually specified parameters
Has access to 2 local directories with logs
Publishes the results to the IPFS, if any
Has network access type Full in order to send data to the S3 storage

# This example shows a sample daemon job file. 
# Parameters, marked as Optional can be skipped - the default values will be used
# Example from the https://blog.bacalhau.org/p/tutorial-save-25-m-yearly-by-managing is used

# Name of the job. Optional. Default value - job ID
Name: Logstash


# Type of the job
Type: daemon


# The namespace in which the job is running. Default value - “default”
Namespace: logging


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the daemon job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Job type: daemon job
  Daemon job feature: To be executed continuously on all suitable nodes


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
  - Key: service
    Operator: ==
    Values:
      - WebService


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: main


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: expanso/nginx-access-log-agent:1.0.0


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - --query
          - {{.query}}
          - --start-time
          - {{or (index . "start-time") ""}}
          - --end-time
          - {{or (index . "end-time") ""}}


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
          - OPENSEARCH_ENDPOINT={{.OpenSearchEndpoint}}
          - S3_BUCKET={{.AccessLogBucket}}
          - AWS_REGION={{.AWSRegion}}
          - AGGREGATE_DURATION=10
          - S3_TIME_FILE=60


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task.
    # Ensure that localDirectory source is enabled on the nodes
    # Optional
    InputSources:
      - Target: /app/logs
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/logs
      - Target: /app/state
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/state
            ReadWrite: true



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 1Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Type: Full


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Service Jobs

Service jobs run continuously on a specified number of nodes that meet the criteria given in the job specification. Bacalhau's orchestrator selects the optimal nodes to run the job, and continuously monitors its health, performance. If required, it will reschedule on other nodes.

This job type is good for long-running consumers such as streaming or queuing services, or real-time event listeners.

Service Job Example

The example demonstrates a job that:

Has a priority of 100
Will be executed continuously on all suitable nodes
Will be executed only on nodes with architecture = arm64 and located in the us-west-2 region
Uses the docker engine
Executes a query with multiple parameters
Has access to 2 local directories with logs
Publishes the results to the IPFS, if any
Has network access type Full in order to send data to the S3 storage

# This example shows a sample daemon job file. 
# Parameters, marked as Optional can be skipped - the default values will be used
# Example from the https://blog.bacalhau.org/p/introducing-new-job-types-new-horizons is used

# Name of the job. Optional. Default value - job ID
Name: Kinesis Consumer


# Type of the job
Type: service


# The namespace in which the job is running. Default value - “default”
Namespace: service


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the service job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Job type: service job
  Daemon job feature: To be executed continuously on a certain amount of suitable nodes


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
  - Key: Architecture
    Operator: '='
    Values:
      - arm64
  - Key: region
    Operator: '='
    Values:
      - us-west-2


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: main


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: my-kinesis-consumer:latest


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - -stream-arn
          - arn:aws:kinesis:us-west-2:123456789012:stream/my-kinesis-stream
          - -shard-iterator
          - TRIM_HORIZON


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
          - DEFAULT_USER_NAME = root
          - API_KEY = none


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task.
    # Ensure that localDirectory source is enabled on the nodes
    # Optional
    InputSources:
      - Target: /app/logs
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/logs
      - Target: /app/state
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/state
            ReadWrite: true



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 4Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Type: Full


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Type

The different job types available in Bacalhau

Bacalhau has recently introduced different job types in v1.1, providing more control and flexibility over the orchestration and scheduling of those jobs - depending on their type.

Despite the differences in job types, all jobs benefit from core functionalities provided by Bacalhau, including:

Node selection - the appropriate nodes are selected based on several criteria, including resource availability, priority and feedback from the nodes.
Job monitoring - jobs are monitored to ensure they complete, and that they stay in a healthy state.
Retries - within limits, Bacalhau will retry certain jobs a set number of times should it fail to complete successfully when requested.

Batch Jobs

Batch Job Example

This example shows a sample Batch job description with all available parameters.

The example demonstrates a job that:

Has a priority of 100
Will be executed on 2 nodes
Will be executed only on nodes with Linux OS
Uses the docker engine
Executes a python script with multiple arguments
Preloads and mounts IPFS data as a local directory
Publishes the results to the IPFS
Has network access type HTTP and 2 allowed domains

# This example shows a sample job file. 
# Parameters, marked as Optional can be skipped - the default values will be used


# Name of the job. Optional. Default value - job ID
Name: Batch Job Example


# Type of the job
Type: batch


# The namespace in which the job is running. Default value - “default”
Namespace: default


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Count - number of replicas to be scheduled. 
# This is only applicable for jobs of type batch and service.
Count: 2


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the batch job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Some option: Some text
  Some other option: Some other text


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
- Key: "Operating-System"
  Operator: "="
  Values: ["linux"]


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: Important Calculations


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: alek5eyk/batchjobexample:1.1


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - python
          - supercalc.py
          - "5"
          - /outputs/result.txt


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
        - DEFAULT_USER_NAME = root
        - API_KEY = none


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task
    # Optional
    InputSources:
      - Target: /data
        Source:
          Type: ipfs
          Params:
            CID: "QmSYE8dVx6RTdDFFhBu51JjFG1fwwPdUJoXZ4ZNXvfoK2V"



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 1Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Domains:
      - example.com
      - ghcr.io
      Type: HTTP


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Ops Jobs

Similar to batch jobs, ops jobs have a broader reach. They are executed on all nodes that align with the job specification, but otherwise behave like batch jobs.

Ops Job Example

This example shows a sample Ops job description with all available parameters.

The example demonstrates a job that:

Has a priority of 100
Will be executed on all suitable nodes
Will be executed only on nodes with label = WebService
Uses the docker engine
Executes a query with manually specified parameters
Has access to a local directory
Publishes the results to the IPFS, if any
Has network access type HTTP and 2 allowed domains

# This example shows a sample ops job file. 
# Parameters, marked as Optional can be skipped - the default values will be used
# Example from the https://blog.bacalhau.org/p/real-time-log-analysis-with-bacalhau is used


# Name of the job. Optional. Default value - job ID
Name: Live logs processing


# Type of the job
Type: ops


# The namespace in which the job is running. Default value - “default”
Namespace: logging


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the ops job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Job type: ops job
  Ops job feature: To be executed on all suitable nodes


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
  - Key: service
    Operator: ==
    Values:
      - WebService


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: LiveLogProcessing


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: expanso/nginx-access-log-processor:1.0.0


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - --query
          - {{.query}}
          - --start-time
          - {{or (index . "start-time") ""}}
          - --end-time
          - {{or (index . "end-time") ""}}


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
        - DEFAULT_USER_NAME = root
        - API_KEY = none


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task.
    # Ensure that localDirectory source is enabled on the nodes
    # Optional
    InputSources:
      - Target: /logs
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/logs



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 1Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Domains:
      - example.com
      - ghcr.io
      Type: HTTP


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Daemon Jobs

Daemon Job Example

This example shows a sample Daemon job description with all available parameters.

The example demonstrates a job that:

Has a priority of 100
Will be executed continuously on all suitable nodes
Will be executed only on nodes with label = WebService
Uses the docker engine
Executes a query with manually specified parameters
Has access to 2 local directories with logs
Publishes the results to the IPFS, if any
Has network access type Full in order to send data to the S3 storage

# This example shows a sample daemon job file. 
# Parameters, marked as Optional can be skipped - the default values will be used
# Example from the https://blog.bacalhau.org/p/tutorial-save-25-m-yearly-by-managing is used

# Name of the job. Optional. Default value - job ID
Name: Logstash


# Type of the job
Type: daemon


# The namespace in which the job is running. Default value - “default”
Namespace: logging


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the daemon job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Job type: daemon job
  Daemon job feature: To be executed continuously on all suitable nodes


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
  - Key: service
    Operator: ==
    Values:
      - WebService


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: main


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: expanso/nginx-access-log-agent:1.0.0


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - --query
          - {{.query}}
          - --start-time
          - {{or (index . "start-time") ""}}
          - --end-time
          - {{or (index . "end-time") ""}}


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
          - OPENSEARCH_ENDPOINT={{.OpenSearchEndpoint}}
          - S3_BUCKET={{.AccessLogBucket}}
          - AWS_REGION={{.AWSRegion}}
          - AGGREGATE_DURATION=10
          - S3_TIME_FILE=60


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task.
    # Ensure that localDirectory source is enabled on the nodes
    # Optional
    InputSources:
      - Target: /app/logs
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/logs
      - Target: /app/state
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/state
            ReadWrite: true



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 1Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Type: Full


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Service Jobs

This job type is good for long-running consumers such as streaming or queuing services, or real-time event listeners.

Service Job Example

This example shows a sample Service job description with all available parameters.

The example demonstrates a job that:

Has a priority of 100
Will be executed continuously on all suitable nodes
Will be executed only on nodes with architecture = arm64 and located in the us-west-2 region
Uses the docker engine
Executes a query with multiple parameters
Has access to 2 local directories with logs
Publishes the results to the IPFS, if any
Has network access type Full in order to send data to the S3 storage

# This example shows a sample daemon job file. 
# Parameters, marked as Optional can be skipped - the default values will be used
# Example from the https://blog.bacalhau.org/p/introducing-new-job-types-new-horizons is used

# Name of the job. Optional. Default value - job ID
Name: Kinesis Consumer


# Type of the job
Type: service


# The namespace in which the job is running. Default value - “default”
Namespace: service


# Priority - determines the scheduling priority. By default is 0
Priority: 100


# Meta - arbitrary metadata associated with the job. 
# Optional
Meta:
  Job purpose : Provide detailed example of the service job
  Meta purpose: Describe the job


# Labels - Arbitrary labels associated with the job for filtering purposes. 
# Optional
Labels:
  Job type: service job
  Daemon job feature: To be executed continuously on a certain amount of suitable nodes


# Constraint - a condition that must be met for a compute node to be eligible to run a given job. 
# Should be specified in a following format: key - operator - value
# Optional.
Constraints:
  - Key: Architecture
    Operator: '='
    Values:
      - arm64
  - Key: region
    Operator: '='
    Values:
      - us-west-2


# Task associated with the job, which defines a unit of work within the job. 
# Currently, only one task per job is supported.
Tasks:
  # Name - unique identifier for a task. Default value - “main”
  - Name: main


    # Engine - the execution engine for the task. 
    # Defines engine type (docker or wasm) and relevant parameters. 
    # In this example, docker engine will be used.  
    Engine:
      Type: docker


    # Params: A set of key-value pairs that provide the specific configurations for the chosen type
      Params:

        # Image: docker image to be used in the task.
        Image: my-kinesis-consumer:latest


        # Entrypoint defines a command that will be executed when container starts. 
        # For this example we don't need any so default value 'null' can be used
        Entrypoint: null


        # Parameters define CLI commands, executed after entrypoint        
        Parameters:
          - -stream-arn
          - arn:aws:kinesis:us-west-2:123456789012:stream/my-kinesis-stream
          - -shard-iterator
          - TRIM_HORIZON


        # WorkingDirectory sets a working directory for entrypoint and paramters' commands.
        # Default value - empty string ""
        WorkingDirectory: ""


        # EnvironmentVariables sets environment variables for the engine
        EnvironmentVariables:
          - DEFAULT_USER_NAME = root
          - API_KEY = none


        # Meta - arbitrary metadata associated with the task. 
        # Optional
        Meta:
          Task goal : show how to create declarative descriptions

    # Publisher specifies where the results of the task should be published - S3, IPFS, Local or none
    # Optional
    # To use IPFS publisher you need to specify only type
    # To use S3 publisher you need to specify bucket, key, region and endpoint
    # See S3 Publisher specification for more details
    Publisher:
      Type: ipfs


    # InputSources lists remote artifacts that should be downloaded before task execution 
    # and mounted within the task.
    # Ensure that localDirectory source is enabled on the nodes
    # Optional
    InputSources:
      - Target: /app/logs
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/logs
      - Target: /app/state
        Source:
          Type: localDirectory
          Params:
            SourcePath: /data/log-orchestration/state
            ReadWrite: true



    # ResultPaths indicate volumes within the task that should be included in the published result
    # Only applicable for batch and ops jobs.
    # Optional
    ResultPaths:
      - Name: outputs
        Path: /outputs


    # Resources is a structured way to detail the required computational resources for the task. 
    # Optional
    Resources:
      # CPU can be specified in cores (e.g. 1) or in milliCPU units (e.g. 250m or 0.25)
      CPU: 250m
      
      # Memory highlights amount of RAM for a job. Can be specified in Kb, Mb, Gb, Tb
      Memory: 4Gb
      
      # Disk states disk storage space, needed for the task.
      Disk: 100mb

      # Denotes the number of GPU units required.
      GPU: "0"


    # Network specifies networking requirements.  
    # Optional
    # Job may have full access to the network,
    # may have no access at all,
    # or may have limited HTTP(S) access to a specific list of domains
    Network:
      Type: Full


    # Timeouts define configurations concerning any timeouts associated with the task. 
    # Optional
    Timeouts:
      # QueueTimeout defines how long will job wait for suitable nodes in the network
      # if none are currently available.
      QueueTimeout: 101

      # TotalTimeout defines job execution timeout. When it is reached the job will be terminated
      TotalTimeout: 301

Job Specification

job Parameters

Server-Generated Parameters

Constraint

Constraint Parameters:

Example:

Notes:

Labels

Labels Parameters

Filtering Operators

Example Usage

Practical Applications

Conclusion

Meta

Meta Parameters in Job and Task Specs

User-Defined Metadata

Example:

Auto-Generated Metadata by Bacalhau

Bacalhau Auto-Generated Keys:

Example:

Implications and Utility

Network

Network Parameters:

Input Source

InputSource Parameters:

Usage Examples

Resources

Resources Parameters:

ResultPath

ResultPath Parameters:

Task

Task Parameters

Timeouts

Timeouts Parameters:

Type

Batch Jobs

Ops Jobs

Daemon Jobs

Service Jobs

Labels

Labels Parameters

Filtering Operators

Example Usage

Practical Applications

Conclusion

Constraint

Constraint Parameters:

Example:

Notes:

Network

Network Parameters:

Job Specification

job Parameters

Server-Generated Parameters

Resources

Resources Parameters:

Input Source

InputSource Parameters:

Usage Examples

Meta

Meta Parameters in Job and Task Specs

User-Defined Metadata

Example:

Auto-Generated Metadata by Bacalhau

Bacalhau Auto-Generated Keys:

Example:

Implications and Utility

ResultPath

ResultPath Parameters:

Timeouts

Timeouts Parameters:

Task

Task Parameters

Type

Batch Jobs

Ops Jobs

Daemon Jobs

Service Jobs

`job` Parameters

`Constraint` Parameters:

`Labels` Parameters

`Meta` Parameters in Job and Task Specs

`Network` Parameters:

`InputSource` Parameters:

`Resources` Parameters:

`ResultPath` Parameters:

`Task` Parameters

`Timeouts` Parameters:

`Labels` Parameters

`Constraint` Parameters:

`Network` Parameters:

`job` Parameters

`Resources` Parameters:

`InputSource` Parameters:

`Meta` Parameters in Job and Task Specs

`ResultPath` Parameters:

`Timeouts` Parameters:

`Task` Parameters