Write a SpecConfig
Overview
A job specification defines how Bacalhau should execute your workload. This guide provides a complete reference of all supported options, configurations, and their valid values.
Supported Values
Job Types
batch: Run once and complete
service: Run continuously with specified replica count
daemon: Run continuously on all matching nodes
ops: Run once on all matching nodes
Engine Types
docker: Docker container execution
wasm: WebAssembly module execution
Storage Types
ipfs: IPFS content
s3: Amazon S3 storage
local: Local filesystem
urlDownload: HTTP/HTTPS URLs
s3PreSigned: Pre-signed S3 URLs
inline: Inline content
Network Types
none: No network access (default)
http: Limited HTTP/HTTPS access
full: Unrestricted network access
Publisher Types
ipfs: Publish to IPFS
s3: Upload to S3
local: Store locally
noop: Discard results
Result Types
file: Single file output
directory: Directory of files
stdout: Standard output
stderr: Standard error
exitCode: Process exit code
Compression Types
none: No compression
gzip: GZIP compression
zstd: Zstandard compression
Format Types
raw: Binary data
text: Plain text
json: JSON data
csv: CSV data
Basic Structure
A job specification is a JSON document with the following structure:
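A minimal sketch of that structure, assembled from the fields listed in the Field Reference. The `Type`/`Params` nesting under `Engine` is an assumption modeled on the `Source.Type`/`Source.Params` pattern this guide documents for input sources, and the job name, image, and resource values are placeholders. Verify the shape against your Bacalhau version before relying on it.

```json
{
  "Name": "example-job",
  "Type": "batch",
  "Count": 1,
  "Tasks": [
    {
      "Name": "main",
      "Engine": {
        "Type": "docker",
        "Params": {
          "Image": "ubuntu:22.04",
          "Parameters": ["echo", "hello"]
        }
      },
      "Resources": {
        "CPU": "0.5",
        "Memory": "512MB",
        "Disk": "1GB"
      }
    }
  ]
}
```

Only the fields marked as required in the reference are set here; optional fields such as `Priority`, `Namespace`, and `Labels` fall back to their documented defaults.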
Field Reference
Job Level Fields
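| Field | Type | Required | Default | Valid Values |
| --- | --- | --- | --- | --- |
| Name | string | Yes | - | Alphanumeric with - and _ |
| Type | string | Yes | batch | batch, service, daemon, ops |
| Count | integer | No | 1 | 1 or greater |
| Priority | integer | No | 0 | 0-100 |
| Namespace | string | No | default | Valid DNS label |
| Labels | object | No | {} | Key-value string pairs |
| Meta | object | No | {} | Key-value string pairs |
| Tasks | array | Yes | - | Array of task objects |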
Job Level Field Notes
| Field | Required | Description | Notes |
| --- | --- | --- | --- |
| Name | Yes | Job identifier | Must be unique within the namespace. Avoid spaces and special characters except - and _ |
| Type | Yes | Job type (batch/service/daemon/ops) | Services without proper health checks may restart continuously |
| Count | No | Number of replicas | For batch jobs, each replica runs once. For services, Bacalhau maintains Count running replicas |
| Tasks | Yes | Array of task definitions | Order matters for multi-task jobs. First task failure stops subsequent tasks |
Task Configuration
Tasks define the actual work to be performed. Each task requires:
Engine configuration (how to run)
Resource requirements (what it needs)
Data handling (inputs/outputs)
Task Level Fields
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Name | string | Yes | - | Alphanumeric with - and _ |
| Engine | object | Yes | - | Engine configuration |
| Resources | object | Yes | - | Resource requirements |
| InputSources | array | No | [] | Array of input sources |
| ResultPaths | array | No | [] | Array of result paths |
| Network | object | No | {"Type": "none"} | Network configuration |
| Timeouts | object | No | - | Timeout settings |
| Env | object | No | {} | Key-value string pairs |
| Meta | object | No | {} | Key-value string pairs |
Engine Configuration
Docker Engine Parameters
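| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Image | string | Yes | Docker image name |
| Entrypoint | array | No | Container entrypoint |
| Parameters | array | No | Command parameters |
| WorkingDirectory | string | No | Working directory |
| EnvironmentVariables | object | No | Environment variables |
| Ports | array | No | Port mappings |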
Example:
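The fragment below is a hedged sketch of a Docker engine block built from the parameters in the table above. Wrapping the parameters in `"Type": "docker"` plus a `Params` object is an assumption (this guide does not show the nesting), and the image, script path, and working directory are placeholders.

```json
{
  "Engine": {
    "Type": "docker",
    "Params": {
      "Image": "python:3.11-slim",
      "Entrypoint": ["python"],
      "Parameters": ["/app/main.py", "--verbose"],
      "WorkingDirectory": "/app"
    }
  }
}
```

Setting `Entrypoint` explicitly sidesteps the first edge case listed below, and `WorkingDirectory` only works if `/app` already exists in the image.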
Common Edge Cases:
Images without default entrypoints require explicit entrypoint
Environment variables with spaces or special characters need proper escaping
Working directory must exist in container
Large images may exceed node storage limits
WASM Engine Parameters
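| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| EntryModule | string | Yes | WASM module path |
| EntryPoint | string | Yes | Exported function name |
| Parameters | array | No | Function arguments |
| EnvironmentVariables | object | No | Environment variables |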
Example:
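A possible WASM engine block, sketched from the parameter table above. The `Type`/`Params` wrapper is an assumption, the module path is a placeholder, and `_start` is used only as a commonly exported entry function; your module may export a different name.

```json
{
  "Engine": {
    "Type": "wasm",
    "Params": {
      "EntryModule": "/inputs/main.wasm",
      "EntryPoint": "_start",
      "Parameters": ["--input", "/inputs/data.csv"]
    }
  }
}
```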
Common Edge Cases:
WASM modules must explicitly export entry point function
Memory limits must be within node capabilities
Binary data handling requires careful type conversion
Resource Requirements
Resource Fields
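| Field | Type | Required | Format | Range |
| --- | --- | --- | --- | --- |
| CPU | string | Yes | Decimal | 0.1 to 128.0 |
| Memory | string | Yes | Size + Unit | 1MB to node max |
| Disk | string | Yes | Size + Unit | 10MB to node max |
| GPU | string | No | Integer | 0 to node max |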
Units:
Memory/Disk: B, KB, MB, GB, TB
CPU: Decimal cores (e.g., "0.5", "2.0")
GPU: Whole numbers only
Example:
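A small illustration of a `Resources` object that follows the formats in the table above: every value is a string, CPU is a decimal core count, Memory and Disk carry a unit, and GPU is a whole number. The specific quantities are placeholders, not recommendations.

```json
{
  "Resources": {
    "CPU": "0.5",
    "Memory": "512MB",
    "Disk": "1GB",
    "GPU": "0"
  }
}
```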
CPU can be fractional (e.g., "0.1" to "128.0")
Memory/Disk require units (B, KB, MB, GB, TB)
GPU allocation is integer-only
Over-requesting resources reduces node availability
Edge Cases:
Minimum Allocations
Maximum Values
Data Handling
Input Source Fields
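| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Source.Type | string | Yes | One of: ipfs, s3, local, urlDownload, s3PreSigned, inline |
| Source.Params | object | Yes | Source-specific parameters |
| Target | string | Yes | Absolute mount path |
| Alias | string | No | Friendly identifier |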
Source Type Parameters
IPFS:
S3:
URL:
Inline:
Example:
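The sketch below combines two source types in one `InputSources` array. The field layout (`Source.Type`, `Source.Params`, `Target`, `Alias`) comes from the table above, but the keys inside `Params` (`URL`, `Bucket`, `Key`, `Region`) are assumptions, since the per-type parameter listings are not reproduced in this guide. The URL, bucket name, and paths are placeholders.

```json
{
  "InputSources": [
    {
      "Source": {
        "Type": "urlDownload",
        "Params": {
          "URL": "https://example.com/data.csv"
        }
      },
      "Target": "/inputs/data.csv",
      "Alias": "dataset"
    },
    {
      "Source": {
        "Type": "s3",
        "Params": {
          "Bucket": "example-bucket",
          "Key": "models/",
          "Region": "us-east-1"
        }
      },
      "Target": "/inputs/models"
    }
  ]
}
```

Each source mounts at its own absolute `Target`, which avoids the path collisions listed among the pitfalls.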
Common Pitfalls:
Target paths must be absolute
Parent directories must exist
Path collisions between sources
Missing access permissions
Edge Cases:
Multiple Source Types
Inline Content with Special Characters
Result Path Fields
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Name | string | Yes | Result identifier |
| Path | string | Yes | Absolute path |
| Type | string | Yes | One of: file, directory, stdout, stderr, exitCode |
| Format | string | No | One of: raw, text, json, csv |
| Compression | string | No | One of: none, gzip, zstd |
| CompressionLevel | integer | No | Compression level (1-9) |
Example:
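As an illustration of the fields in the table above, this fragment declares one compressed directory result and one JSON file result. The names and paths are placeholders, and the compression level of 6 is an arbitrary mid-range pick within the documented 1-9 range.

```json
{
  "ResultPaths": [
    {
      "Name": "outputs",
      "Path": "/outputs",
      "Type": "directory",
      "Compression": "gzip",
      "CompressionLevel": 6
    },
    {
      "Name": "report",
      "Path": "/outputs/report.json",
      "Type": "file",
      "Format": "json"
    }
  ]
}
```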
Important Considerations:
Missing output paths fail the job
Large outputs need compression
Some formats require specific file extensions
Path patterns support wildcards
Edge Cases:
Special File Types
Compression Levels
Advanced Configurations
Job Types and Behaviors
Batch Jobs
Edge Cases:
Partial completion handling
Resource competition
Output collisions
Service Jobs
Edge Cases:
Network port conflicts
State persistence
Update coordination
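A rough sketch of a service job, using only job-level and task-level fields documented in the Field Reference. With `Type` set to `service`, a `Count` of 3 asks Bacalhau to keep three replicas running. The `Type`/`Params` nesting under `Engine` is an assumption, and the job name and image are placeholders.

```json
{
  "Name": "example-service",
  "Type": "service",
  "Count": 3,
  "Tasks": [
    {
      "Name": "web",
      "Engine": {
        "Type": "docker",
        "Params": {
          "Image": "nginx:1.25"
        }
      },
      "Resources": {
        "CPU": "1.0",
        "Memory": "256MB",
        "Disk": "100MB"
      }
    }
  ]
}
```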
Daemon Jobs
Edge Cases:
Node failure handling
Resource cleanup
State recovery
Network Configuration
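A hedged sketch of a task-level `Network` object for limited HTTP/HTTPS access. The `Type` values come from the Network Types list earlier in this guide; the `Domains` key is an assumption based on this guide's references to domain lists and domain patterns, and the hostnames are placeholders.

```json
{
  "Network": {
    "Type": "http",
    "Domains": ["example.com", "api.example.com"]
  }
}
```

Omitting the `Network` object leaves the task on the documented default of `{"Type": "none"}`.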
Edge Cases:
Domain Resolution
Wildcards and subdomains
IP address ranges
Internal service discovery
Network Isolation
Complex Routing
Timeout Configuration
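One possible shape for the task-level `Timeouts` object. This guide lists `Timeouts` only as an object of timeout settings, so the `ExecutionTimeout` key and the assumption that its value is in seconds should both be checked against the Bacalhau job specification for your version.

```json
{
  "Timeouts": {
    "ExecutionTimeout": 3600
  }
}
```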
Edge Cases:
Long-Running Jobs
Quick-Fail Jobs
Common Patterns and Anti-Patterns
Good Patterns
Resource Gradual Scaling
Proper Error Handling
Anti-Patterns
Over-Provisioning
Insufficient Timeouts
Validation and Testing
Pre-Submission Validation
Common Issues:
Schema Validation
Missing required fields
Invalid field types
Unknown properties
Resource Validation
Invalid resource quantities
Incompatible combinations
Exceeded limits
Network Validation
Invalid domain patterns
Port conflicts
Policy violations
Test Runs
This helps catch:
Resource availability issues
Network access problems
Input source validity
Publisher configuration errors
Troubleshooting Guide
Common Error Messages
Resource Errors
Solution: Check node capabilities and adjust requests
Network Errors
Solution: Verify network policy and domain lists
Timeout Errors
Solution: Adjust timeouts or optimize job
Debug Techniques
Resource Monitoring
Network Diagnostics
State Inspection