Major configuration management update
Major WebUI update: improved job and nodes monitoring and management
Improved error reporting
Improved job progress visibility
Cross-version compatibility: full support for v1.4.0
or via the --config flag when executing a command:
The integrated web interface has been completely revamped to offer a more intuitive user experience:
Added dark interface theme
Added a detailed view for jobs with real-time log streaming
Added ability to stop a job via WebUI
Added detailed view for nodes
The error return logic has been redesigned in the new version:
Certain error messages have been redesigned and shortened
The error text color has been changed to a more prominent red
Correct HTTP status codes are now used. For example, a request for a non-existent job now returns code 404 rather than the 500 returned by v1.4.0 and earlier
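A client script can rely on the corrected status codes to tell a missing job apart from a genuine server fault. The following is a minimal Python sketch of that idea; `classify_job_response` and its return labels are hypothetical names introduced for illustration and are not part of Bacalhau.

```python
from http import HTTPStatus


def classify_job_response(status_code: int) -> str:
    """Map the HTTP status code of a job lookup to a coarse outcome.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if status_code == HTTPStatus.OK:
        return "found"
    if status_code == HTTPStatus.NOT_FOUND:
        # Behavior described for v1.5.0: a missing job yields 404.
        return "job-not-found"
    if 500 <= status_code < 600:
        # In v1.4.0 and earlier a missing job also produced 500, so it
        # could not be distinguished from a real server-side failure.
        return "server-error"
    return "unexpected"
```

With status codes used this way, a caller can retry on `server-error` but stop immediately on `job-not-found`.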
The dynamic output of job status to the console has been redesigned:
More informative format, indicating important job execution details that were previously not displayed
Improved progress visibility for jobs with multiple executions
Added the --follow flag, which allows tracking job logs right after the job starts
v1.5.0 is fully compatible with v1.4.0 between any node types (Compute - Orchestrator and Client - Orchestrator), which provides a seamless upgrade experience.
The approach to Bacalhau configuration management has been significantly redesigned. The changes are described in more detail in the documentation. Another notable change is that the default endpoint is deprecated, so to connect to the public demo network, the address bootstrap.production.bacalhau.org must now be set manually in the api.host key.
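The dotted key api.host addresses a nested setting (the host entry inside the api section). As a rough illustration of that addressing scheme only, and not of Bacalhau's actual configuration code, the Python sketch below applies a dotted key to a nested dictionary. The helper `set_dotted` and the dictionary layout are assumptions made for the example.

```python
def set_dotted(config: dict, dotted_key: str, value) -> dict:
    """Set config[a][b]...[z] = value for a dotted key 'a.b...z'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


# Hypothetical in-memory config pointing at the public demo network.
config = set_dotted({}, "api.host", "bootstrap.production.bacalhau.org")
```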
Release Notes for Bacalhau v1.4.0
Embedded libp2p/IPFS deprecation and migration to NATS.
CLI command updates.
Updates to job spec v2, while deprecating job spec v1.
Job queuing was extended with new job timeouts.
Improved error reporting.
Introduction of Node Manager.
We already migrated to NATS in v1.3.0 and will deprecate the embedded IPFS/libp2p in v1.4.0. If you want to migrate to NATS, please make sure to read the docs on the process.
In version 1.4.0 of Bacalhau, all legacy commands will be removed. Here’s a breakdown of the old commands and their new equivalents:
Legacy command -> New command
bacalhau create -> bacalhau job run
bacalhau cancel -> bacalhau job stop
bacalhau list -> bacalhau job list
bacalhau logs -> bacalhau job logs
bacalhau get -> bacalhau job get
bacalhau describe -> bacalhau job describe
bacalhau id -> bacalhau agent node
bacalhau validate -> bacalhau job validate
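For scripts that still call the legacy commands, the pairs listed above can be captured in a small lookup. The following Python sketch rewrites only the subcommand of an invocation; the helper name `modernize` is hypothetical, and arguments such as job spec files pass through unchanged, so they may still need manual migration.

```python
import shlex

# Legacy subcommand -> v1.4.0 replacement, copied from the list above.
LEGACY_TO_NEW = {
    "create": ["job", "run"],
    "cancel": ["job", "stop"],
    "list": ["job", "list"],
    "logs": ["job", "logs"],
    "get": ["job", "get"],
    "describe": ["job", "describe"],
    "id": ["agent", "node"],
    "validate": ["job", "validate"],
}


def modernize(command_line: str) -> str:
    """Rewrite a legacy `bacalhau <cmd> ...` invocation to its new form."""
    parts = shlex.split(command_line)
    if len(parts) >= 2 and parts[0] == "bacalhau" and parts[1] in LEGACY_TO_NEW:
        # Replace the single legacy subcommand with its two-word successor.
        parts[1:2] = LEGACY_TO_NEW[parts[1]]
    return shlex.join(parts)
```

Invocations that already use the new form, such as `bacalhau job list`, are returned unchanged because `job` is not a legacy subcommand.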
Some commands require action to migrate your network to Bacalhau v1.4.0. These actions are specified below.
Special Attention to create, validate and describe Commands
create Command
create accepts a v1beta1 job spec.
job run accepts the current job spec.
Users must update their job specifications to align with the new job run requirements.
describe Command
describe returns a v1beta1 job spec and its corresponding state in YAML format.
job describe provides columnar data detailing various parts of the job.
Users should expect a different output format with job describe compared to describe.
validate Command
validate validates a v1beta1 job spec.
job validate validates the current job spec.
v1beta1 job specs will not be considered valid when passed to the job validate command.
If a user tries to use a legacy command, an error message will guide them to the correct new command. For example:
The exact error depends on the version you are running. A failed request warning might also appear.
In 1.3.2, we released limited queuing functionality on Compute nodes that allowed a Job to be scheduled on a Compute node if the node expected to be able to start the Job in a reasonable time and no other node was better suited to run it. Though this was a useful enhancement of Job delegation across the network, we feel it isn't the optimal way to determine which nodes can execute which Jobs at which time. To that end, we're introducing a queuing system in the Requester nodes of a Bacalhau network.
From 1.4.0, if a Job is submitted to a Bacalhau network but no Compute node has the capacity to execute, or prepare to execute, the Job, the Requester node that received the Job will store it internally until either it can be sent to a Compute node for processing or the Job timeout has elapsed. With this change, networks with heavy utilization should see a marked increase in the successful completion of Jobs. For more information, see the guide on job queuing.
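The queuing behavior described above can be pictured with a toy model: a job waits in the requester until a compute slot frees up or its timeout elapses. This is a minimal sketch under stated assumptions, not Bacalhau's scheduler. The class names, the `queue_timeout` field, and the explicit `now` and `free_slots` parameters are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class QueuedJob:
    job_id: str
    queued_at: float      # seconds, on whatever clock the caller uses
    queue_timeout: float  # how long the job may wait for capacity


class RequesterQueue:
    """Toy model of a requester holding jobs until capacity appears."""

    def __init__(self) -> None:
        self._pending: Deque[QueuedJob] = deque()

    def submit(self, job: QueuedJob) -> None:
        self._pending.append(job)

    def tick(self, now: float, free_slots: int) -> Tuple[List[str], List[str]]:
        """Return (dispatched, timed_out) job IDs for one scheduling pass."""
        dispatched: List[str] = []
        timed_out: List[str] = []
        still_waiting: Deque[QueuedJob] = deque()
        for job in self._pending:
            if now - job.queued_at >= job.queue_timeout:
                timed_out.append(job.job_id)   # waited too long: give up
            elif free_slots > 0:
                dispatched.append(job.job_id)  # capacity available: send
                free_slots -= 1
            else:
                still_waiting.append(job)      # keep waiting, in order
        self._pending = still_waiting
        return dispatched, timed_out
```

Passing the clock value in explicitly keeps the sketch deterministic. A real implementation would read the time itself and learn about capacity from the compute nodes.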
In the new version of Bacalhau, user-facing errors give more granular feedback on what went wrong, which makes them more concise and faster to debug.
In Bacalhau 1.4.0, we’re introducing the Node Manager. This feature simplifies node operations, providing a clear view of all compute nodes and their status. You can approve, deny, or delete nodes as needed, making management straightforward. Heartbeats from nodes keep the Node Manager updated on their connectivity, enhancing overall stability and performance. For more information on this topic, check out the blog post about our release notes for a previous version (v.1.3.1).
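To make the roles of approval and heartbeats concrete, here is a small Python sketch of a registry that tracks both. It is an illustration under assumptions, not the Node Manager's implementation. `NodeRegistry`, its method names, and the `heartbeat_timeout` parameter are hypothetical.

```python
from typing import Dict, List


class NodeRegistry:
    """Toy registry: approval state plus last heartbeat time per node."""

    def __init__(self, heartbeat_timeout: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.approved: Dict[str, bool] = {}    # node ID -> approved?
        self.last_seen: Dict[str, float] = {}  # node ID -> last heartbeat

    def register(self, node_id: str, now: float) -> None:
        # New nodes start unapproved until an operator decides.
        self.approved.setdefault(node_id, False)
        self.last_seen[node_id] = now

    def approve(self, node_id: str) -> None:
        if node_id not in self.approved:
            raise KeyError(node_id)
        self.approved[node_id] = True

    def deny(self, node_id: str) -> None:
        if node_id not in self.approved:
            raise KeyError(node_id)
        self.approved[node_id] = False

    def delete(self, node_id: str) -> None:
        self.approved.pop(node_id, None)
        self.last_seen.pop(node_id, None)

    def heartbeat(self, node_id: str, now: float) -> None:
        # Heartbeats from unknown (for example, deleted) nodes are ignored.
        if node_id in self.approved:
            self.last_seen[node_id] = now

    def usable_nodes(self, now: float) -> List[str]:
        """Nodes that are approved and have sent a recent heartbeat."""
        return sorted(
            node_id
            for node_id, ok in self.approved.items()
            if ok and now - self.last_seen[node_id] <= self.heartbeat_timeout
        )
```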
Users who are not prepared for the changes in CLI behavior and job specification definitions are advised to remain on Bacalhau v1.3.1. This version continues to support the legacy commands and job specifications. Users can maintain their own private Bacalhau cluster using v1.3.1.
When users are ready to transition to the new CLI behavior and job specification requirements, they can upgrade to Bacalhau v1.4.
We are excited to announce the release of Bacalhau v1.6.0, introducing a new communication architecture that significantly improves the reliability and resilience of distributed compute networks.
At the heart of this release is the new messaging protocol, a complete redesign of node communication that brings significant improvements to network reliability:
Key Benefits
Self-Healing Network: Compute nodes and orchestrators automatically reconnect and sync after network interruptions
Offline-First Operation: Compute nodes can start and operate even when disconnected from the orchestrator
Automatic State Recovery: When nodes reconnect, they automatically share all missed job execution information and results
Zero Data Loss: Ensures no job execution data or results are lost during network disruptions
Seamless Recovery: Network interruptions are handled transparently without requiring manual intervention
Technical Improvements
Reliable Message Delivery: Ordered, at-least-once message delivery between nodes
Automatic Recovery: Built-in failure detection and recovery mechanisms
Connection Health Monitoring: Proactive health checks and connection management
Event-Based Architecture: Decoupled event processing from message delivery
Efficient Checkpointing: Maintains system state for reliable recovery
Backward Compatibility: Maintains compatibility with v1.5 orchestrators
Direct Result Downloads: Download job results directly from the interface
Simplified Configuration: Automatic request routing eliminates manual IP configuration
Improved Architecture: Streamlined backend setup while maintaining security
Reverse Proxy Support: Added capability to run orchestrator behind a reverse proxy
Agent Configuration: New bacalhau agent config command to inspect agent configuration
TLS Support: Added TLS encryption support for NATS communication
Better Logging: Implemented more human-readable logging patterns
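The "ordered, at-least-once message delivery" and checkpointing items listed under Technical Improvements can be pictured with a receiver that numbers its progress. The Python sketch below is a generic illustration of that technique, not the protocol's actual implementation. `CheckpointedReceiver`, the integer sequence numbers, and the string payloads are assumptions made for the example.

```python
from typing import Callable, Dict


class CheckpointedReceiver:
    """Processes sequence-numbered messages in order, once each, even if
    the sender redelivers (at-least-once) or messages arrive out of order."""

    def __init__(self, handler: Callable[[str], None], checkpoint: int = 0) -> None:
        self.handler = handler
        self.checkpoint = checkpoint       # highest sequence number processed
        self._buffer: Dict[int, str] = {}  # early arrivals awaiting their turn

    def receive(self, seq: int, payload: str) -> int:
        """Accept one message and return the checkpoint to acknowledge."""
        if seq > self.checkpoint:
            # Not processed yet; a redelivered copy simply overwrites itself.
            self._buffer[seq] = payload
        # Drain every message that is now next in line.
        while self.checkpoint + 1 in self._buffer:
            self.checkpoint += 1
            self.handler(self._buffer.pop(self.checkpoint))
        return self.checkpoint
```

After a disconnect, a receiver constructed with its saved checkpoint ignores anything at or below that number, so the sender can safely replay from an earlier point without causing duplicate processing.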
Bacalhau v1.6.0 maintains backward compatibility while introducing the new messaging protocol (BMP):
Compute nodes maintain compatibility with v1.5 orchestrators, and vice versa
Support for re-handshake from legacy clients
We're excited for you to experience the enhanced reliability and resilience provided by the BMP in Bacalhau v1.6.0. This release represents a significant architectural advancement in making distributed computing more robust and dependable.