Bacalhau supports the three main 'pillars' of observability: logging, metrics, and tracing. It uses the OpenTelemetry Go SDK for metrics and tracing, which can be configured using the standard environment variables. Exporting metrics and traces can be as simple as setting the OTEL_EXPORTER_OTLP_ENDPOINT environment variable. Custom code is used for logging, as the OpenTelemetry Go SDK currently doesn't support logging.
By default, Bacalhau writes logs to stderr in a human-friendly format at INFO level. Two environment variables change this behaviour:
LOG_LEVEL - Can be one of
fatal to output more or fewer logging messages as required
LOG_TYPE - Can be one of the following values:
default - output logs to stderr in a human-friendly format
json - output log messages to stdout in JSON format
combined - output JSON-formatted log messages to stdout and human-friendly messages to stderr
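As a minimal sketch of how the two variables combine, the snippet below selects the combined output and raises the threshold to fatal, the one level named above. The commented-out command is an assumption about how a node is started, and bacalhau-logs.json is a hypothetical file name; substitute whatever you normally run.

```shell
# Only log fatal messages ("fatal" is one of the LOG_LEVEL values).
export LOG_LEVEL=fatal

# JSON on stdout, human-friendly text on stderr.
export LOG_TYPE=combined

# Assumed invocation: capture the JSON stream in a file while the
# human-friendly stream stays on the terminal.
# bacalhau serve > bacalhau-logs.json
```

With combined output, redirecting stdout alone is enough to separate the machine-readable stream from the one meant for people.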
Log statements should include the relevant trace, span and job IDs so that they can be traced back to the work being performed.
Bacalhau produces a number of different metrics including those around the libp2p resource manager (
of the requester HTTP API and the number of jobs accepted/completed/received.
Traces are produced for all major pieces of work when processing a job, although the naming of some spans is still being worked on. You can find the traces that cover work on a job by searching for the
Because Bacalhau uses OpenTelemetry, the metrics and traces can easily be forwarded to a variety of services, such as Honeycomb or Datadog.
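Forwarding to a hosted service usually means pointing the exporter at the vendor's OTLP endpoint and passing credentials through the standard OTEL_EXPORTER_OTLP_HEADERS variable. The sketch below uses Honeycomb as the example. The endpoint and the x-honeycomb-team header name are assumptions based on Honeycomb's published OTLP settings and should be checked against the vendor's current documentation; HONEYCOMB_API_KEY is a hypothetical variable holding your key.

```shell
# Assumed Honeycomb OTLP endpoint; verify against the vendor's docs.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.honeycomb.io:443"

# Standard OpenTelemetry variable for extra exporter headers, written as
# comma-separated key=value pairs. HONEYCOMB_API_KEY is a hypothetical
# variable; a placeholder is used when it is unset.
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=${HONEYCOMB_API_KEY:-your-api-key}"
```

Start Bacalhau from the same shell so that it inherits both variables.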
To view the data locally, or simply to avoid a SaaS offering, you can start Jaeger and Prometheus by placing these three files into a directory and then running
docker compose up while running Bacalhau with the
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 environment variable.
- "16686:16686" # Jaeger UI
- "14250:14250" # Jaeger gRPC endpoint
- "8888:8888" # Prometheus metrics exposed by the collector
- "8889:8889" # Prometheus exporter metrics
- "13133:13133" # health_check extension
- "4317:4317" # OTLP gRPC receiver
- "9090:9090" # Prometheus UI
- job_name: 'otel-collector'
- targets: ['otel-collector:8889']
- targets: ['otel-collector:8888']
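Put together, a local session might look like the sketch below. It assumes the three files are in the current directory and that Docker Compose v2 is installed. The docker compose and bacalhau invocations are left commented out because they are assumptions about your environment, not commands taken from the files above.

```shell
# Point Bacalhau's OpenTelemetry exporter at the collector's OTLP gRPC
# receiver, which the compose file publishes on port 4317.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Create and start the containers in the background (run from the
# directory that holds the three files).
# docker compose up -d

# Assumed invocation; use whatever command you normally start a node with.
# bacalhau serve

# UIs published by the port mappings above.
echo "Jaeger UI:     http://localhost:16686"
echo "Prometheus UI: http://localhost:9090"
```

Once a few jobs have run, traces should appear in the Jaeger UI and the collector's metrics in the Prometheus UI.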