Bacalhau Python SDK

This is the official Python SDK for Bacalhau, named bacalhau-sdk.

Introduction

It is a high-level SDK that ships the client-side logic (e.g. signing requests) needed to query the endpoints. Please take a look at the examples for snippets to create, list and inspect jobs. Under the hood, this SDK uses bacalhau-apiclient (autogenerated via Swagger/OpenAPI) to interact with the API.

Please make sure to use this SDK library in your Python projects, instead of the lower level bacalhau-apiclient. The latter is listed as a dependency of this SDK and will be installed automatically when you follow the installation instructions below.

Features

  1. List, create and inspect Bacalhau jobs using Python objects

  2. Use the production network, or set the following environment variables to target any Bacalhau network out there:

    1. BACALHAU_API_HOST

    2. BACALHAU_API_PORT

  3. Generate a key pair used to sign requests stored in the path specified by the BACALHAU_DIR env var (default: ~/.bacalhau)

Install

From PyPi:

pip install bacalhau-sdk

From source:

Clone the public repository:

git clone https://github.com/bacalhau-project/bacalhau/

Once you have a copy of the source, you can install it with:

cd python/
pip install .

Initialize

Likewise the Bacalhau CLI, this SDK uses a key pair to be stored in BACALHAU_DIR used for signing requests. If a key pair is not found there, it will create one for you.

Example Use

Let's submit a Hello World job and then fetch its output data's CID. We start by importing this sdk, namely bacalhau_sdk, used to create and submit a job create request. Then we import bacalhau_apiclient (installed automatically with this sdk), it provides various object models that compose a job create request. These are used to populate a simple python dictionary that will be passed over to the submit util method.

import pprint

from bacalhau_sdk.api import submit
from bacalhau_sdk.config import get_client_id
from bacalhau_apiclient.models.storage_spec import StorageSpec
from bacalhau_apiclient.models.spec import Spec
from bacalhau_apiclient.models.job_spec_language import JobSpecLanguage
from bacalhau_apiclient.models.job_spec_docker import JobSpecDocker
from bacalhau_apiclient.models.job_sharding_config import JobShardingConfig
from bacalhau_apiclient.models.job_execution_plan import JobExecutionPlan
from bacalhau_apiclient.models.publisher_spec import PublisherSpec
from bacalhau_apiclient.models.deal import Deal


data = dict(
    APIVersion='V1beta1',
    ClientID=get_client_id(),
    Spec=Spec(
        engine="Docker",
        verifier="Noop",
        publisher_spec=PublisherSpec(type="IPFS"),
        docker=JobSpecDocker(
            image="ubuntu",
            entrypoint=["echo", "Hello World!"],
        ),
        language=JobSpecLanguage(job_context=None),
        wasm=None,
        resources=None,
        timeout=1800,
        outputs=[
            StorageSpec(
                storage_source="IPFS",
                name="outputs",
                path="/outputs",
            )
        ],
        sharding=JobShardingConfig(
            batch_size=1,
            glob_pattern_base_path="/inputs",
        ),
        execution_plan=JobExecutionPlan(shards_total=0),
        deal=Deal(concurrency=1, confidence=0, min_bids=0),
        do_not_track=False,
    ),
)

pprint.pprint(submit(data))

The script above prints the following object, the job.metadata.id value is our newly created job id!

{'job': {'api_version': 'V1beta1',
         'metadata': {'client_id': 'bae9c3b2adfa04cc647a2457e8c0c605cef8ed93bdea5ac5f19f94219f722dfe',
                      'created_at': '2023-02-01T19:30:21.405209538Z',
                      'id': '710a0bc2-81d1-4025-8f80-5327ca3ce170'},
         'spec': {'Deal': {'Concurrency': 1},
                  'Docker': {'Entrypoint': ['echo', 'Hello World!'],
                             'Image': 'ubuntu'},
                  'Engine': 'Docker',
                  'ExecutionPlan': {'ShardsTotal': 1},
                  'Language': {'JobContext': {}},
                  'Network': {'Type': 'None'},
                  'Publisher': 'IPFS',
                  'Resources': {'GPU': ''},
                  'Sharding': {'BatchSize': 1,
                               'GlobPatternBasePath': '/inputs'},
                  'Timeout': 1800,
                  'Verifier': 'Noop',
                  'Wasm': {'EntryModule': {}},
                  'outputs': [{'Name': 'outputs',
                               'StorageSource': 'IPFS',
                               'path': '/outputs'}]},
         'status': {'JobState': {},
                    'Requester': {'RequesterNodeID': 'QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL',
                                  'RequesterPublicKey': 'CAASpgIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDVRKPgCfY2fgfrkHkFjeWcqno+MDpmp8DgVaY672BqJl/dZFNU9lBg2P8Znh8OTtHPPBUBk566vU3KchjW7m3uK4OudXrYEfSfEPnCGmL6GuLiZjLf+eXGEez7qPaoYqo06gD8ROdD8VVse27E96LlrpD1xKshHhqQTxKoq1y6Rx4DpbkSt966BumovWJ70w+Nt9ZkPPydRCxVnyWS1khECFQxp5Ep3NbbKtxHNX5HeULzXN5q0EQO39UN6iBhiI34eZkH7PoAm3Vk5xns//FjTAvQw6wZUu8LwvZTaihs+upx2zZysq6CEBKoeNZqed9+Tf+qHow0P5pxmiu+or+DAgMBAAE='}}}}

We can then use the results method to fetch, among other fields, the output data's CID.

from bacalhau_sdk.api import results

print(results(job_id="710a0bc2-81d1-4025-8f80-5327ca3ce170"))

The line above prints the following dictionary:

{'results': [{'data': {'cid': 'QmYEqqNDdDrsRhPRShKHzsnZwBq3F59Ti3kQmv9En4i5Sw',
                       'metadata': None,
                       'name': 'job-710a0bc2-81d1-4025-8f80-5327ca3ce170-shard-0-host-QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3',
                       'path': None,
                       'source_path': None,
                       'storage_source': 'IPFS',
                       'url': None},
              'node_id': 'QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3',
              'shard_index': None}]}

Congrats, that was a good start! Please find more code snippets in the examples folder.

Devstack

You can set the environment variables BACALHAU_API_HOST and BACALHAU_API_PORT to point this SDK to your Bacalhau API local devstack.

Developers guide

We use Poetry to manage this package, take a look at their official docs to install it. Note, all targets in the Makefile use poetry as well!

To develop this SDK locally, create a dedicated poetry virtual environment and install the root package (i.e. bacalhau_sdk) and its dependencies:

poetry install --no-interaction --with test,dev -vvv

Creating virtualenv bacalhau-sdk-9mIcLX8U-py3.9 in /Users/enricorotundo/Library/Caches/pypoetry/virtualenvs
Using virtualenv: /Users/enricorotundo/Library/Caches/pypoetry/virtualenvs/bacalhau-sdk-9mIcLX8U-py3.9
Installing dependencies from lock file
...

Note the line above installs the root package (i.e. bacalhau_sdk) in editable mode, that is, any change to its source code is reflected immediately without the need for re-packaging and re-installing it. Easy-peasy!

Then install the pre-commit hooks and test it:

make install-pre-commit
make pre-commit

Last updated