
Python Hello World

Open In Colab Open In Binder

This example serves as an introduction to Bacalhau. Here, you'll use Bacalhau to run a Python file that is hosted on a website.

tip

You can run this code on your command line interface (CLI), or you can use the Google Colab or Binder notebooks provided at the top of this example to test the code.

Prerequisites

tip

If you are running this as a notebook, the hidden cell below will install the Bacalhau client.

Hello, world

For this example, we'll be using a very simple Python script which displays the traditional first greeting.

%cat hello-world.py
print("Hello, world!")

Submit the workload

To submit a workload to Bacalhau, use the bacalhau docker run command. Although you'll usually pass input data into the container using content identifier (CID) volumes, this example uses the -u URL:path argument (--input-urls in its long form) for simplicity. With this argument, Bacalhau mounts a data volume containing the downloaded file inside the container. By default, the input volume is mounted at the path /inputs.

info

Bacalhau overwrites the default entrypoint, so we must run the full command after the -- argument.

bacalhau docker run \
--id-only \
--input-urls https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py \
python:3.10-slim -- python3 /inputs/hello-world.py
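
The command above can also be assembled programmatically, which makes the role of each part easier to see. The following is a minimal sketch, not part of the Bacalhau client: build_docker_run_command is a hypothetical helper, and it assumes the downloaded file keeps the last segment of the URL as its name under /inputs, as in the command above. It only builds and prints the command; it does not run it.

```python
import posixpath
import shlex
from urllib.parse import urlparse


def build_docker_run_command(input_url, image, container_command, mount_dir="/inputs"):
    """Assemble the argv for `bacalhau docker run` (hypothetical helper)."""
    # Assumption: the file appears under mount_dir with the URL's last
    # path segment as its name, matching the command shown above.
    filename = posixpath.basename(urlparse(input_url).path)
    script_path = posixpath.join(mount_dir, filename)
    return [
        "bacalhau", "docker", "run",
        "--id-only",
        "--input-urls", input_url,
        image,
        "--",  # everything after this is the command run inside the container
        *container_command,
        script_path,
    ]


url = (
    "https://raw.githubusercontent.com/bacalhau-project/examples/"
    "151eebe895151edd83468e3d8b546612bf96cd05/"
    "workload-onboarding/trivial-python/hello-world.py"
)
argv = build_docker_run_command(url, "python:3.10-slim", ["python3"])
print(shlex.join(argv))
```

The resulting list could be passed to subprocess.run on a machine where the Bacalhau client is installed.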

Get Results

After the job has finished processing, the next step is to use the get verb to download your outputs locally.

Before downloading, you can check the job's specification and status with bacalhau describe, as shown below.

%env JOB_ID={job_id}
env: JOB_ID=c2f245d6-43a6-43ec-9a3b-7ce9b6242c88
bacalhau describe ${JOB_ID}
APIVersion: V1beta1
Metadata:
  ClientID: 77cf46c04f88ffb1c3e0e4b6e443724e8d2d87074d088ef1a6294a448fa85d2e
  CreatedAt: "2023-01-20T13:24:59.165644684Z"
  ID: c2f245d6-43a6-43ec-9a3b-7ce9b6242c88
Spec:
  Deal:
    Concurrency: 1
  Docker:
    Entrypoint:
    - python3
    - /inputs/hello-world.py
    Image: python:3.10-slim
  Engine: Docker
  ExecutionPlan:
    ShardsTotal: 1
  Language:
    JobContext: {}
  Publisher: Estuary
  Resources:
    GPU: ""
  Sharding:
    BatchSize: 1
    GlobPatternBasePath: /inputs
  Timeout: 1800
  Verifier: Noop
  Wasm: {}
  inputs:
  - StorageSource: URLDownload
    URL: https://raw.githubusercontent.com/bacalhau-project/examples/151eebe895151edd83468e3d8b546612bf96cd05/workload-onboarding/trivial-python/hello-world.py
    path: /inputs
  outputs:
  - Name: outputs
    StorageSource: IPFS
    path: /outputs
Status:
  JobState:
    Nodes:
      QmUDAXvv31WPZ8U9CzuRTMn9iFGiopGE7rHiah1X8a6PkT:
        Shards:
          "0":
            NodeId: QmUDAXvv31WPZ8U9CzuRTMn9iFGiopGE7rHiah1X8a6PkT
            PublishedResults: {}
            State: Cancelled
            VerificationResult: {}
      QmVAb7r2pKWCuyLpYWoZr9syhhFnTWeFaByHdb8PkkhLQG:
        Shards:
          "0":
            NodeId: QmVAb7r2pKWCuyLpYWoZr9syhhFnTWeFaByHdb8PkkhLQG
            PublishedResults: {}
            State: Cancelled
            VerificationResult: {}
      QmXaXu9N5GNetatsvwnTfQqNtSeKAD6uCmarbh3LMRYAcF:
        Shards:
          "0":
            NodeId: QmXaXu9N5GNetatsvwnTfQqNtSeKAD6uCmarbh3LMRYAcF
            PublishedResults: {}
            State: Cancelled
            VerificationResult: {}
      QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3:
        Shards:
          "0":
            NodeId: QmYgxZiySj3MRkwLSL4X2MF5F9f2PMhAE3LV49XkfNL1o3
            PublishedResults: {}
            State: Cancelled
            VerificationResult: {}
      QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL:
        Shards:
          "0":
            NodeId: QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL
            PublishedResults:
              CID: QmehTNF6ogbESt26EgrSw9YGrApneSWhPesqw1A5T6ezBe
              Name: job-c2f245d6-43a6-43ec-9a3b-7ce9b6242c88-shard-0-host-QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL
              StorageSource: IPFS
            RunOutput:
              exitCode: 0
              runnerError: ""
              stderr: ""
              stderrtruncated: false
              stdout: |
                Hello, world!
              stdouttruncated: false
            State: Completed
            VerificationResult:
              Complete: true
              Result: true
  Requester:
    RequesterNodeID: QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL
    RequesterPublicKey: CAASpgIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDVRKPgCfY2fgfrkHkFjeWcqno+MDpmp8DgVaY672BqJl/dZFNU9lBg2P8Znh8OTtHPPBUBk566vU3KchjW7m3uK4OudXrYEfSfEPnCGmL6GuLiZjLf+eXGEez7qPaoYqo06gD8ROdD8VVse27E96LlrpD1xKshHhqQTxKoq1y6Rx4DpbkSt966BumovWJ70w+Nt9ZkPPydRCxVnyWS1khECFQxp5Ep3NbbKtxHNX5HeULzXN5q0EQO39UN6iBhiI34eZkH7PoAm3Vk5xns//FjTAvQw6wZUu8LwvZTaihs+upx2zZysq6CEBKoeNZqed9+Tf+qHow0P5pxmiu+or+DAgMBAAE=
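
In the output above, only one node reports State: Completed; the others were cancelled. The sketch below shows one way to pick out the completed shard and its stdout once the description has been loaded into a Python dictionary. It assumes the fields nest as Status, JobState, Nodes, Shards, and it uses a trimmed, hand-copied excerpt of the output rather than a live call. The completed_shards function is a hypothetical helper, not part of Bacalhau.

```python
def completed_shards(job):
    """Yield (node_id, stdout, cid) for every shard whose State is Completed."""
    # Assumed nesting: Status -> JobState -> Nodes -> <node id> -> Shards.
    nodes = job["Status"]["JobState"]["Nodes"]
    for node_id, node in nodes.items():
        for shard in node["Shards"].values():
            if shard.get("State") != "Completed":
                continue
            stdout = shard.get("RunOutput", {}).get("stdout", "")
            cid = shard.get("PublishedResults", {}).get("CID")
            yield node_id, stdout, cid


# Trimmed excerpt of the description above (two of the five nodes).
job = {
    "Status": {
        "JobState": {
            "Nodes": {
                "QmUDAXvv31WPZ8U9CzuRTMn9iFGiopGE7rHiah1X8a6PkT": {
                    "Shards": {"0": {"PublishedResults": {}, "State": "Cancelled"}}
                },
                "QmdZQ7ZbhnvWY1J12XYKGHApJ6aufKyLNSvf8jZBrBaAVL": {
                    "Shards": {
                        "0": {
                            "PublishedResults": {
                                "CID": "QmehTNF6ogbESt26EgrSw9YGrApneSWhPesqw1A5T6ezBe"
                            },
                            "RunOutput": {"exitCode": 0, "stdout": "Hello, world!\n"},
                            "State": "Completed",
                        }
                    }
                },
            }
        }
    }
}

results = list(completed_shards(job))
for node_id, stdout, cid in results:
    print(node_id, cid, stdout.strip())
```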

To download the results, create a directory to store the job outputs and pass it to bacalhau get with --output-dir.

rm -rf results && mkdir results
bacalhau get ${JOB_ID} --output-dir results
Fetching results of job 'c2f245d6-43a6-43ec-9a3b-7ce9b6242c88'...
Results for job 'c2f245d6-43a6-43ec-9a3b-7ce9b6242c88' have been written to...
results


2023/01/20 13:25:06 CleanupManager.fnsMutex violation CRITICAL section took 43.424ms 43424000 (threshold 10ms)

At this point, the outputs have been downloaded locally. Each job creates three subfolders: combined_results, per_shard, and raw. In each of these subfolders, you'll find the stdout and stderr files.

For the scope of this guide, we will only look at the stdout file. You can open the file directly to inspect its contents, or use the command below.


cat results/combined_results/stdout

Hello, world!
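
The same check can be done from Python, for example when a later step needs the job's output as a string. This is a minimal sketch assuming the results/combined_results/stdout layout described above. The read_combined_stdout function is a hypothetical helper, and the demonstration writes a throwaway copy of the layout so the code runs without a real job.

```python
import tempfile
from pathlib import Path


def read_combined_stdout(results_dir):
    """Return the text of combined_results/stdout under results_dir."""
    return (Path(results_dir) / "combined_results" / "stdout").read_text()


# Demonstration against a temporary directory that mimics the layout,
# so the sketch does not depend on a downloaded Bacalhau job.
with tempfile.TemporaryDirectory() as tmp:
    combined = Path(tmp) / "combined_results"
    combined.mkdir()
    (combined / "stdout").write_text("Hello, world!\n")
    output = read_combined_stdout(tmp)

print(output, end="")
```

With real results, the call would be read_combined_stdout("results"), matching the --output-dir used earlier.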

Need Support?

If you have questions or need support or guidance, please reach out to the Bacalhau team via Slack (#bacalhau channel).