Model Inference

EasyOCR (Optical Character Recognition) on Bacalhau

Introduction

In this example tutorial, we use Bacalhau and EasyOCR to digitize paper records, recognize characters, and extract text from images stored on IPFS, S3, or the web. EasyOCR is a ready-to-use OCR library with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic. With EasyOCR, you can use the pre-trained models or your own fine-tuned model.

TL;DR

Running Easy OCR Locally

Install the required dependencies

Load the different example images

List all the images. You'll see an output like this:

Next, we create a reader and run OCR. Each result contains the coordinates of a rectangle that encloses text, together with the text itself:
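To make the shape of that output concrete, the sketch below formats results laid out as one (bounding box, text, confidence) entry per detected region, where the bounding box is a list of four corner points. This assumes EasyOCR's detailed output layout; the sample values and the helper name `summarize_results` are made up for illustration, and a real run would obtain `results` from the reader instead.

```python
from typing import List, Sequence, Tuple

# One OCR hit: four [x, y] corner points, the recognized text, and a confidence score.
OcrHit = Tuple[Sequence[Sequence[float]], str, float]


def summarize_results(results: List[OcrHit], min_confidence: float = 0.5) -> List[str]:
    """Return one human-readable line per detection above the confidence threshold."""
    lines = []
    for box, text, confidence in results:
        if confidence < min_confidence:
            continue
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        # Collapse the four corners into an axis-aligned rectangle.
        lines.append(
            f"{text!r} at x={min(xs)}..{max(xs)}, y={min(ys)}..{max(ys)} "
            f"(confidence {confidence:.2f})"
        )
    return lines


# Hypothetical sample data, not taken from a real image.
sample = [
    ([[10, 20], [110, 20], [110, 50], [10, 50]], "HELLO", 0.98),
    ([[15, 70], [90, 70], [90, 95], [15, 95]], "w0rld", 0.31),
]

for line in summarize_results(sample):
    print(line)
```

Filtering on confidence is a design choice of this sketch; lowering `min_confidence` keeps weaker detections such as the second sample entry.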

The docker build command builds Docker images from a Dockerfile.

Before running the command replace:

repo-name: the name of the container; you can name it anything you want
tag: optional, but you can use the latest tag

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name, or tag.

Now that we have an image on Docker Hub (your own or an example image from the manual), we can use the container to run jobs on Bacalhau.

Let's look closely at the command below:

export JOB_ID=$( ... ) exports the job ID as environment variable
bacalhau docker run: call to bacalhau
The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job
The --id-only flag is set to print only job id
-i ipfs://bafybeibvc...... Mounts the model from IPFS
-i https://raw.githubusercontent.com... Mounts the Input Image from a URL
jsacex/easyocr: the name and the tag of the Docker image we are using
-- easyocr -l ch_sim en -f ./inputs/chinese.jpg --detail=1 --gpu=True: executes the script with the following parameters:
1. -l ch_sim en: the languages to recognize (Simplified Chinese and English)
2. -f ./inputs/chinese.jpg: path to the input image or directory
3. --detail=1: level of detail
4. --gpu=True: we set this flag to True since we are running inference on a GPU. If you run this on a CPU, set this flag to False
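The pieces above can also be assembled programmatically. The sketch below builds the argument list for the command described in this list; `build_easyocr_command` is a hypothetical helper, and the model CID and image URL are left as parameters because their full values are not reproduced here.

```python
import shlex
from typing import List


def build_easyocr_command(model_input: str, image_input: str, use_gpu: bool = True) -> List[str]:
    """Assemble the `bacalhau docker run` arguments described above."""
    command = ["bacalhau", "docker", "run", "--id-only"]
    if use_gpu:
        command += ["--gpu", "1"]  # request one GPU from the compute node
    command += [
        "-i", model_input,   # model weights, e.g. an ipfs:// URI
        "-i", image_input,   # input image, e.g. an https:// URL
        "jsacex/easyocr",
        "--",
        "easyocr", "-l", "ch_sim", "en",
        "-f", "./inputs/chinese.jpg",
        "--detail=1",
        f"--gpu={use_gpu}",
    ]
    return command


# Placeholder inputs: substitute the real CID and URL before submitting.
args = build_easyocr_command("ipfs://<model-cid>", "https://<image-url>")
print(shlex.join(args))
```

Keeping the command as a list rather than a single string avoids shell-quoting mistakes if it is later passed to a process launcher.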

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

The job description should be saved in .yaml format, e.g. easyocr.yaml, and then run with the command:

You can check the status of the job using bacalhau list.

When it says Completed, that means the job is done, and we can get the results.

You can find out more information about your job by using bacalhau describe.

You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

After the download has finished, you should see the following contents in the results directory:

Now you can find the file in the results/outputs folder. You can view the results by running the following commands:

Running Inference on Dolly 2.0 Model with Hugging Face

Introduction

Dolly 2.0 is an open-source, instruction-following Large Language Model (LLM) that has been fine-tuned on a human-generated instruction dataset and is licensed for both research and commercial use. Built on the EleutherAI Pythia model family, this 12-billion-parameter language model was fine-tuned exclusively on a high-quality, human-generated instruction-following dataset contributed by Databricks employees.

The Dolly 2.0 package is open source, including the training code, dataset, and model weights, all available for commercial use. Organizations can therefore create, own, and customize LLMs capable of human-like interaction without paying API access fees or sharing data with third parties.

Running locally

Prerequisites

An NVIDIA GPU
Python

Installing dependencies

pip -q install git+https://github.com/huggingface/transformers # need to install from github
pip -q install --upgrade accelerate # ensure you are using a version higher than 0.12.0

Create an inference.py file with the following code:

# content of the inference.py file
import argparse
import torch
from transformers import pipeline

def main(prompt_string, model_version):

    # use dolly-v2-12b if you're using Colab Pro+, using pythia-2.8b for Free Colab
    generate_text = pipeline(model=model_version, 
                            torch_dtype=torch.bfloat16, 
                            trust_remote_code=True,
                            device_map="auto")

    print(generate_text(prompt_string))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", type=str, required=True, help="The prompt to be used in the GPT model")
    parser.add_argument("--model_version", type=str, default="./databricks/dolly-v2-12b", help="The model version to be used")
    args = parser.parse_args()
    main(args.prompt, args.model_version)

Building the container (optional)

You may want to create your own container for this kind of task. In that case, use the instructions for creating and publishing your own image on Docker Hub. Use huggingface/transformers-pytorch-deepspeed-nightly-gpu as the base image, install the dependencies listed above, and copy inference.py into it. Your Dockerfile will then look like this:

FROM huggingface/transformers-pytorch-deepspeed-nightly-gpu
RUN apt-get update -y
RUN pip -q install git+https://github.com/huggingface/transformers
RUN pip -q install "accelerate>=0.12.0"
COPY ./inference.py .

Running Inference on Bacalhau

Prerequisite

To get started, you need to install the Bacalhau client; see the Bacalhau installation documentation for more information.

Structure of the command

export JOB_ID=$( ... ): Export results of a command execution as environment variable
bacalhau docker run: Run a job using docker executor.
--gpu 1: Flag to specify the number of GPUs to use for the execution. In this case, 1 GPU will be used.
-w /inputs: Flag to set the working directory inside the container to /inputs.
-i gitlfs://huggingface.co/databricks/dolly-v2-3b.git: Flag to clone the Dolly V2-3B model from Hugging Face's repository using Git LFS. The files will be mounted to /inputs/databricks/dolly-v2-3b.
-i https://gist.githubusercontent.com/js-ts/d35e2caa98b1c9a8f176b0b877e0c892/raw/3f020a6e789ceef0274c28fc522ebf91059a09a9/inference.py: Flag to download the inference.py script from the provided URL. The file will be mounted to /inputs/inference.py.
jsacex/dolly_inference:latest: The name and the tag of the Docker image.
The command to run inference on the model: python inference.py --prompt "Where is Earth located ?" --model_version "./databricks/dolly-v2-3b". It consists of:
1. inference.py: The Python script that runs the inference process using the Dolly V2-3B model.
2. --prompt "Where is Earth located ?": Specifies the text prompt to be used for the inference.
3. --model_version "./databricks/dolly-v2-3b": Specifies the path to the Dolly V2-3B model. In this case, the model files are mounted to /inputs/databricks/dolly-v2-3b.

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

export JOB_ID=$(bacalhau docker run \
    --gpu 1 \
    --id-only \
    -w /inputs \
    -i gitlfs://huggingface.co/databricks/dolly-v2-3b.git \
    -i https://gist.githubusercontent.com/js-ts/d35e2caa98b1c9a8f176b0b877e0c892/raw/3f020a6e789ceef0274c28fc522ebf91059a09a9/inference.py \
    jsacex/dolly_inference:latest \
    -- python inference.py --prompt "Where is Earth located ?" --model_version "./databricks/dolly-v2-3b")
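Because the working directory is set to /inputs, both the script name and the relative --model_version path resolve against it. A small sketch of that resolution, using only the paths listed above (the helper name `resolve_in_container` is illustrative):

```python
import posixpath


def resolve_in_container(workdir: str, relative_path: str) -> str:
    """Resolve a path the way a process started in `workdir` would see it."""
    return posixpath.normpath(posixpath.join(workdir, relative_path))


model_path = resolve_in_container("/inputs", "./databricks/dolly-v2-3b")
script_path = resolve_in_container("/inputs", "inference.py")
print(model_path)
print(script_path)
```

The two results match the mount points named in the flag descriptions, which is why the command can refer to both with relative paths.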

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list:

bacalhau list --id-filter ${JOB_ID}

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe:

bacalhau describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

rm -rf results && mkdir results
bacalhau get ${JOB_ID} --output-dir results

Viewing your Job Output

After the download has finished, we can see the results in the results/outputs folder.

Speech Recognition using Whisper

Introduction

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Its creators show that using such a large and diverse dataset improves robustness to accents, background noise, and technical language. It also enables transcription in multiple languages, as well as translation from those languages into English. The models and inference code are open source, so they can serve as a foundation for building useful applications and for further research on robust speech processing. In this example, we will transcribe an audio clip locally, containerize the script, and then run the container on Bacalhau.

The advantage of using Bacalhau over managed automatic speech recognition services is that you can run your own containers, which can scale to batch process petabytes of video or audio.

TL;DR

Prerequisite

To get started, you need to install:

Bacalhau client (see the Bacalhau installation documentation)
Whisper
PyTorch
pandas

Before we create and run the script we need a sample audio file to test the code. For that we download a sample audio clip:

We will create a script that accepts parameters (input file path, output file path, temperature, etc.) and set the default parameters. Also if the input file is in mp4 format, then the script converts it to wav format. The transcript can be saved in various formats. Then the large model is loaded and we pass it the required parameters.
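The mp4-to-wav step can be pictured as follows. This is a sketch under assumptions, not the tutorial's script: it assumes the conversion is delegated to the ffmpeg command-line tool, and `conversion_command` is a hypothetical helper that only builds the command without running it.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def conversion_command(input_path: str) -> Optional[List[str]]:
    """Return an ffmpeg command that converts an mp4 input to wav, or None if not needed."""
    source = PurePosixPath(input_path)
    if source.suffix.lower() != ".mp4":
        return None  # already audio, nothing to convert
    target = source.with_suffix(".wav")
    # -y overwrites an existing output; ffmpeg infers the output format from the extension.
    return ["ffmpeg", "-y", "-i", str(source), str(target)]


print(conversion_command("inputs/Apollo_11_moonwalk_montage_720p.mp4"))
print(conversion_command("clip.wav"))
```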

This model is not limited to English or to transcription; it supports many other languages.

Next, let's create an openai-whisper script:

Let's run the script with the default parameters:

To view the outputs, execute the following:

To build your own docker container, create a Dockerfile, which contains instructions on how the image will be built, and what extra requirements will be included.

We choose pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime as our base image.

Then we install all the dependencies and add the test audio file and our openai-whisper script to the container. We also run a test command to check that the script works inside the container and that the container builds successfully.

We will run the docker build command to build the container:

Before running the command replace:

repo-name: the name of the container; you can name it anything you want
tag: optional, but you can use the latest tag

In our case:

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

In our case:

After the dataset has been uploaded, copy the CID:

Let's look closely at the command below:

export JOB_ID=$( ... ) exports the job ID as environment variable
bacalhau docker run: call to bacalhau
The -i ipfs://bafybeielf6z4cd2nuey5arckect5bjmelhouvn5r flag mounts the CID that contains our file into the container at the path /inputs
The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job
jsacex/whisper: the name and the tag of the docker image we are using
python openai-whisper.py: execute the script with following parameters:
1. -p inputs/Apollo_11_moonwalk_montage_720p.mp4 : the input path of our file
2. -o outputs: the path where to store the outputs

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

You can check the status of the job using bacalhau list.

When it says Completed, that means the job is done, and we can get the results.

You can find out more information about your job by using bacalhau describe.

After the download has finished, you should see the following contents in the results directory:

Now you can find the file in the results/outputs folder. To view it, run the following command:

Stable Diffusion on a GPU

This example tutorial demonstrates how to use Stable Diffusion on a GPU and run it on the demo network. Stable Diffusion is a state-of-the-art text-to-image model that generates images from text and was developed as an open-source alternative to proprietary text-to-image models. It is based on a diffusion model and uses a transformer-based text encoder to interpret the prompt.

TL;DR

Prerequisite

To get started, you need to install the Bacalhau client; see the Bacalhau installation documentation for more information.

Quick Test

Here is an example of an image generated by this model.

When you run this code for the first time, it will download the pre-trained weights, which may add a short delay.

When running this code, if you check the GPU RAM usage, you'll see that it's sucked up many GBs, and depending on what GPU you're running, it may OOM (Out of memory) if you run this again.

You can try and reduce RAM usage by playing with batch sizes (although it is only set to 1 above!) or more carefully controlling the TensorFlow session.

To clear the GPU memory we will use numba. This won't be required when running in a single-shot manner.

After writing the code the next step is to run the script.

As a result, you will get something like this:

The following presents additional parameters you can try:

python main.py --p "cat with three eyes": sets the prompt
python main.py --p "cat with three eyes" --n 100: sets the number of iterations to 100
python stable-diffusion.py --p "cat with three eyes" --b 2: sets the batch size to 2 (the number of images to generate)
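The flags above suggest a small command-line interface. The sketch below shows one way such flags could be parsed with argparse; it is not the tutorial's actual script, and the default values for --n and --o are assumptions chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --p, --n, --b and --o flags used in this tutorial."""
    parser = argparse.ArgumentParser(description="Generate images from a text prompt")
    parser.add_argument("--p", dest="prompt", type=str, required=True, help="text prompt")
    parser.add_argument("--n", dest="steps", type=int, default=50,
                        help="number of iterations (assumed default)")
    parser.add_argument("--b", dest="batch_size", type=int, default=1,
                        help="number of images to generate")
    parser.add_argument("--o", dest="output_dir", type=str, default="./outputs",
                        help="output directory (assumed default)")
    return parser


args = build_parser().parse_args(["--p", "cat with three eyes", "--n", "100", "--b", "2"])
print(args.prompt, args.steps, args.batch_size, args.output_dir)
```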

We will run the docker build command to build the container:

Before running the command, replace the following:

repo-name: the name of the container; you can name it anything you want
tag: optional, but you can use the latest tag

In our case:

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

In our case:

To submit a job, run the Bacalhau command with the following structure:

export JOB_ID=$( ... ) exports the job ID as environment variable
The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job
The --id-only flag is set to print only job id
ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1: the name and the tag of the docker image we are using
-- python main.py --o ./outputs --p "meme about tensorflow": The command to run inference on the model. It consists of:
1. main.py: path to the script
2. --o ./outputs: specifies the output directory
3. --p "meme about tensorflow": specifies the prompt

This takes about 5 minutes to complete, mainly because of the cold-start GPU setup time. This is faster than the CPU version, but you might still want to grab some fruit or plan your lunchtime run.

Furthermore, the container itself is about 10GB, so it might take a while to download on the node if it isn't cached.

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

Job status: You can check the status of the job using bacalhau list.

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

After the download has finished, you should see the following contents in the results directory:

Now you can find the file in the results/outputs folder:

Stable Diffusion on a CPU

Introduction

Stable Diffusion is a state-of-the-art text-to-image model that generates images from text and was developed as an open-source alternative to proprietary text-to-image models. It is based on a diffusion model and uses a transformer-based text encoder to interpret the prompt.

This example demonstrates how to use Stable Diffusion on a CPU and run it on the demo network. The first section describes the development of the code and the container. The second section demonstrates how to run the job using Bacalhau.

The images presented on this page were generated by this model.

TL;DR

Development

The text-to-image stable diffusion model was trained on a fleet of GPU machines, at great cost. To use this trained model for inference, you also need to run it on a GPU.

However, this isn't always desired or possible. One alternative is to use OpenVINO, a project from Intel that allows you to convert and optimize models from a variety of frameworks (and ONNX if your framework isn't directly supported) to run on an Intel CPU. This is what we will do in this example.

Heads up! This example takes about 10 minutes to generate an image on an average CPU. Whilst this demonstrates it is possible, it might not be practical.

Prerequisites

In order to run this example you need:

Note that these dependencies are only known to work on Ubuntu-based x64 machines.

The following commands clone the example repository, and other required repositories, and install the Python dependencies.

Now that we have all the dependencies installed, we can call the demo.py wrapper, which is a simple CLI, to generate an image from a prompt.

When the generation is complete, you can open the generated hello.png and see something like this:

Let's try another prompt and see what we get:

Now that we have a working example, we can convert it into a format that allows us to perform inference in a distributed environment.

This container is using the python:3.9.9-bullseye image and the working directory is set. Next, the Dockerfile installs the same dependencies from earlier in this notebook. Then we add our custom code and pull the dependent repositories.

We've already pushed this image to GHCR, but for posterity, you'd use a command like this to update it:

To submit a job, you can use the Bacalhau CLI. The following command passes a prompt to the model and generates an image in the outputs directory.

This will take about 10 minutes to complete. Go grab a coffee. Or a beer. Or both. If you want to block and wait for the job to complete, add the --wait flag.

Furthermore, the container itself is about 15GB, so it might take a while to download on the node if it isn't cached.

export JOB_ID=$( ... ): Export results of a command execution as environment variable
bacalhau docker run: Run a job using docker executor.
--id-only: Flag to print out only the job id
ghcr.io/bacalhau-project/examples/stable-diffusion-cpu:0.0.1: The name and the tag of the Docker image.
The command to run inference on the model: python demo.py --prompt "First Humans On Mars" --output ../outputs/mars.png. It consists of:
1. demo.py: The Python script that runs the inference process.
2. --prompt "First Humans On Mars": Specifies the text prompt to be used for the inference.
3. --output ../outputs/mars.png: Specifies the path to the output image.

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

Job status: You can check the status of the job using bacalhau list:

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe:

After the download has finished we can see the results in the results/outputs folder.

Object Detection with YOLOv5 on Bacalhau

Introduction

The identification and localization of objects in images and videos is a computer vision task called object detection. Several algorithms have emerged in the past few years to tackle the problem. One of the most popular algorithms to date for real-time object detection is YOLO (You Only Look Once), initially proposed by Redmon et al.

Traditionally, models like YOLO required enormous amounts of training data to yield reasonable results. People might not have access to such high-quality labeled data. Thankfully, open-source communities and researchers have made it possible to utilize pre-trained models to perform inference. In other words, you can use models that have already been trained on large datasets to perform object detection on your own data.

Bacalhau is a highly scalable decentralized computing platform and is well suited to running massive object detection jobs. In this example, you can take advantage of the GPUs available on the Bacalhau Network and perform an end-to-end object detection inference, using the YOLOv5 Docker image.

TL;DR

Load your dataset into S3/IPFS, specify it and pre-trained weights via the --input flags, choose a suitable container, specify the command and path to save the results - done!

Prerequisite

To get started, you need to install the Bacalhau client; see the Bacalhau installation documentation for more information.

Test Run with Sample Data

To get started, let's run a test job with a small sample dataset that is included in the YOLOv5 Docker Image. This will give you a chance to familiarise yourself with the process of running a job on Bacalhau.

In addition to the usual Bacalhau flags, you will also see an example of using the --gpu 1 flag to specify the use of a GPU.

Remember that by default Bacalhau does not provide any network connectivity when running a job. So you need to either provide all assets at job submission time, or use the --network=full or --network=http flags to access the data at task time. See the networking documentation for more details.

The model requires pre-trained weights to run and by default downloads them from within the container. Bacalhau jobs don't have network access so we will pass in the weights at submission time, saving them to /usr/src/app/yolov5s.pt. You may also provide your own weights here.

The container has its own options that we must specify:

--project specifies the output volume that the model will save its results to. Bacalhau defaults to using /outputs as the output directory, so we save it there.

One final additional hack that we have to do is move the weights file to a location with the standard name. As of writing this, Bacalhau downloads the file to a UUID-named file, which the model is not expecting. This is because GitHub 302 redirects the request to a random file in its backend.

export JOB_ID=$( ... ) exports the job ID as environment variable
The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job
The --timeout flag is set to make sure that if the job is not completed in the specified time, it will be terminated
The --wait flag is set to wait for the job to complete before returning
The --wait-timeout-secs flag is set together with --wait to define how long the app should wait for the job to complete
The --id-only flag is set to print only job id
The --input flags are used to specify the sources of input data
-- /bin/bash -c 'find /inputs -type f -exec cp {} /outputs/yolov5s.pt \; ; python detect.py --weights /outputs/yolov5s.pt --source $(pwd)/data/images --project /outputs' tells the model where to find input data and where to write output
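The shell one-liner in the last item copies whatever file was downloaded into /inputs to a fixed name before calling detect.py. The same step can be sketched in Python; the directories here are temporary stand-ins for /inputs and /outputs, and `stage_weights` is a hypothetical helper name.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def stage_weights(inputs_dir: Path, target: Path) -> Path:
    """Copy the downloaded weights file to the name the model expects."""
    files = sorted(p for p in inputs_dir.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files found under {inputs_dir}")
    # Like the find/cp one-liner, every file is copied to the same target,
    # so this only makes sense when the input holds a single weights file.
    for source in files:
        shutil.copy(source, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    inputs, outputs = Path(tmp, "inputs"), Path(tmp, "outputs")
    inputs.mkdir()
    outputs.mkdir()
    # Simulate a download that was saved under a UUID-style name.
    (inputs / str(uuid.uuid4())).write_bytes(b"fake-weights")
    staged = stage_weights(inputs, outputs / "yolov5s.pt")
    staged_name, staged_bytes = staged.name, staged.read_bytes()
    print(staged_name, len(staged_bytes))
```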

This should output a UUID (like 59c59bfb-4ef8-45ac-9f4b-f0e9afd26e70), which will be stored in the environment variable JOB_ID. This is the ID of the job that was created. You can check the status of the job using the commands below.

Job status: You can check the status of the job using bacalhau list:

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe:

After the download has finished we can see the results in the results/outputs/exp folder.

Let's run the same job again, but this time use the images above.

Just as in the example above, this should output a UUID, which will be stored in the environment variable JOB_ID. You can check the status of the job using the commands below.

Generate Realistic Images using StyleGAN3 and Bacalhau

Introduction

In this example tutorial, we will show you how to generate realistic images with StyleGAN3 and Bacalhau. StyleGAN is based on Generative Adversarial Networks (GANs), which consist of a generator network and a discriminator network. The discriminator is trained to distinguish images produced by the generator from real images. During training, the generator tries to fool the discriminator, which results in the generation of realistic-looking images. With StyleGAN3 we can generate realistic-looking images or videos. It can generate not only human faces but also animals, cars, and landscapes.

TL;DR

Prerequisite

To get started, you need to install the Bacalhau client; see the Bacalhau installation documentation for more information.

Running StyleGAN3 locally

To run StyleGAN3 locally, you'll need to clone the repo, install dependencies and download the model weights.

Now you can generate an image using a pre-trained AFHQv2 model. Here is an example of the image we generated:

To build your own docker container, create a Dockerfile, which contains instructions to build your image.

We will run the docker build command to build the container:

Before running the command, replace:

repo-name: the name of the container; you can name it anything you want
tag: optional, but you can use the latest tag

In our case:

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

In our case:

To submit a job, run the Bacalhau command with the following structure:

export JOB_ID=$( ... ) exports the job ID as environment variable
bacalhau docker run: call to Bacalhau
The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job
The --id-only flag is set to print only job id
jsacex/stylegan3: the name and the tag of the docker image we are using
python gen_images.py: execute the script with following parameters:
1. --trunc=1 --seeds=2 --network=stylegan3-r-afhqv2-512x512.pkl: The animation length is either determined based on the --seeds value or explicitly specified using the --num-keyframes option. When num keyframes are specified with --num-keyframes, the output video length will be num_keyframes * w_frames frames.
2. ../outputs: path to the output

The job description should be saved in .yaml format, e.g. stylegan3.yaml, and then run with the command:

You can also run variations of this command to generate videos and other things. In the command below, we will render a latent vector interpolation video. This will render a 4x2 grid of interpolations for seeds 0 through 31.

Let's look closely at the command below:

export JOB_ID=$( ... ) exports the job ID as environment variable
bacalhau docker run: call to bacalhau
The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job
The --id-only flag is set to print only job id
jsacex/stylegan3 the name and the tag of the docker image we are using
python gen_images.py: execute the script with following parameters:
1. --trunc=1 --seeds=2 --network=stylegan3-r-afhqv2-512x512.pkl: The animation length is either determined based on the --seeds value or explicitly specified using the --num-keyframes option. When num keyframes is specified with --num-keyframes, the output video length will be num_keyframes * w_frames frames. If --num-keyframes is not specified, the number of seeds given with --seeds must be divisible by grid size W*H (--grid). In this case, the output video length will be # seeds/(w*h)*w_frames frames.
2. ../outputs: path to the output
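The frame-count rules in item 1 reduce to a little arithmetic. The sketch below applies them to the 4x2 grid over seeds 0 through 31 mentioned above; the `w_frames` value of 120 is an arbitrary illustrative choice, not a documented default, and `video_length_frames` is a hypothetical helper.

```python
from typing import Optional, Tuple


def video_length_frames(num_seeds: int, grid: Tuple[int, int], w_frames: int,
                        num_keyframes: Optional[int] = None) -> int:
    """Return the output video length in frames under the rules described above."""
    if num_keyframes is None:
        width, height = grid
        cells = width * height
        if num_seeds % cells != 0:
            raise ValueError("number of seeds must be divisible by grid size W*H")
        num_keyframes = num_seeds // cells
    return num_keyframes * w_frames


# Seeds 0 through 31 are 32 seeds; a 4x2 grid has 8 cells, giving 4 keyframes.
frames = video_length_frames(num_seeds=32, grid=(4, 2), w_frames=120)
print(frames)
```

With an explicit `num_keyframes`, the seed count no longer affects the length, which mirrors the first rule in the description.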

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

You can check the status of the job using bacalhau list.

When it says Completed, that means the job is done, and we can get the results.

You can find out more information about your job by using bacalhau describe.

After the download has finished, you should see the following contents in the results directory:

Now you can find the file in the results/outputs folder.

Stable Diffusion Checkpoint Inference

Introduction

Stable Diffusion is a state-of-the-art text-to-image model that generates images from text and was developed as an open-source alternative to proprietary text-to-image models. It is based on a diffusion model and uses a transformer-based text encoder to interpret the prompt.

This example demonstrates how to use stable diffusion using a finetuned model and run inference on it. The first section describes the development of the code and the container - it is optional as users don't need to build their own containers to use their own custom model. The second section demonstrates how to convert your model weights to ckpt. The third section demonstrates how to run the job using Bacalhau.

This guide uses a fine-tuned model, which was fine-tuned on Bacalhau. To learn how to fine-tune your own Stable Diffusion model, refer to the corresponding fine-tuning guide.

TL;DR

Convert your existing model weights to the ckpt format and upload them to IPFS.
Create a job using bacalhau docker run, relevant docker image, model weights and any prompt.
Download results using bacalhau get and the job id.

Prerequisite

To get started, you need to install:

Bacalhau client (see the Bacalhau installation documentation)
NVIDIA GPU
CUDA drivers
NVIDIA docker

To build your own docker container, create a Dockerfile, which contains instructions to containerize the code for inference.

This container is using the pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime image and the working directory is set. Next the Dockerfile installs required dependencies. Then we add our custom code and pull the dependent repositories.

We will run the docker build command to build the container.

Before running the command, replace:

repo-name: the name of the container; you can name it anything you want
tag: optional, but you can use the latest tag

So in our case, the command will look like this:

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

Thus, in this case, the command would look this way:

After the image has been pushed to Docker Hub, you can use the container to run jobs on Bacalhau. Before that, check whether your model is a ckpt file. If it is, you can skip to the section on running on Bacalhau; if not, the next section describes how to convert your model into the ckpt format.

To download the convert script:

Next, convert the model weights into the ckpt format. The --half flag cuts the size of the output model from 4GB to 2GB:
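As a rough check on that size reduction, the sketch below compares the storage needed per weight, assuming --half stores weights as 16-bit rather than 32-bit floating-point values. The parameter count is a hypothetical round number chosen only to illustrate the ratio, and `checkpoint_size_gb` is an illustrative helper.

```python
import struct

BYTES_FP32 = struct.calcsize("f")  # 4 bytes per single-precision value
BYTES_FP16 = struct.calcsize("e")  # 2 bytes per half-precision value


def checkpoint_size_gb(num_parameters: int, bytes_per_value: int) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_value / 1e9


# Hypothetical round parameter count, not the real size of the model.
params = 1_000_000_000
full_gb = checkpoint_size_gb(params, BYTES_FP32)
half_gb = checkpoint_size_gb(params, BYTES_FP16)
print(full_gb, half_gb)
```

Halving the bytes per value halves the file size regardless of the exact parameter count, which is consistent with the 4GB to 2GB figure quoted above.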

After the checkpoint file has been uploaded copy its CID.

Let's look closely at the command above:

export JOB_ID=$( ... ): Export results of a command execution as environment variable
The --gpu 1 flag is set to specify hardware requirements, a GPU is needed to run such a job
-i ipfs://QmUCJuFZ2v7KvjBGHRP2K1TMPFce3reTkKVGF2BJY5bXdZ:/model.ckpt: Path to mount the checkpoint
-- conda run --no-capture-output -n ldm: since we are using conda we need to specify the name of the environment which we are going to use, in this case it is ldm
scripts/txt2img.py: running the python script
--prompt "a photo of a person drinking coffee": the prompt. You need to include the session name in the prompt.
--plms: the sampler you want to use. In this case we will use the plms sampler
--ckpt ../model.ckpt: here we specify the path to our checkpoint
--n_samples 1: number of samples we want to produce
--skip_grid: skip creating a grid of images
--outdir ../outputs: path to store the outputs
--seed $RANDOM: by default, the output generated for a given prompt is always the same. To get different outputs for the same prompt, set the seed parameter to a random value

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

Job status: You can check the status of the job using bacalhau list:

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe:

After the download has finished, we can see the results in the results/outputs folder. We received the following image for our prompt:

Running Inference on a Model stored on S3

Introduction

In this example, we will demonstrate how to run inference on a model stored on Amazon S3. We will use a PyTorch model trained on the MNIST dataset.

Running Locally

Prerequisites

Consider using the latest versions of the following, or use the Docker method described later in this article.

Python
PyTorch

Downloading the Datasets

Use the following commands to download the model and test image:

Creating the Inference Script

This script is designed to load a pretrained PyTorch model for MNIST digit classification from a tar.gz file, extract it, and use the model to perform inference on a given input image. Ensure you have all required dependencies installed:

To use this script, you need to provide the paths to the tar.gz file containing the pre-trained model, the output directory where the model will be extracted, and the input image file for which you want to perform inference. The script will output the predicted digit (class) for the given input image.
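The extraction step can be tried without the real model. The sketch below builds a small stand-in archive containing a file named model.pth, extracts it the same way the inference script does, and checks that the file appears in the output directory. The archive contents and directory names are placeholders for illustration only.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_tar_gz(file_path, output_dir):
    """Unpack a gzip-compressed tar archive into output_dir."""
    with tarfile.open(file_path, "r:gz") as tar:
        tar.extractall(path=output_dir)


with tempfile.TemporaryDirectory() as workdir:
    work = Path(workdir)
    archive = work / "model.tar.gz"
    payload = b"placeholder weights"  # stand-in for the real model file

    # Build a tiny archive that contains only model.pth.
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo(name="model.pth")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    out_dir = work / "model-pth"
    extract_tar_gz(archive, out_dir)

    # The inference script expects the weights at <output_directory>/model.pth.
    extracted = out_dir / "model.pth"
    extracted_ok = extracted.read_bytes() == payload
    print(extracted.name, extracted_ok)
```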

export JOB_ID=$( ... ): Export results of a command execution as environment variable
-w /inputs: sets the working directory in the container to /inputs
-i src=s3://sagemaker-sample-files/datasets/image/MNIST/model/pytorch-training-2020-11-21-22-02-56-203/model.tar.gz,dst=/model/,opt=region=us-east-1: mounts the S3 object at the destination path /model/ and specifies the region where the bucket is located with opt=region=us-east-1
-i git://github.com/js-ts/mnist-test.git: mounts the source code repository from GitHub at /inputs/js-ts/mnist-test; in this case the repository also contains the test image
pytorch/pytorch: The name of the Docker image
-- python3 /inputs/js-ts/mnist-test/inference.py --tar_gz_file_path /model/model.tar.gz --output_directory /model-pth --image_path /inputs/js-ts/mnist-test/image.png: The command to run inference on the model. It consists of:
1. /model/model.tar.gz is the path to the model file
2. /model-pth is the output directory for the model
3. /inputs/js-ts/mnist-test/image.png is the path to the input image
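The S3 input in the list above packs three pieces of information into one comma-separated value. The sketch below splits it into its parts to make the structure visible. It is a reading aid written for this tutorial, not Bacalhau's own parser, and the function name is hypothetical.

```python
def parse_input_spec(spec: str) -> dict[str, str]:
    """Split 'src=...,dst=...,opt=key=value' into a flat dictionary."""
    fields = {}
    for part in spec.split(","):
        key, _, value = part.partition("=")
        if key == "opt":
            # Options carry their own key=value pair, e.g. region=us-east-1.
            opt_key, _, opt_value = value.partition("=")
            fields[f"opt.{opt_key}"] = opt_value
        else:
            fields[key] = value
    return fields


spec = (
    "src=s3://sagemaker-sample-files/datasets/image/MNIST/model/"
    "pytorch-training-2020-11-21-22-02-56-203/model.tar.gz,"
    "dst=/model/,opt=region=us-east-1"
)
parsed = parse_input_spec(spec)
for name, value in parsed.items():
    print(f"{name}: {value}")
```

The `dst` value explains why the inference command later reads the archive from /model/model.tar.gz.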

When the job is submitted, Bacalhau prints out the related job ID. We store it in the JOB_ID environment variable so that we can reuse it later on.

Use the bacalhau logs command to view the job output, since the script prints the result of its execution to stdout:

You can also use bacalhau get to download job results:
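As a recap, the follow-up commands named in this tutorial all take the stored job ID. The sketch below prints them for a given ID. The subcommand names come from the text above; passing the job ID as a positional argument is an assumption about the CLI, and the sample ID is made up.

```python
import os
import shlex


def follow_up_commands(job_id: str) -> dict[str, str]:
    """Return the shell commands used to inspect a submitted job."""
    return {
        "status": shlex.join(["bacalhau", "list"]),
        "details": shlex.join(["bacalhau", "describe", job_id]),
        "logs": shlex.join(["bacalhau", "logs", job_id]),
        "download": shlex.join(["bacalhau", "get", job_id]),
    }


# Fall back to a made-up ID when JOB_ID is not set in the environment.
job_id = os.environ.get("JOB_ID", "example-job-id")
commands = follow_up_commands(job_id)
for step, command in commands.items():
    print(f"{step}: {command}")
```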

Support

# content of the openai-whisper.py file
import argparse
import os
import sys
import warnings
import whisper
from pathlib import Path
import subprocess
import torch
import shutil
import numpy as np

parser = argparse.ArgumentParser(description="OpenAI Whisper Automatic Speech Recognition")
parser.add_argument("-l", dest="audiolanguage", type=str, help="Language spoken in the audio, use Auto detection to let Whisper detect the language. Select from the following languages['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba'] ", default="English")
parser.add_argument("-p", dest="inputpath", type=str, help="Path of the input file", default="/hello.mp3")
parser.add_argument("-v", dest="typeverbose", type=str, help="Whether to print out the progress and debug messages. ['Live transcription', 'Progress bar', 'None']", default="Live transcription")
parser.add_argument("-g", dest="outputtype", type=str, help="Type of file to generate to record the transcription. ['All', '.txt', '.vtt', '.srt']", default="All")
parser.add_argument("-s", dest="speechtask", type=str, help="Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`). ['transcribe', 'translate']", default="transcribe")
parser.add_argument("-n", dest="numSteps", type=int, help="Number of Steps", default=50)
# The temperature is fractional, so it is parsed as a float rather than an int.
parser.add_argument("-t", dest="decodingtemperature", type=float, help="Temperature to use for sampling.", default=0.15)
parser.add_argument("-b", dest="beamsize", type=int, help="Number of beams in beam search", default=5)
parser.add_argument("-o", dest="output", type=str, help="Output Folder where to store the outputs", default="")
args = parser.parse_args()

device = torch.device('cuda:0')
print('Using device:', device, file=sys.stderr)

Model = 'large'
whisper_model = whisper.load_model(Model)
video_path_local = os.getcwd() + args.inputpath
file_name = os.path.basename(video_path_local)
output_file_path = args.output

# Extract a mono 16 kHz WAV track when the input is an MP4 video.
if os.path.splitext(video_path_local)[1] == ".mp4":
    video_path_local_wav = os.path.splitext(file_name)[0] + ".wav"
    result = subprocess.run(["ffmpeg", "-i", str(video_path_local), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local_wav)])

# add language parameters
# Language spoken in the audio, use Auto detection to let Whisper detect the language.
# The accepted values are listed in the help text of the -l argument above.
language = args.audiolanguage
# Whether to print out the progress and debug messages.
# ['Live transcription', 'Progress bar', 'None']
verbose = args.typeverbose
# Type of file to generate to record the transcription.
# ['All', '.txt', '.vtt', '.srt']
output_type = args.outputtype
# Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`).
# ['transcribe', 'translate']
task = args.speechtask
# Temperature to use for sampling.
temperature = args.decodingtemperature
# Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.
temperature_increment_on_fallback = 0.2
# Number of candidates when sampling with non-zero temperature.
best_of = 5
# Number of beams in beam search, only applicable when temperature is zero.
beam_size = args.beamsize
# Optional patience value to use in beam decoding, as in [*Beam Decoding with Controlled Patience*](https://arxiv.org/abs/2204.05424), the default (1.0) is equivalent to conventional beam search.
patience = 1.0
# Optional token length penalty coefficient (alpha) as in [*Google's Neural Machine Translation System*](https://arxiv.org/abs/1609.08144), set to a negative value to use simple length normalization.
length_penalty = -0.05
# Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations.
suppress_tokens = "-1"
# Optional text to provide as a prompt for the first window.
initial_prompt = ""
# If True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
condition_on_previous_text = True
# Whether to perform inference in fp16.
fp16 = True
# If the gzip compression ratio is higher than this value, treat the decoding as failed.
compression_ratio_threshold = 2.4
# If the average log probability is lower than this value, treat the decoding as failed.
logprob_threshold = -1.0
# If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
no_speech_threshold = 0.6

verbose_lut = {
    'Live transcription': True,
    'Progress bar': False,
    'None': None
}

args = dict(
    language = (None if language == "Auto detection" else language),
    verbose = verbose_lut[verbose],
    task = task,
    temperature = temperature,
    temperature_increment_on_fallback = temperature_increment_on_fallback,
    best_of = best_of,
    beam_size = beam_size,
    patience=patience,
    length_penalty=(length_penalty if length_penalty>=0.0 else None),
    suppress_tokens=suppress_tokens,
    initial_prompt=(None if not initial_prompt else initial_prompt),
    condition_on_previous_text=condition_on_previous_text,
    fp16=fp16,
    compression_ratio_threshold=compression_ratio_threshold,
    logprob_threshold=logprob_threshold,
    no_speech_threshold=no_speech_threshold
)

# Build the sequence of fallback temperatures, from the starting value up to 1.0.
temperature = args.pop("temperature")
temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
if temperature_increment_on_fallback is not None:
    temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
else:
    temperature = [temperature]

if Model.endswith(".en") and args["language"] not in {"en", "English"}:
    warnings.warn(f"{Model} is an English-only model but received '{args['language']}'; using English instead.")
    args["language"] = "en"

video_transcription = whisper.transcribe(
    whisper_model,
    str(video_path_local),
    temperature=temperature,
    **args,
)

# Save output
writing_lut = {
    '.txt': whisper.utils.write_txt,
    '.vtt': whisper.utils.write_vtt,
    '.srt': whisper.utils.write_srt,  # was write_txt, which produced plain text in .srt files
}

if output_type == "All":
    for suffix, write_suffix in writing_lut.items():
        transcript_local_path = os.getcwd() + output_file_path + '/' + os.path.splitext(file_name)[0] + suffix
        with open(transcript_local_path, "w", encoding="utf-8") as f:
            write_suffix(video_transcription["segments"], file=f)
        try:
            transcript_drive_path = file_name
        except:
            print(f"**Transcript file created: {transcript_local_path}**")
else:
    transcript_local_path = output_file_path + '/' + os.path.splitext(file_name)[0] + output_type
    with open(transcript_local_path, "w", encoding="utf-8") as f:
        writing_lut[output_type](video_transcription["segments"], file=f)
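The Whisper script turns the starting temperature and the fallback increment into a sequence of temperatures that are tried in order when decoding fails. The sketch below reproduces that sequence in plain Python for the script's defaults (0.15 and 0.2). It mirrors the np.arange call in the script rather than calling it, and the function name is hypothetical.

```python
def fallback_temperatures(start: float, increment: float, stop: float = 1.0) -> tuple[float, ...]:
    """List the temperatures from start up to stop in steps of increment."""
    temperatures = []
    step = 0
    # The small tolerance plays the same role as the 1e-6 in the script.
    while start + step * increment <= stop + 1e-6:
        temperatures.append(round(start + step * increment, 2))
        step += 1
    return tuple(temperatures)


schedule = fallback_temperatures(0.15, 0.2)
print(schedule)  # (0.15, 0.35, 0.55, 0.75, 0.95)
```

Each later value makes sampling more random, which gives the decoder a chance to escape a failed or repetitive transcription.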

# content of the inference.py file
import torch
import torchvision.transforms as transforms
from PIL import Image
from torch.autograd import Variable
import argparse
import tarfile

class CustomModel(torch.nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, 5)
        self.conv2 = torch.nn.Conv2d(10, 20, 5)
        self.fc1 = torch.nn.Linear(320, 50)
        self.fc2 = torch.nn.Linear(50, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        output = torch.log_softmax(x, dim=1)
        return output

def extract_tar_gz(file_path, output_dir):
    with tarfile.open(file_path, 'r:gz') as tar:
        tar.extractall(path=output_dir)

# Parse command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--tar_gz_file_path', type=str, required=True, help='Path to the tar.gz file')
parser.add_argument('--output_directory', type=str, required=True, help='Output directory to extract the tar.gz file')
parser.add_argument('--image_path', type=str, required=True, help='Path to the input image file')
args = parser.parse_args()

# Extract the tar.gz file
tar_gz_file_path = args.tar_gz_file_path
output_directory = args.output_directory
extract_tar_gz(tar_gz_file_path, output_directory)

# Load the model
model_path = f"{output_directory}/model.pth"
model = CustomModel()
model.load_state_dict(torch.load(model_path, map_location=torch.device("cpu")))
model.eval()

# Transformations for the MNIST dataset
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# Function to run inference on an image
def run_inference(image, model):
    image_tensor = transform(image).unsqueeze(0)  # Apply transformations and add batch dimension
    input = Variable(image_tensor)

    # Perform inference
    output = model(input)
    _, predicted = torch.max(output.data, 1)
    return predicted.item()

# Example usage
image_path = args.image_path
image = Image.open(image_path)
predicted_class = run_inference(image, model)
print(f"Predicted class: {predicted_class}")
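The value 320 in the first linear layer of CustomModel follows from the layer sizes in the script. The sketch below traces a 28x28 MNIST image through the two convolutions (kernel size 5, no padding) and the two 2x2 max-pooling steps using plain arithmetic. The helper names are hypothetical, and no PyTorch install is needed to run it.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int) -> int:
    """Output side length of max pooling with stride equal to the kernel size."""
    return size // kernel


side = 28                 # MNIST images are resized to 28x28
side = conv_out(side, 5)  # conv1 (kernel 5): 28 -> 24
side = pool_out(side, 2)  # max_pool2d(2):    24 -> 12
side = conv_out(side, 5)  # conv2 (kernel 5): 12 -> 8
side = pool_out(side, 2)  # max_pool2d(2):     8 -> 4

channels = 20             # output channels of conv2
flattened = channels * side * side
print(flattened)  # 320, the input size of fc1
```

If the input resolution or kernel sizes change, the input size of fc1 has to be recomputed in the same way.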
