1 of 10

Model Inference

EasyOCR (Optical Character Recognition) on Bacalhau

Introduction

In this example tutorial, we use Bacalhau and Easy OCR to digitize paper records or for recognizing characters or extract text data from images stored on IPFS/Filecoin or on the web. EasyOCR is a ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc. With easy OCR you use the pre-trained models or use your own fine-tuned model.

TD:LR

Using Bacalhau and Easy OCR to extract text data from images stored on the web.

Prerequisite

To get started, you need to install the Bacalhau client, see more information here

Running Easy OCR Locally

Install the required dependencies

%%bash
pip install easyocr

Load the different example images

%%bash
npx degit JaidedAI/EasyOCR/examples -f

List all the images

%%bash
ls -l

total 3508
-rw-r--r-- 1 root root   59898 Jun 16 22:36 chinese.jpg
-rw-r--r-- 1 root root   97910 Jun 16 22:36 easyocr_framework.jpeg
-rw-r--r-- 1 root root 1740957 Jun 16 22:36 english.png
-rw-r--r-- 1 root root  487995 Jun 16 22:36 example2.png
-rw-r--r-- 1 root root  127454 Jun 16 22:36 example3.png
-rw-r--r-- 1 root root  488641 Jun 16 22:36 example.png
-rw-r--r-- 1 root root  168376 Jun 16 22:36 french.jpg
-rw-r--r-- 1 root root   42159 Jun 16 22:36 japanese.jpg
-rw-r--r-- 1 root root  225531 Jun 16 22:36 korean.png
drwxr-xr-x 1 root root    4096 Jun 15 13:37 sample_data
-rw-r--r-- 1 root root   82229 Jun 16 22:36 thai.jpg
-rw-r--r-- 1 root root   34706 Jun 16 22:36 width_ths.png

To displaying an image from the list

# show an image
import PIL
from PIL import ImageDraw
im = PIL.Image.open("thai.jpg")

Next, we create a reader to do OCR to get coordinates which represent a rectangle containing text and the text itself

# If you change to GPU instance, it will be faster. But CPU is enough.
# (by MENU > Runtime > Change runtime type > GPU, then redo from beginning )
import easyocr
reader = easyocr.Reader(['th','en'])
# Doing OCR. Get bounding boxes.
bounds = reader.readtext('thai.jpg')
bounds

Containerize your Script using Docker

:::tip You can skip this step and go straight to running a Bacalhau job :::

We will use the Dockerfile that is already created in the Easy OCR repo. Use the command below to clone the repo

git clone https://github.com/JaidedAI/EasyOCR
cd EasyOCR

Build the Container

The docker build command builds Docker images from a Dockerfile.

docker build -t hub-user/repo-name:tag .

Before running the command replace;

hub-user with your docker hub username, If you don’t have a docker hub account follow these instructions to create docker account, and use the username of the account you created
repo-name with the name of the container, you can name it anything you want
tag this is not required but you can use the latest tag

Push the container

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name, or tag.

docker push <hub-user>/<repo-name>:<tag>

Running a Bacalhau Job to Generate Easy OCR output

After the repo image has been pushed to Docker Hub, we can now use the container for running on Bacalhau. To submit a job, run the following Bacalhau command:

%%bash --out job_id
bacalhau docker run \
-i ipfs://bafybeibvcllzpfviggluobcfassm3vy4x2a4yanfxtmn4ir7olyzfrgq64:/root/.EasyOCR/model/zh_sim_g2.pth  \
-i https://raw.githubusercontent.com/JaidedAI/EasyOCR/ae773d693c3f355aac2e58f0d8142c600172f016/examples/chinese.jpg \
--timeout 3600 \
--wait-timeout-secs 3600 \
--gpu 1  \
--id-only \
--wait \
jsacex/easyocr \
--  easyocr -l ch_sim  en -f ./inputs/chinese.jpg --detail=1 --gpu=True

Since the model and the image aren't present in the container we will mount the image from an URL and the model from IPFS. You can find models to download from here. You can choose the model you want to use in this case we will be using the zh_sim_g2 model

Structure of the command

-i ipfs://bafybeibvc......: Mounting the model from IPFS
-i https://raw.githubusercontent.com......... Mounting the Input Image from a URL
--gpu 1: Specifying the no of GPUs
jsacex/easyocr: Name of the Docker image

Breaking up the easyocr command

-- easyocr -l ch_sim en -f ./inputs/chinese.jpg --detail=1 --gpu=True

-l: the name of the model which is ch_sim
-f: path to the input Image or directory
--detail=1: level of detail
--gpu=True: we set this flag to true since we are running inference on a GPU, if you run this on a CPU you set this to false

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

%%bash
bacalhau list --id-filter ${JOB_ID}

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

%%bash
bacalhau describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

%%bash
rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results

After the download has finished you should see the following contents in the results directory

Viewing your Job Output

To view the file, run the following command:

%%bash
ls results/ # list the contents of the current directory
cat results/stdout # displays the contents of the current directory

Running Inference on Dolly 2.0 Model with Hugging Face

Introduction

Dolly 2.0, the groundbreaking, open-source, instruction-following Large Language Model (LLM) that has been fine-tuned on a human-generated instruction dataset, licensed for both research and commercial purposes. Developed using the EleutherAI Pythia model family, this 12-billion-parameter language model is built exclusively on a high-quality, human-generated instruction following dataset, contributed by Databricks employees.

Dolly 2.0 package is open source, including the training code, dataset, and model weights, all available for commercial use. This unprecedented move empowers organizations to create, own, and customize robust LLMs capable of engaging in human-like interactions, without the need for API access fees or sharing data with third parties.

Running locally

Prerequisites

A NVIDIA GPU
Python

Installing dependencies

%%bash
pip -q install git+https://github.com/huggingface/transformers # need to install from github
pip -q install accelerate>=0.12.0

Creating the inference script

%%writefile inference.py
import argparse
import torch
from transformers import pipeline

def main(prompt_string, model_version):

    # use dolly-v2-12b if you're using Colab Pro+, using pythia-2.8b for Free Colab
    generate_text = pipeline(model=model_version,
                            torch_dtype=torch.bfloat16,
                            trust_remote_code=True,
                            device_map="auto")

    print(generate_text(prompt_string))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", type=str, required=True, help="The prompt to be used in the GPT model")
    parser.add_argument("--model_version", type=str, default="./databricks/dolly-v2-12b", help="The model version to be used")
    args = parser.parse_args()
    main(args.prompt, args.model_version)

Building the container (optional)

Prerequisite

Install Docker on your local machine.
Sign up for a DockerHub account if you don't already have one. Steps

Step 1: Create a Dockerfile Create a new file named Dockerfile in your project directory with the following content:

FROM huggingface/transformers-pytorch-deepspeed-nightly-gpu

RUN apt-get update -y

RUN pip -q install git+https://github.com/huggingface/transformers

RUN pip -q install accelerate>=0.12.0

WORKDIR /

# COPY ./dolly_inference.py .

This Dockerfile sets up a container with the necessary dependencies and installs the Segment Anything Model from its GitHub repository.

Step 2: Build the Docker Image In your terminal, navigate to the directory containing the Dockerfile and run the following command to build the Docker image:

docker build -t your-dockerhub-username/sam:lite .

Replace your-dockerhub-username with your actual DockerHub username. This command will build the Docker image and tag it with your DockerHub username and the name "sam".

Step 3: Push the Docker Image to DockerHub Once the build process is complete, Next, push the Docker image to DockerHub using the following command:

docker push your-dockerhub-username/sam:lite

Again, replace your-dockerhub-username with your actual DockerHub username. This command will push the Docker image to your DockerHub repository.

Running Inference on Bacalhau

Prerequisite

To get started, you need to install the Bacalhau client, see more information here

Structure of the command

bacalhau docker run \
--gpu 1 \
-w /inputs \
-i gitlfs://huggingface.co/databricks/dolly-v2-3b.git \
-i https://gist.githubusercontent.com/js-ts/d35e2caa98b1c9a8f176b0b877e0c892/raw/3f020a6e789ceef0274c28fc522ebf91059a09a9/inference.py \
jsacex/dolly_inference:latest \
 -- python inference.py --prompt "Where is Earth located ?" --model_version "./databricks/dolly-v2-3b"

docker run: Docker command to run a container from a specified image.
--gpu 1: Flag to specify the number of GPUs to use for the execution. In this case, 1 GPU will be used.
-w /inputs: Flag to set the working directory inside the container to /inputs.
-i gitlfs://huggingface.co/databricks/dolly-v2-3b.git: Flag to clone the Dolly V2-3B model from Hugging Face's repository using Git LFS. The files will be mounted to /inputs/databricks/dolly-v2-3b.
-i https://gist.githubusercontent.com/js-ts/d35e2caa98b1c9a8f176b0b877e0c892/raw/3f020a6e789ceef0274c28fc522ebf91059a09a9/inference.py: Flag to download the inference.py script from the provided URL. The file will be mounted to /inputs/inference.py.
jsacex/dolly_inference:latest: The name and the tag of the Docker image.
The command to run inference on the model: python inference.py --prompt "Where is Earth located ?" --model_version "./databricks/dolly-v2-3b".
- inference.py: The Python script that runs the inference process using the Dolly V2-3B model.
- --prompt "Where is Earth located ?": Specifies the text prompt to be used for the inference.
- --model_version "./databricks/dolly-v2-3b": Specifies the path to the Dolly V2-3B model. In this case, the model files are mounted to /inputs/databricks/dolly-v2-3b.

Speech Recognition using Whisper

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing. In this example, we will transcribe an audio clip locally, containerize the script and then run the container on Bacalhau.

The advantage of using Bacalhau over managed Automatic Speech Recognition services is that you can run your own containers which can scale to do batch process petabytes of videos or audio for automatic speech recognition

TD:LR

Using OpenAI whisper with Bacalhau to process audio files

Prerequisite

To get started, you need to install:

Bacalhau client, see more information here
Whisper,
pytorch
pandas

Running whisper locally

%%bash
pip install git+https://github.com/openai/whisper.git
pip install torch==1.10.1
pip install pandas
sudo apt update && sudo apt install ffmpeg

Before we create and run the script we need a sample audio file to test the code for that we download a sample audio clip.

%%bash
wget https://github.com/js-ts/hello/raw/main/hello.mp3

Create the script

We will create a script that accepts parameters (input file path, output file path, temperature, etc.) and set the default parameters. Also:

If the input file is in mp4 format, then the script converts it to wav format.
Save the transcript in various formats,
We load the large model
Then pass it the required parameters. This model is not only limited to English and transcription, it supports other languages and also does translation, into the following languages:

Next, let's create a openai-whisper script:

%%writefile openai-whisper.py
import argparse
import os
import sys
import warnings
import whisper
from pathlib import Path
import subprocess
import torch
import shutil
import numpy as np
parser = argparse.ArgumentParser(description="OpenAI Whisper Automatic Speech Recognition")
parser.add_argument("-l",dest="audiolanguage", type=str,help="Language spoken in the audio, use Auto detection to let Whisper detect the language. Select from the following languages['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba'] ",default="English")
parser.add_argument("-p",dest="inputpath", type=str,help="Path of the input file",default="/hello.mp3")
parser.add_argument("-v",dest="typeverbose", type=str,help="Whether to print out the progress and debug messages. ['Live transcription', 'Progress bar', 'None']",default="Live transcription")
parser.add_argument("-g",dest="outputtype", type=str,help="Type of file to generate to record the transcription. ['All', '.txt', '.vtt', '.srt']",default="All")
parser.add_argument("-s",dest="speechtask", type=str,help="Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`). ['transcribe', 'translate']",default="transcribe")
parser.add_argument("-n",dest="numSteps", type=int,help="Number of Steps",default=50)
parser.add_argument("-t",dest="decodingtemperature", type=int,help="Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.",default=0.15 )
parser.add_argument("-b",dest="beamsize", type=int,help="Number of Images",default=5)
parser.add_argument("-o",dest="output", type=str,help="Output Folder where to store the outputs",default="")

args=parser.parse_args()
device = torch.device('cuda:0')
print('Using device:', device, file=sys.stderr)

Model = 'large'
whisper_model =whisper.load_model(Model)
video_path_local = os.getcwd()+args.inputpath
file_name=os.path.basename(video_path_local)
output_file_path=args.output

if os.path.splitext(video_path_local)[1] == ".mp4":
    video_path_local_wav =os.path.splitext(file_name)[0]+".wav"
    result  = subprocess.run(["ffmpeg", "-i", str(video_path_local), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local_wav)])

# add language parameters
# Language spoken in the audio, use Auto detection to let Whisper detect the language.
#  ['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba']
language = args.audiolanguage
# Whether to print out the progress and debug messages.
# ['Live transcription', 'Progress bar', 'None']
verbose = args.typeverbose
#  Type of file to generate to record the transcription.
# ['All', '.txt', '.vtt', '.srt']
output_type = args.outputtype
# Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`).
# ['transcribe', 'translate']
task = args.speechtask
# Temperature to use for sampling.
temperature = args.decodingtemperature
#  Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.
temperature_increment_on_fallback = 0.2
#  Number of candidates when sampling with non-zero temperature.
best_of = 5
#  Number of beams in beam search, only applicable when temperature is zero.
beam_size = args.beamsize
# Optional patience value to use in beam decoding, as in [*Beam Decoding with Controlled Patience*](https://arxiv.org/abs/2204.05424), the default (1.0) is equivalent to conventional beam search.
patience = 1.0
# Optional token length penalty coefficient (alpha) as in [*Google's Neural Machine Translation System*](https://arxiv.org/abs/1609.08144), set to negative value to uses simple length normalization.
length_penalty = -0.05
# Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations.
suppress_tokens = "-1"
# Optional text to provide as a prompt for the first window.
initial_prompt = ""
# if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
condition_on_previous_text = True
#  whether to perform inference in fp16.
fp16 = True
#  If the gzip compression ratio is higher than this value, treat the decoding as failed.
compression_ratio_threshold = 2.4
# If the average log probability is lower than this value, treat the decoding as failed.
logprob_threshold = -1.0
# If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
no_speech_threshold = 0.6

verbose_lut = {
    'Live transcription': True,
    'Progress bar': False,
    'None': None
}

args = dict(
    language = (None if language == "Auto detection" else language),
    verbose = verbose_lut[verbose],
    task = task,
    temperature = temperature,
    temperature_increment_on_fallback = temperature_increment_on_fallback,
    best_of = best_of,
    beam_size = beam_size,
    patience=patience,
    length_penalty=(length_penalty if length_penalty>=0.0 else None),
    suppress_tokens=suppress_tokens,
    initial_prompt=(None if not initial_prompt else initial_prompt),
    condition_on_previous_text=condition_on_previous_text,
    fp16=fp16,
    compression_ratio_threshold=compression_ratio_threshold,
    logprob_threshold=logprob_threshold,
    no_speech_threshold=no_speech_threshold
)

temperature = args.pop("temperature")
temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
if temperature_increment_on_fallback is not None:
    temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
else:
    temperature = [temperature]

if Model.endswith(".en") and args["language"] not in {"en", "English"}:
    warnings.warn(f"{Model} is an English-only model but receipted '{args['language']}'; using English instead.")
    args["language"] = "en"

video_transcription = whisper.transcribe(
    whisper_model,
    str(video_path_local),
    temperature=temperature,
    **args,
)

# Save output
writing_lut = {
    '.txt': whisper.utils.write_txt,
    '.vtt': whisper.utils.write_vtt,
    '.srt': whisper.utils.write_txt,
}

if output_type == "All":
    for suffix, write_suffix in writing_lut.items():
        transcript_local_path =os.getcwd()+output_file_path+'/'+os.path.splitext(file_name)[0] +suffix
        with open(transcript_local_path, "w", encoding="utf-8") as f:
            write_suffix(video_transcription["segments"], file=f)
        try:
            transcript_drive_path =file_name
        except:
            print(f"**Transcript file created: {transcript_local_path}**")
else:
    transcript_local_path =output_file_path+'/'+os.path.splitext(file_name)[0] +output_type

    with open(transcript_local_path, "w", encoding="utf-8") as f:
        writing_lut[output_type](video_transcription["segments"], file=f)

Let's run the script with the default parameters:

%%bash
python openai-whisper.py

Viewing the outputs

%%bash
cat hello.srt

Containerize Script using Docker

To build your own docker container, create a Dockerfile, which contains instructions on how the image will be built, and what extra requirements will be included.

FROM  pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime

WORKDIR /

RUN apt-get -y update

RUN apt-get -y install git

RUN python3 -m pip install --upgrade pip

RUN python -m pip install regex tqdm Pillow

RUN pip install git+https://github.com/openai/whisper.git

ADD hello.mp3 hello.mp3

ADD openai-whisper.py openai-whisper.py

RUN python openai-whisper.py

We choose pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime as our base image

And then install all the dependencies, after that we will add the test audio file and our openai-whisper script to the container, we will also run a test command to check whether our script works inside the container and if the container builds successfully

:::info See more information on how to containerize your script/app here :::

Build the container

We will run docker build command to build the container;

docker build -t <hub-user>/<repo-name>:<tag> .

Before running the command replace;

hub-user with your docker hub username, If you don’t have a docker hub account follow these instructions to create a Docker account, and use the username of the account you created
repo-name with the name of the container, you can name it anything you want
tag this is not required but you can use the latest tag

In our case

docker build -t jsacex/whisper

Push the container

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

docker push <hub-user>/<repo-name>:<tag>

In our case

docker push jsacex/whisper

Running a Bacalhau Job

We will transcribe the moon landing video, which can be found here: https://www.nasa.gov/multimedia/hd/apollo11_hdpage.html

Since the downloaded video is in mov format we convert the video to mp4 format and then upload it to our public storage in this case IPFS. We will be using NFT.Storage (Recommended Option). To upload your dataset using NFTup just drag and drop your directory it will upload it to IPFS

After the dataset has been uploaded, copy the CID:

bafybeielf6z4cd2nuey5arckect5bjmelhouvn5rhbjlvpvhp7erkrc4nu

To submit a job, run the following Bacalhau command:

%%bash --out job_id
bacalhau docker run \
    --id-only \
    --gpu 1 \
    --timeout 3600 \
    --wait-timeout-secs 3600 \
    jsacex/whisper \
    -i ipfs://bafybeielf6z4cd2nuey5arckect5bjmelhouvn5rhbjlvpvhp7erkrc4nu \
    -- python openai-whisper.py -p inputs/Apollo_11_moonwalk_montage_720p.mp4 -o outputs

%env JOB_ID={job_id}

Structure of the command

Let's look closely at the command above:

-i ipfs://bafybeielf6z4cd2nuey5arckect5bjmelhouvn5r: flag to mount the CID which contains our file to the container at the path /inputs
-p inputs/Apollo_11_moonwalk_montage_720p.mp4 : the input path of our file
-o outputs: the path where to store the outputs
--gpu : here we request 1 GPU
jsacex/whisper: the name and the tag of the docker image we are using

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

%%bash
bacalhau list --id-filter ${JOB_ID} --wide

When it says Published or Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

%%bash
bacalhau describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

%%bash
rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results

Viewing your Job Output

To view the file, run the following command:

%%bash
cat results/outputs/Apollo_11_moonwalk_montage_720p.vtt

Stable Diffusion on a GPU

This example tutorial demonstrates how to use stable diffusion on a GPU and run it on the Bacalhau network.Stable Diffusion is a state of the art text-to-image model that generates images from text and was developed as an open-source alternative to DALL·E 2. It is based on a Diffusion Probabilistic Model and uses a Transformer to generate images from text.

TD;LR

bacalhau docker run --id-only --gpu 1 ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 -- python main.py --o ./outputs --p "meme about tensorflow"

Prerequisite

To get started, you need to install the Bacalhau client, see more information here

Quick Test

Here is an example of an image generated by this model.

bacalhau docker run --gpu 1 ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 -- python main.py --o ./outputs --p "cod swimming through data"

Development

This stable diffusion example is based on the Keras/Tensorflow implementation. You might also be interested in the Pytorch oriented diffusers library.

Installing dependencies

:::info When you run this code for the first time, it will download the pre-trained weights, which may add a short delay. :::

Based on the requirements here, we will install the following:

%%bash
pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
pip install tqdm
apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2

Testing the Code

We have a sample code from this the Stable Diffusion in TensorFlow/Keras repo which we will use to check if the code is working as expected. Our output for this code will be a DSLR photograph of an astronaut riding a horse.

:::tip When you run this code for the first time, it will download the pre-trained weights, which may add a short delay. :::

from stable_diffusion_tf.stable_diffusion import Text2Image
from PIL import Image

generator = Text2Image(
    img_height=512,
    img_width=512,
    jit_compile=False,  # You can try True as well (different performance profile)
)
img = generator.generate(
    "DSLR photograph of an astronaut riding a horse",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
pil_img = Image.fromarray(img[0])
display(pil_img)

When running this code, if you check the GPU RAM usage, you'll see that it's sucked up many GBs, and depending on what GPU you're running, it may OOM (Out of memory) if you run this again.

You can try and reduce RAM usage by playing with batch sizes (although it is only set to 1 above!) or more carefully controlling the TensorFlow session.

To clear the GPU memory we will use numba. This won't be required when running in a single-shot manner.

%%bash
pip install numba

# clearing the GPU memory
from numba import cuda
device = cuda.get_current_device()
device.reset()

Write the Script

You need a script to execute when we submit jobs. The code below is a slightly modified version of the code we ran above which we got from here, however, this includes more things such as argument parsing argument parsing to be able to customize the generator.

%%writefile main.py
import argparse
from stable_diffusion_tf.stable_diffusion import Text2Image
from PIL import Image
import os
parser = argparse.ArgumentParser(description="Stable Diffusion")
parser.add_argument("--h",dest="height", type=int,help="height of the image",default=512)
parser.add_argument("--w",dest="width", type=int,help="width of the image",default=512)
parser.add_argument("--p",dest="prompt", type=str,help="Description of the image you want to generate",default="cat")
parser.add_argument("--n",dest="numSteps", type=int,help="Number of Steps",default=50)
parser.add_argument("--u",dest="unconditionalGuidanceScale", type=float,help="Number of Steps",default=7.5)
parser.add_argument("--t",dest="temperature", type=int,help="Number of Steps",default=1)
parser.add_argument("--b",dest="batchSize", type=int,help="Number of Images",default=1)
parser.add_argument("--o",dest="output", type=str,help="Output Folder where to store the Image",default="./")

args=parser.parse_args()
height=args.height
width=args.width
prompt=args.prompt
numSteps=args.numSteps
unconditionalGuidanceScale=args.unconditionalGuidanceScale
temperature=args.temperature
batchSize=args.batchSize
output=args.output

generator = Text2Image(
    img_height=height,
    img_width=width,
    jit_compile=False,  # You can try True as well (different performance profile)
)

img = generator.generate(
    prompt,
    num_steps=numSteps,
    unconditional_guidance_scale=unconditionalGuidanceScale,
    temperature=temperature,
    batch_size=batchSize,
)
for i in range(0,batchSize):
  pil_img = Image.fromarray(img[i])
  image = pil_img.save(f"{output}/image{i}.png")

:::info For a full list of arguments that you can pass to the script, see more information here :::

Run the Script

After writing the code the next step is to run the script.

%%bash
python main.py

Display the Output Image

import IPython.display as display
display.Image("image0.png")

The following presents additional parameters you can try:

python main.py --p "cat with three eyes - to set prompt
python main.py --p "cat with three eyes --n 100 - to set the number of iterations to 100
python stable-diffusion.py --p "cat with three eyes" --b 2 to set batch size to 2 (No of images to generate)

Containerize Script using Docker

Docker is the easiest way to run TensorFlow on a GPU since the host machine only requires the NVIDIA® driver. To containerize the inference code, we will create a Dockerfile. The Dockerfile is a text document that contains the commands that specify how the image will be built.

FROM tensorflow/tensorflow:2.10.0-gpu

RUN apt-get -y update

RUN apt-get -y install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2 git

RUN python3 -m pip install --upgrade pip

RUN python -m pip install regex tqdm Pillow tensorflow tensorflow_addons ftfy  --upgrade --quiet

RUN pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet

ADD main.py main.py

# Run once so it downloads and caches the pre-trained weights
RUN python main.py --n 1

The Dockerfile leverages the latest official TensorFlow GPU image and then installs other dependencies like git, CUDA packages, and other image-related necessities. See the original repository for the expected requirements.

:::info See more information on how to containerize your script/apphere :::

Build the container

We will run docker build command to build the container;

docker build -t <hub-user>/<repo-name>:<tag> .

Before running the command replace;

hub-user with your docker hub username, If you don’t have a docker hub account follow these instructions to create a Docker account, and use the username of the account you created
repo-name with the name of the container, you can name it anything you want
tag this is not required but you can use the latest tag

In our case

docker build -t ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 .

Push the container

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

docker push <hub-user>/<repo-name>:<tag>

In our case

docker push ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 .

Running a Bacalhau Job

To submit a job, run the following Bacalhau command:

%%bash --out job_id
bacalhau docker run --id-only --gpu 1 ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 -- python main.py --o ./outputs --p "meme about tensorflow"

Structure of the command

Let's look closely at the command above:

bacalhau docker run: call to bacalhau
--gpu 1: No of GPUs
ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1: the name and the tag of the docker image we are using
../outputs: path to the output
python main.py: execute the script

The Bacalhau command passes a prompt to the model and generates an image in the outputs directory. The main difference in the example below compared to all the other examples is the addition of the --gpu X flag, which tells Bacalhau to only schedule the job on nodes that have X GPUs free. You can read more about GPU support in the documentation.

:::tip This will take about 5 minutes to complete and is mainly due to the cold-start GPU setup time. This is faster than the CPU version, but you might still want to grab some fruit or plan your lunchtime run.

Furthermore, the container itself is about 10GB, so it might take a while to download on the node if it isn't cached. :::

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

%env JOB_ID={job_id}

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

%%bash
bacalhau list --id-filter ${JOB_ID}

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

%%bash
bacalhau describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

%%bash
rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results

After the download has finished you should see the following contents in results directory

Viewing your Job Output

To view the file, run the following command:

%%bash
ls results/outputs

Display image

To display and view your image run the code below:

import IPython.display as display
display.Image("results/outputs/image0.png")

Stable Diffusion on a CPU

Stable Diffusion is a state of the art text-to-image model that generates images from text and was developed as an open-source alternative to DALL·E 2. It is based on a Diffusion Probabilistic Model and uses a Transformer to generate images from text.

This example demonstrates how to use stable diffusion on a CPU and run it on the Bacalhau network. The first section describes the development of the code and the container. The section demonstrates how to run the job using Bacalhau.

The following image is an example generated by this model.

TL;DR

bacalhau docker run ghcr.io/bacalhau-project/examples/stable-diffusion-cpu:0.0.1 -- python demo.py --prompt "cod in space" --output ../outputs/cod.png

1. Development

The original text-to-image stable diffusion model was trained on a fleet of GPU machines, at great cost. To use this trained model for inference, you also need to run it on a GPU.

However, this isn't always desired or possible. One alternative is to use a project called OpenVINO from Intel that allows you to convert and optimize models from a variety of frameworks (and ONNX if your framework isn't directly supported) to run on a supported Intel CPU. This is what we will do in this example.

:::tip Heads up! This example takes about 10 minutes to generate an image on an average CPU. Whilst this demonstrates it is possible, it might not be practical. :::

Prerequisites

In order to run this example you need:

A Debian-flavoured Linux (although you might be able to get it working on M1 macs)
Docker

Converting Stable Diffusion to a CPU Model Using OpenVINO

The first step is to convert the trained stable diffusion models so that they work efficiently on a CPU using OpenVINO. The example is quite complex, so we have created a separate repository (which is a fork from Github user Sergei Belousov) to host the code. In summary, the code:

Downloads a pre-optimized OpenVINO version of ...
the original pre-trained stable diffusion model ...
which also leverages OpenAI's CLIP transformer ...
and is then wrapped inside an OpenVINO runtime, which reads in and executes the model.

The core code representing these tasks can be found in the stable_diffusion_engine.py file. This is a mashup that creates a pipeline necessary to tokenize the text and run the stable diffusion model. This boilerplate could be simplified by leveraging the more recent version of the diffusers library. But let's crack on.

Install Dependencies

Note that these dependencies are only known to work on Ubuntu-based x64 machines.

%%bash
sudo apt-get update
sudo apt-get install -y libgl1 libglib2.0-0 git-lfs

Clone the Repository and Dependencies

The following commands clone the example repository, and other required repositories, and install the Python dependencies.

%%bash
git clone https://github.com/js-ts/stable_diffusion.openvino
cd stable_diffusion.openvino
git lfs install
git clone https://huggingface.co/openai/clip-vit-large-patch14
git clone https://huggingface.co/bes-dev/stable-diffusion-v1-4-openvino
pip3 install -r requirements.txt

Generating an Image

Now that we have all the dependencies installed, we can call the demo.py wrapper, which is a simple CLI, to generate an image from a prompt.

!cd stable_diffusion.openvino && \
  python3 demo.py --prompt "hello" --output hello.png

import IPython.display as display
display.Image("stable_diffusion.openvino/hello.png")

!cd stable_diffusion.openvino && \
  python3 demo.py --prompt "cat driving a car" --output cat.png

import IPython.display as display
display.Image("stable_diffusion.openvino/cat.png")

2. Running Stable Diffusion (CPU) on Bacalhau

Now we have a working example, we can convert it into a format that allows us to perform inference in a distributed environment.

First we will create a Dockerfile to containerize the inference code. The Dockerfile can be found in the repository, but is presented here to aid understanding.

FROM python:3.9.9-bullseye

WORKDIR /src

RUN apt-get update && \
    apt-get install -y \
    libgl1 libglib2.0-0 git-lfs

RUN git lfs install

COPY requirements.txt /src/

RUN pip3 install -r requirements.txt

COPY stable_diffusion_engine.py demo.py demo_web.py /src/
COPY data/ /src/data/

RUN git clone https://huggingface.co/openai/clip-vit-large-patch14
RUN git clone https://huggingface.co/bes-dev/stable-diffusion-v1-4-openvino

# download models
RUN python3 demo.py --num-inference-steps 1 --prompt "test" --output /tmp/test.jpg

This container is using the python:3.9.9-bullseye image and the working directory is set. Next, the Dockerfile installs the same dependencies from earlier in this notebook. Then we add our custom code and pull the dependent repositories.

We've already pushed this image to GHCR, but for posterity, you'd use a command like this to update it:

docker buildx build --platform linux/amd64 --push -t ghcr.io/bacalhau-project/examples/stable-diffusion-cpu:0.0.1 .

Prerequisites

To run this example you will need:

Bacalhau installed and running

!command -v bacalhau >/dev/null 2>&1 || (export BACALHAU_INSTALL_DIR=.; curl -sL https://get.bacalhau.org/install.sh | bash)
path=!echo $PATH
%env PATH=./:{path[0]}

Generating an Image Using Stable Diffusion on Bacalhau

Bacalhau is a distributed computing platform that allows you to run jobs on a network of computers. It is designed to be easy to use and to run on a variety of hardware. In this example, we will use it to run the stable diffusion model on a CPU.

To submit a job, you can use the Bacalhau CLI. The following command passes a prompt to the model and generates an image in the outputs directory.

:::tip

This will take about 10 minutes to complete. Go grab a coffee. Or a beer. Or both. If you want to block and wait for the job to complete, add the --wait flag.

Furthermore, the container itself is about 15GB, so it might take a while to download on the node if it isn't cached.

:::

%%bash --out job_id
bacalhau docker run ghcr.io/bacalhau-project/examples/stable-diffusion-cpu:0.0.1 --id-only -- python demo.py --prompt "First Humans On Mars" --output ../outputs/mars.png

%env JOB_ID={job_id}

Running the commands will output a UUID that represents the job that was created. You can check the status of the job with the following command:

%%bash
bacalhau list --id-filter ${JOB_ID}

Wait until it says Completed and then get the results.

To find out more information about your job, run the following command:

%%bash
bacalhau describe ${JOB_ID}

If you see that the job has completed and there are no errors, then you can download the results with the following command:

%%bash
rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results

After the download has finished you should see the following contents in the results directory:

%%bash
ls results/outputs

mars.png

import IPython.display as display
display.Image("results/outputs/mars.png")

Object Detection with YOLOv5 on Bacalhau

The identification and localization of objects in images and videos is a computer vision task called object detection. Several algorithms have emerged in the past few years to tackle the problem. One of the most popular algorithms to date for real-time object detection is YOLO (You Only Look Once), initially proposed by Redmond et al.[1]

Traditionally, models like YOLO required enormous amounts of training data to yield reasonable results. People might not have access to such high-quality labeled data. Thankfully, open-source communities and researchers have made it possible to utilize pre-trained models to perform inference. In other words, you can use models that have already been trained on large datasets to perform object detection on your own data.

In this tutorial you will perform an end-to-end object detection inference, using the YOLOv5 Docker Image developed by Ultralytics.

TD;LR

Performing object detection inference using Yolov5 and Bacalhau

Prerequisite

To get started, you need to install the Bacalhau client, see more information here

Running Object Detection Jobs on Bacalhau

Bacalhau is a highly scalable decentralized computing platform and is well suited to running massive object detection jobs. In this example, you can take advantage of the GPUs available on the Bacalhau network.

Test Run with Sample Data

To get started, let's run a test job with a small sample dataset that is included in the YOLOv5 Docker Image. This will give you a chance to familiarise yourself with the process of running a job on Bacalhau.

In addition to the usual Bacalhau flags, you will also see:

--gpu 1 to specify the use of a GPU

:::tip Remember that Bacalhau does not provide any network connectivity when running a job. All assets must be provided at submission time. :::

The model requires pre-trained weights to run and by default downloads them from within the container. Bacalhau jobs don't have network access so we will pass in the weights at submission time, saving them to /usr/src/app/yolov5s.pt. You may also provide your own weights here.

The container has its own options that we must specify:

--input to select which pre-trained weights you desire with details on the yolov5 release page
--project specifies the output volume that the model will save its results to. Bacalhau defaults to using /outputs as the output directory, so we save it there.

For more container flags refer to the yolov5/detect.py file in the YOLO repository.

One final additional hack that we have to do is move the weights file to a location with the standard name. As of writing this, Bacalhau downloads the file to a UUID-named file, which the model is not expecting. This is because GitHub 302 redirects the request to a random file in its backend.

%%bash --out job_id
bacalhau docker run \
--gpu 1 \
--timeout 3600 \
--wait-timeout-secs 3600 \
--wait \
--id-only \
--input https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt \
ultralytics/yolov5:v6.2 \
-- /bin/bash -c 'find /inputs -type f -exec cp {} /outputs/yolov5s.pt \; ; python detect.py --weights /outputs/yolov5s.pt --source $(pwd)/data/images --project /outputs'

This should output a UUID (like 59c59bfb-4ef8-45ac-9f4b-f0e9afd26e70). This is the ID of the job that was created. You can check the status of the job with the following command:

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

%%bash
bacalhau list --id-filter ${JOB_ID}

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

%%bash
bacalhau describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

%%bash
rm -rf results && mkdir results
bacalhau get ${JOB_ID} --output-dir results

Viewing Output

After the download has finished we can see the results:

import IPython.display as display
display.Image("results/outputs/exp/bus.jpg")
display.Image("results/outputs/exp/zidane.jpg")

Using custom Images as an input

Now let's use some custom images. First, you will need to ingest your images onto IPFS/Filecoin. For more information about how to do that see the data ingestion section.

This example will use the Cyclist Dataset for Object Detection | Kaggle dataset.

We have already uploaded this dataset to Filecoin under the CID: bafybeicyuddgg4iliqzkx57twgshjluo2jtmlovovlx5lmgp5uoh3zrvpm. You can browse to this dataset via a HTTP IPFS proxy.

Let's run a the same job again, but this time use the images above.

%%bash --out job_id
bacalhau docker run \
--gpu 1 \
--timeout 3600 \
--wait-timeout-secs 3600 \
--wait \
--id-only \
--input https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt \
--input ipfs://bafybeicyuddgg4iliqzkx57twgshjluo2jtmlovovlx5lmgp5uoh3zrvpm:/datasets \
ultralytics/yolov5:v6.2 \
-- /bin/bash -c 'find /inputs -type f -exec cp {} /outputs/yolov5s.pt \; ; python detect.py --weights /outputs/yolov5s.pt --source /datasets --project /outputs'

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

%%bash
bacalhau list --id-filter ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

%%bash
rm -rf custom-results && mkdir custom-results
bacalhau get ${JOB_ID} --output-dir custom-results

Viewing Job Output

import glob
from IPython.display import Image, display
for file in glob.glob('custom-results/outputs/exp/*.jpg'):
    display(Image(filename=file))

Generate Realistic Images using StyleGAN3 and Bacalhau

In this example tutorial, we will show you how to generate realistic images with StyleGAN3 and Bacalhau. StyleGAN is based on Generative Adversarial Networks (GANs), which include a generator and discriminator network that has been trained to differentiate images generated by the generator from real images. However, during the training, the generator tries to fool the discriminator, which results in the generation of realistic-looking images. With StyleGAN3 we can generate realistic-looking images or videos. It can generate not only human faces but also animals, cars, and landscapes.

TD;LR

Generative images with Bacalhau

Prerequisite

To get started, you need to install the Bacalhau client, see more information here

Running StyleGAN3 locally

To run StyleGAN3 locally, you'll need to clone the repo, install dependencies and download the model weights.

%%bash
git clone https://github.com/NVlabs/stylegan3
cd stylegan3
conda env create -f environment.yml
conda activate stylegan3
wget https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-afhqv2-512x512.pkl

Generate an image using a pre-trained AFHQv2 model

Viewing the output image

Containerize Script with Docker

To build your own docker container, create a Dockerfile, which contains instructions to build your image.

COPY . /scratch

WORKDIR /scratch

ENV HOME /scratch

:::info See more information on how to containerize your script/apphere :::

Build the container

We will run docker build command to build the container;

docker build -t <hub-user>/<repo-name>:<tag> .

Before running the command replace;

hub-user with your docker hub username, If you don’t have a docker hub account follow these instructions to create docker account, and use the username of the account you created
repo-name with the name of the container, you can name it anything you want
tag this is not required but you can use the latest tag

In our case

docker build -t jsacex/stylegan3

Push the container

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

docker push <hub-user>/<repo-name>:<tag>

In our case

docker push jsacex/stylegan3

Running a Bacalhau Job

To submit a job, run the following Bacalhau command:

%%bash --out job_id
bacalhau docker run \
--wait \
--id-only \
--gpu 1 \
--timeout 3600 \
--wait-timeout-secs 3600 \
jsacex/stylegan3 \
-- python gen_images.py --outdir=../outputs --trunc=1 --seeds=2 --network=stylegan3-r-afhqv2-512x512.pkl

Structure of the command

Let's look closely at the command above:

bacalhau docker run: call to bacalhau
--gpu 1: No of GPUs
jsacex/stylegan3: the name and the tag of the docker image we are using
../outputs: path to the output
python gen_images.py: execute the script
--trunc=1 --seeds=2 --network=stylegan3-r-afhqv2-512x512.pkl: The animation length is either determined based on the --seeds value or explicitly specified using the --num-keyframes option. When num keyframes are specified with --num-keyframes, the output video length will be 'num_keyframes*w_frames' frames.

Render a latent vector interpolation video

You can also run variations of this command to generate videos and other things. In the following command below, we will render a latent vector interpolation video. This will render a 4x2 grid of interpolations for seeds 0 through 31.

%%bash --out job_id
bacalhau docker run \
jsacex/stylegan3 \
--gpu 1 \
--timeout 3600 \
--wait-timeout-secs 3600 \
-- python gen_video.py --output=../outputs/lerp.mp4 --trunc=1 --seeds=0-31 --grid=4x2 --network=stylegan3-r-afhqv2-512x512.pkl

Structure of the command

Let's look closely at the command above:

bacalhau docker run: call to bacalhau
--gpu 1: No of GPUs
jsacex/stylegan3: the name and the tag of the docker image we are using
../outputs: path to the output
python gen_images.py: execute the script
--trunc=1 --seeds=2 --network=stylegan3-r-afhqv2-512x512.pkl: The animation length is either determined based on the --seeds value or explicitly specified using the --num-keyframes option. When num keyframes is specified with --num-keyframes, the output video length will be 'num_keyframes_w_frames' frames. If --num-keyframes is not specified, the number of seeds given with --seeds must be divisible by grid size W_H (--grid). In this case, the output video length will be '# seeds/(w*h)*w_frames' frames.

When a job is submitted, Bacalhau prints out the related job_id. We store that in an environment variable so that we can reuse it later on.

%env JOB_ID={job_id}

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

%%bash
bacalhau list --id-filter ${JOB_ID} --wide

When it says Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

%%bash
bacalhau describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

%%bash
rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results

After the download has finished you should see the following contents in the results directory

Viewing your Job Output

To view the file, run the following command:

%%bash
ls results/outputs

import IPython.display as display
display.Image("results/outputs/seed0002.png")

Stable Diffusion Checkpoint Inference

Running Inference on a Model stored on S3

In this example, we will demonstrate how to run inference on a model stored on Amazon S3. We will use a PyTorch model trained on the MNIST dataset.

Running Locally

Prerequisites

Python
PyTorch

Downloading the Datasets

%%bash
wget https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/model/pytorch-training-2020-11-21-22-02-56-203/model.tar.gz
wget https://raw.githubusercontent.com/js-ts/mnist-test/main/digit.png

Creating the Inference Script

This script is designed to load a pretrained PyTorch model for MNIST digit classification from a tar.gz file, extract it, and use the model to perform inference on a given input image.

%%writefile inference.py
import torch
import torchvision.transforms as transforms
from PIL import Image
from torch.autograd import Variable
import argparse
import tarfile

class CustomModel(torch.nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, 5)
        self.conv2 = torch.nn.Conv2d(10, 20, 5)
        self.fc1 = torch.nn.Linear(320, 50)
        self.fc2 = torch.nn.Linear(50, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        output = torch.log_softmax(x, dim=1)
        return output

def extract_tar_gz(file_path, output_dir):
    with tarfile.open(file_path, 'r:gz') as tar:
        tar.extractall(path=output_dir)

# Parse command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--tar_gz_file_path', type=str, required=True, help='Path to the tar.gz file')
parser.add_argument('--output_directory', type=str, required=True, help='Output directory to extract the tar.gz file')
parser.add_argument('--image_path', type=str, required=True, help='Path to the input image file')
args = parser.parse_args()

# Extract the tar.gz file
tar_gz_file_path = args.tar_gz_file_path
output_directory = args.output_directory
extract_tar_gz(tar_gz_file_path, output_directory)

# Load the model
model_path = f"{output_directory}/model.pth"
model = CustomModel()
model.load_state_dict(torch.load(model_path, map_location=torch.device("cpu")))
model.eval()

# Transformations for the MNIST dataset
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# Function to run inference on an image
def run_inference(image, model):
    image_tensor = transform(image).unsqueeze(0)  # Apply transformations and add batch dimension
    input = Variable(image_tensor)

    # Perform inference
    output = model(input)
    _, predicted = torch.max(output.data, 1)
    return predicted.item()

# Example usage
image_path = args.image_path
image = Image.open(image_path)
predicted_class = run_inference(image, model)
print(f"Predicted class: {predicted_class}")

Running the inference script

To use this script, you need to provide the paths to the tar.gz file containing the pre-trained model, the output directory where the model will be extracted, and the input image file for which you want to perform inference. The script will output the predicted digit (class) for the given input image.

%%bash
python inference.py --tar_gz_file_path ./model.tar.gz --output_directory ./model --image_path ./digit.png

Running inference on Bacalhau

Prerequisite

To get started, you need to install the Bacalhau client, see more information here

%%bash --out job_id
bacalhau docker run \
--wait \
--id-only \
--timeout 3600 \
--wait-timeout-secs 3600 \
-w /inputs \
-i src=s3://sagemaker-sample-files/datasets/image/MNIST/model/pytorch-training-2020-11-21-22-02-56-203/model.tar.gz,dst=/model/,opt=region=us-east-1 \
-i git://github.com/js-ts/mnist-test.git \
pytorch/pytorch \
 -- python /inputs/js-ts/mnist-test/inference.py --tar_gz_file_path /model/model.tar.gz --output_directory /model-pth --image_path /inputs/js-ts/mnist-test/image.png

Structure of the command

-w /inputs Setting the current working directory at /inputs in the container
-i src=s3://sagemaker-sample-files/datasets/image/MNIST/model/pytorch-training-2020-11-21-22-02-56-203/model.tar.gz,dst=/model/,opt=region=us-east-1: Mounting the s3 bucket at the destination path provided /model/ and specifying the region where the bucket is located opt=region=us-east-1
-i git://github.com/js-ts/mnist-test.git: Flag to mount the source code repo from GitHub. It would mount the repo at /inputs/js-ts/mnist-test in this case it also contains the test image.
pytorch/pytorch: The name of the Docker image.
-- python /inputs/js-ts/mnist-test/inference.py --tar_gz_file_path /model/model.tar.gz --output_directory /model-pth --image_path /inputs/js-ts/mnist-test/image.png: The command to run inference on the model.
- /model/model.tar.gz is the path to the model file.
- /model-pth is the output directory for the model.
- /inputs/js-ts/mnist-test/image.png is the path to the input image.

The job has been submitted and Bacalhau has printed out the related job id. We store that in an environment variable so that we can reuse it later on.

Viewing the output

%%bash
bacalhau logs ${JOB_ID}

Speech Recognition using Whisper

TD:LR

Using OpenAI whisper with Bacalhau to process audio files

Prerequisite

To get started, you need to install:

Bacalhau client, see more information here
Whisper,
pytorch
pandas

Running whisper locally

%%bash
pip install git+https://github.com/openai/whisper.git
pip install torch==1.10.1
pip install pandas
sudo apt update && sudo apt install ffmpeg

Before we create and run the script we need a sample audio file to test the code for that we download a sample audio clip.

%%bash
wget https://github.com/js-ts/hello/raw/main/hello.mp3

Create the script

We will create a script that accepts parameters (input file path, output file path, temperature, etc.) and set the default parameters. Also:

If the input file is in mp4 format, then the script converts it to wav format.
Save the transcript in various formats,
We load the large model
Then pass it the required parameters. This model is not only limited to English and transcription, it supports other languages and also does translation, into the following languages:

Next, let's create a openai-whisper script:

%%writefile openai-whisper.py
import argparse
import os
import sys
import warnings
import whisper
from pathlib import Path
import subprocess
import torch
import shutil
import numpy as np
parser = argparse.ArgumentParser(description="OpenAI Whisper Automatic Speech Recognition")
parser.add_argument("-l",dest="audiolanguage", type=str,help="Language spoken in the audio, use Auto detection to let Whisper detect the language. Select from the following languages['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba'] ",default="English")
parser.add_argument("-p",dest="inputpath", type=str,help="Path of the input file",default="/hello.mp3")
parser.add_argument("-v",dest="typeverbose", type=str,help="Whether to print out the progress and debug messages. ['Live transcription', 'Progress bar', 'None']",default="Live transcription")
parser.add_argument("-g",dest="outputtype", type=str,help="Type of file to generate to record the transcription. ['All', '.txt', '.vtt', '.srt']",default="All")
parser.add_argument("-s",dest="speechtask", type=str,help="Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`). ['transcribe', 'translate']",default="transcribe")
parser.add_argument("-n",dest="numSteps", type=int,help="Number of Steps",default=50)
parser.add_argument("-t",dest="decodingtemperature", type=int,help="Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.",default=0.15 )
parser.add_argument("-b",dest="beamsize", type=int,help="Number of Images",default=5)
parser.add_argument("-o",dest="output", type=str,help="Output Folder where to store the outputs",default="")

args=parser.parse_args()
device = torch.device('cuda:0')
print('Using device:', device, file=sys.stderr)

Model = 'large'
whisper_model =whisper.load_model(Model)
video_path_local = os.getcwd()+args.inputpath
file_name=os.path.basename(video_path_local)
output_file_path=args.output

if os.path.splitext(video_path_local)[1] == ".mp4":
    video_path_local_wav =os.path.splitext(file_name)[0]+".wav"
    result  = subprocess.run(["ffmpeg", "-i", str(video_path_local), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local_wav)])

# add language parameters
# Language spoken in the audio, use Auto detection to let Whisper detect the language.
#  ['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba']
language = args.audiolanguage
# Whether to print out the progress and debug messages.
# ['Live transcription', 'Progress bar', 'None']
verbose = args.typeverbose
#  Type of file to generate to record the transcription.
# ['All', '.txt', '.vtt', '.srt']
output_type = args.outputtype
# Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`).
# ['transcribe', 'translate']
task = args.speechtask
# Temperature to use for sampling.
temperature = args.decodingtemperature
#  Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.
temperature_increment_on_fallback = 0.2
#  Number of candidates when sampling with non-zero temperature.
best_of = 5
#  Number of beams in beam search, only applicable when temperature is zero.
beam_size = args.beamsize
# Optional patience value to use in beam decoding, as in [*Beam Decoding with Controlled Patience*](https://arxiv.org/abs/2204.05424), the default (1.0) is equivalent to conventional beam search.
patience = 1.0
# Optional token length penalty coefficient (alpha) as in [*Google's Neural Machine Translation System*](https://arxiv.org/abs/1609.08144), set to negative value to uses simple length normalization.
length_penalty = -0.05
# Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations.
suppress_tokens = "-1"
# Optional text to provide as a prompt for the first window.
initial_prompt = ""
# if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
condition_on_previous_text = True
#  whether to perform inference in fp16.
fp16 = True
#  If the gzip compression ratio is higher than this value, treat the decoding as failed.
compression_ratio_threshold = 2.4
# If the average log probability is lower than this value, treat the decoding as failed.
logprob_threshold = -1.0
# If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
no_speech_threshold = 0.6

verbose_lut = {
    'Live transcription': True,
    'Progress bar': False,
    'None': None
}

args = dict(
    language = (None if language == "Auto detection" else language),
    verbose = verbose_lut[verbose],
    task = task,
    temperature = temperature,
    temperature_increment_on_fallback = temperature_increment_on_fallback,
    best_of = best_of,
    beam_size = beam_size,
    patience=patience,
    length_penalty=(length_penalty if length_penalty>=0.0 else None),
    suppress_tokens=suppress_tokens,
    initial_prompt=(None if not initial_prompt else initial_prompt),
    condition_on_previous_text=condition_on_previous_text,
    fp16=fp16,
    compression_ratio_threshold=compression_ratio_threshold,
    logprob_threshold=logprob_threshold,
    no_speech_threshold=no_speech_threshold
)

temperature = args.pop("temperature")
temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
if temperature_increment_on_fallback is not None:
    temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
else:
    temperature = [temperature]

if Model.endswith(".en") and args["language"] not in {"en", "English"}:
    warnings.warn(f"{Model} is an English-only model but receipted '{args['language']}'; using English instead.")
    args["language"] = "en"

video_transcription = whisper.transcribe(
    whisper_model,
    str(video_path_local),
    temperature=temperature,
    **args,
)

# Save output
writing_lut = {
    '.txt': whisper.utils.write_txt,
    '.vtt': whisper.utils.write_vtt,
    '.srt': whisper.utils.write_txt,
}

if output_type == "All":
    for suffix, write_suffix in writing_lut.items():
        transcript_local_path =os.getcwd()+output_file_path+'/'+os.path.splitext(file_name)[0] +suffix
        with open(transcript_local_path, "w", encoding="utf-8") as f:
            write_suffix(video_transcription["segments"], file=f)
        try:
            transcript_drive_path =file_name
        except:
            print(f"**Transcript file created: {transcript_local_path}**")
else:
    transcript_local_path =output_file_path+'/'+os.path.splitext(file_name)[0] +output_type

    with open(transcript_local_path, "w", encoding="utf-8") as f:
        writing_lut[output_type](video_transcription["segments"], file=f)

Let's run the script with the default parameters:

%%bash
python openai-whisper.py

Viewing the outputs

%%bash
cat hello.srt

Containerize Script using Docker

To build your own docker container, create a Dockerfile, which contains instructions on how the image will be built, and what extra requirements will be included.

FROM  pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime

WORKDIR /

RUN apt-get -y update

RUN apt-get -y install git

RUN python3 -m pip install --upgrade pip

RUN python -m pip install regex tqdm Pillow

RUN pip install git+https://github.com/openai/whisper.git

ADD hello.mp3 hello.mp3

ADD openai-whisper.py openai-whisper.py

RUN python openai-whisper.py

We choose pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime as our base image

:::info See more information on how to containerize your script/app here :::

Build the container

We will run docker build command to build the container;

docker build -t <hub-user>/<repo-name>:<tag> .

Before running the command replace;

hub-user with your docker hub username, If you don’t have a docker hub account follow these instructions to create a Docker account, and use the username of the account you created
repo-name with the name of the container, you can name it anything you want
tag this is not required but you can use the latest tag

In our case

docker build -t jsacex/whisper

Push the container

Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.

docker push <hub-user>/<repo-name>:<tag>

In our case

docker push jsacex/whisper

Running a Bacalhau Job

We will transcribe the moon landing video, which can be found here: https://www.nasa.gov/multimedia/hd/apollo11_hdpage.html

After the dataset has been uploaded, copy the CID:

bafybeielf6z4cd2nuey5arckect5bjmelhouvn5rhbjlvpvhp7erkrc4nu

To submit a job, run the following Bacalhau command:

%%bash --out job_id
bacalhau docker run \
    --id-only \
    --gpu 1 \
    --timeout 3600 \
    --wait-timeout-secs 3600 \
    jsacex/whisper \
    -i ipfs://bafybeielf6z4cd2nuey5arckect5bjmelhouvn5rhbjlvpvhp7erkrc4nu \
    -- python openai-whisper.py -p inputs/Apollo_11_moonwalk_montage_720p.mp4 -o outputs

%env JOB_ID={job_id}

Structure of the command

Let's look closely at the command above:

-i ipfs://bafybeielf6z4cd2nuey5arckect5bjmelhouvn5r: flag to mount the CID which contains our file to the container at the path /inputs
-p inputs/Apollo_11_moonwalk_montage_720p.mp4 : the input path of our file
-o outputs: the path where to store the outputs
--gpu : here we request 1 GPU
jsacex/whisper: the name and the tag of the docker image we are using

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

%%bash
bacalhau list --id-filter ${JOB_ID} --wide

When it says Published or Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

%%bash
bacalhau describe ${JOB_ID}

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.

%%bash
rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results

Viewing your Job Output

To view the file, run the following command:

%%bash
cat results/outputs/Apollo_11_moonwalk_montage_720p.vtt

Model Inference

EasyOCR (Optical Character Recognition) on Bacalhau

Introduction

TD:LR

Prerequisite

Running Easy OCR Locally​

Containerize your Script using Docker

Build the Container

Push the container

Running a Bacalhau Job to Generate Easy OCR output

Structure of the command

Checking the State of your Jobs

Viewing your Job Output

Running Inference on Dolly 2.0 Model with Hugging Face

Introduction

Running locally

Prerequisites

Installing dependencies

Building the container (optional)

Prerequisite

Running Inference on Bacalhau

Prerequisite

Structure of the command

Speech Recognition using Whisper

TD:LR

Prerequisite

Running whisper locally

Create the script

Containerize Script using Docker

Build the container

Push the container

Running a Bacalhau Job

Structure of the command

Checking the State of your Jobs

Viewing your Job Output

Stable Diffusion on a GPU

TD;LR

Prerequisite

Quick Test

Development

Installing dependencies

Testing the Code

Write the Script

Run the Script

Display the Output Image

Containerize Script using Docker

Build the container

Push the container

Running a Bacalhau Job

Structure of the command

Checking the State of your Jobs

Viewing your Job Output

Display image

Stable Diffusion on a CPU

TL;DR

1. Development

Prerequisites

Converting Stable Diffusion to a CPU Model Using OpenVINO

Install Dependencies

Clone the Repository and Dependencies

Generating an Image

2. Running Stable Diffusion (CPU) on Bacalhau

Prerequisites

Generating an Image Using Stable Diffusion on Bacalhau

Object Detection with YOLOv5 on Bacalhau

TD;LR

Prerequisite

Running Object Detection Jobs on Bacalhau

Test Run with Sample Data

Checking the State of your Jobs

Viewing Output

Using custom Images as an input

Checking the State of your Jobs

Viewing Job Output

Generate Realistic Images using StyleGAN3 and Bacalhau

TD;LR

Prerequisite

Running StyleGAN3 locally

Containerize Script with Docker

Build the container

Running Easy OCR Locally

Running Easy OCR Locally