
Speech Recognition using Whisper

Whisper is an automatic speech recognition (ASR) system from OpenAI, trained on 680,000 hours of multilingual and multitask supervised data collected from the web. According to OpenAI, training on such a large and diverse dataset improves robustness to accents, background noise, and technical language. It also enables transcription in multiple languages, as well as translation from those languages into English. The models and inference code are open source, so they can serve as a foundation for building applications and for further research on robust speech processing. In this example, we will transcribe an audio clip locally, containerize the script, and then run the container on Bacalhau.

The advantage of using Bacalhau over managed automatic speech recognition services is that you can run your own containers, which can scale to batch process petabytes of video or audio.

TL;DR

Using OpenAI Whisper with Bacalhau to process audio files

Prerequisite

To get started, you need to install:

Bacalhau client (see the Bacalhau installation documentation)
Whisper
PyTorch
pandas

Running Whisper locally

Before we create and run the script, we need a sample audio file to test the code. For that, we download a sample audio clip.

Create the script

We will create a script that accepts parameters (input file path, output file path, temperature, and so on) and sets default values for them. The script also does the following:

If the input file is in mp4 format, it converts the file to wav format.
It loads the large model and passes it the required parameters.
It saves the transcript in several formats.

The model is not limited to English transcription: it also transcribes other languages and can translate them into English.
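The step of saving the transcript in several formats could be sketched as follows. This is only an illustration: the choice of plain text, SRT, and JSON is an assumption (the tutorial does not name the formats), and the helper names `srt_timestamp`, `segments_to_srt`, and `save_transcript` are hypothetical. The sketch relies on the result of Whisper's `transcribe` being a dictionary with a `text` string and a list of `segments`, each carrying `start`, `end`, and `text`.

```python
import json
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def save_transcript(result: dict, output_dir: str, stem: str) -> list:
    """Write text, SRT, and JSON versions of one transcription result."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    txt, srt, js = (out / f"{stem}{ext}" for ext in (".txt", ".srt", ".json"))
    txt.write_text(result["text"].strip() + "\n", encoding="utf-8")
    srt.write_text(segments_to_srt(result["segments"]), encoding="utf-8")
    js.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
    return [txt, srt, js]
```

Keeping the formatting in small pure functions means the output layout can be checked without loading a model.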

Next, let's create an openai-whisper script:
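The original script is not reproduced in this text, so what follows is a minimal sketch of what such a script might look like, not the tutorial's actual code. The `-p` and `-o` flags match the ones used in the job command later on; the `-t` and `-m` flags, all default values, and the helper names are assumptions for illustration. It assumes `ffmpeg` is on the PATH and that the `openai-whisper` package is installed.

```python
import argparse
import subprocess
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface; the defaults are placeholders, not the original ones."""
    parser = argparse.ArgumentParser(description="Transcribe audio with Whisper")
    parser.add_argument("-p", "--input-path", default="sample.mp3", help="audio or video file")
    parser.add_argument("-o", "--output-dir", default="outputs", help="directory for transcripts")
    parser.add_argument("-t", "--temperature", type=float, default=0.0, help="sampling temperature")
    parser.add_argument("-m", "--model", default="large", help="Whisper model name")
    return parser


def ffmpeg_wav_command(src: Path) -> list:
    """Build the ffmpeg command that extracts 16 kHz mono WAV audio from src."""
    dst = src.with_suffix(".wav")
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    src = Path(args.input_path)

    # Convert mp4 input to wav before transcription, as described above.
    if src.suffix.lower() == ".mp4":
        command = ffmpeg_wav_command(src)
        subprocess.run(command, check=True)
        src = Path(command[-1])

    # Imported here so that argument parsing works without the package installed.
    import whisper

    model = whisper.load_model(args.model)
    result = model.transcribe(str(src), temperature=args.temperature)

    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{src.stem}.txt").write_text(result["text"].strip() + "\n", encoding="utf-8")
```

To use the sketch as a command-line script, call `main()` at the bottom of the file under the usual `if __name__ == "__main__":` guard.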

Let's run the script with the default parameters:

Viewing the outputs

Containerize Script using Docker

To build your own Docker container, create a Dockerfile, which contains instructions on how the image will be built and what extra requirements will be included.

We choose pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime as our base image.

We then install all the dependencies and add the test audio file and our openai-whisper script to the container. We also run a test command during the build to check that the script works inside the container and that the container builds successfully.

:::info See more information on how to containerize your script/app here :::

Build the container

We will run the docker build command to build the container.

Before running the command, replace:

hub-user with your Docker Hub username. If you don't have a Docker Hub account, follow these instructions to create one, and use the username of the account you created.
repo-name with the name of the container. You can name it anything you want.
tag with a tag for the image. This is optional, but you can use the latest tag.
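How the three placeholders combine into an image reference can be sketched in Python. The helper names are hypothetical; the commands they produce are the standard `docker build -t <image> <context>` and `docker push <image>` invocations, returned as argument lists that can be passed to `subprocess.run`.

```python
def image_reference(hub_user: str, repo_name: str, tag: str = "latest") -> str:
    """Combine the placeholders into hub-user/repo-name:tag."""
    return f"{hub_user}/{repo_name}:{tag}"


def docker_build_command(image: str, context: str = ".") -> list:
    """Arguments for building the image from the Dockerfile in context."""
    return ["docker", "build", "-t", image, context]


def docker_push_command(image: str) -> list:
    """Arguments for uploading the built image to the registry."""
    return ["docker", "push", image]
```

For example, `image_reference("jsacex", "whisper")` gives `jsacex/whisper:latest`, and `subprocess.run(docker_build_command("jsacex/whisper:latest"), check=True)` would run the build from the current directory.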

In our case

Push the container

Next, upload the image to the registry. This is done using the same Docker Hub username, repo name, and tag that you used for the build.

In our case

Running a Bacalhau Job

We will transcribe the moon landing video, which can be found here: https://www.nasa.gov/multimedia/hd/apollo11_hdpage.html

Since the downloaded video is in mov format, we convert it to mp4 format and then upload it to public storage, in this case IPFS. We will be using NFT.Storage (the recommended option). To upload your dataset with NFTup, drag and drop your directory and NFTup will upload it to IPFS.

After the dataset has been uploaded, copy the CID:

bafybeielf6z4cd2nuey5arckect5bjmelhouvn5rhbjlvpvhp7erkrc4nu

To submit a job, run the following Bacalhau command:

Structure of the command

Let's look closely at the command above:

-i ipfs://bafybeielf6z4cd2nuey5arckect5bjmelhouvn5r: mounts the CID that contains our file into the container at the path /inputs
-p inputs/Apollo_11_moonwalk_montage_720p.mp4: the input path of our file
-o outputs: the path where the outputs are stored
--gpu 1: requests 1 GPU
jsacex/whisper: the name and the tag of the Docker image we are using
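The pieces listed above can be assembled into one argument list, as in the following sketch. The function name is hypothetical, and both the script file name `openai-whisper.py` and the exact ordering of the arguments are assumptions, since the original command is not reproduced here. The flags themselves are the ones described in the list, with `--` separating Bacalhau's options from the command that runs inside the container.

```python
def bacalhau_run_command(cid, image, input_path, output_dir="outputs",
                         gpus=1, script="openai-whisper.py"):
    """Build the argument list for a Bacalhau Docker job that runs the script."""
    return [
        "bacalhau", "docker", "run",
        "--gpu", str(gpus),        # number of GPUs requested
        "-i", f"ipfs://{cid}",     # input CID, mounted at /inputs
        image,                     # Docker image to run
        "--",                      # everything after this runs inside the container
        "python", script,
        "-p", input_path,          # input file path inside the container
        "-o", output_dir,          # where the script writes its outputs
    ]


command = bacalhau_run_command(
    cid="bafybeielf6z4cd2nuey5arckect5bjmelhouvn5rhbjlvpvhp7erkrc4nu",
    image="jsacex/whisper",
    input_path="inputs/Apollo_11_moonwalk_montage_720p.mp4",
)
```

The resulting list can be passed to `subprocess.run(command, check=True)`, or joined with spaces to get the equivalent shell command.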

Checking the State of your Jobs

Job status: You can check the status of the job using bacalhau list.

When it says Published or Completed, that means the job is done, and we can get the results.

Job information: You can find out more information about your job by using bacalhau describe.

Job download: You can download your job results directly by using bacalhau get. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.
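The three steps above could be scripted along these lines. This is a sketch under assumptions: the helper names are hypothetical, `job_id` stands for the ID returned when the job was submitted, and the `--id-filter` and `--output-dir` flags reflect the Bacalhau CLI as I understand it at the time of this tutorial, so they may differ in other versions.

```python
import subprocess
from pathlib import Path


def job_commands(job_id: str, output_dir: str = "results") -> dict:
    """Argument lists for checking, describing, and downloading a job."""
    return {
        "status": ["bacalhau", "list", "--id-filter", job_id],
        "describe": ["bacalhau", "describe", job_id],
        "download": ["bacalhau", "get", job_id, "--output-dir", output_dir],
    }


def download_results(job_id: str, output_dir: str = "results") -> None:
    """Create output_dir, then download the job results into it."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(job_commands(job_id, output_dir)["download"], check=True)
```

Calling `download_results(job_id)` mirrors the approach described above: it creates a `results` directory and stores the job output there.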

Viewing your Job Output

To view the file, run the following command:
