Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing. In this example, we will transcribe an audio clip locally, containerize the script and then run the container on Bacalhau.
The advantage of using Bacalhau over managed Automatic Speech Recognition services is that you can run your own containers which can scale to do batch process petabytes of videos or audio for automatic speech recognition
Using OpenAI whisper with Bacalhau to process audio files
To get started, you need to install:
Bacalhau client, see more information here
Whisper,
pytorch
pandas
Before we create and run the script we need a sample audio file to test the code for that we download a sample audio clip.
We will create a script that accepts parameters (input file path, output file path, temperature, etc.) and set the default parameters. Also:
If the input file is in mp4 format, then the script converts it to wav format.
Save the transcript in various formats,
We load the large model
Then pass it the required parameters. This model is not only limited to English and transcription, it supports other languages and also does translation, into the following languages:
Next, let's create a openai-whisper script:
Let's run the script with the default parameters:
Viewing the outputs
To build your own docker container, create a Dockerfile
, which contains instructions on how the image will be built, and what extra requirements will be included.
We choose pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
as our base image
And then install all the dependencies, after that we will add the test audio file and our openai-whisper script to the container, we will also run a test command to check whether our script works inside the container and if the container builds successfully
:::info See more information on how to containerize your script/app here :::
We will run docker build
command to build the container;
Before running the command replace;
hub-user with your docker hub username, If you don’t have a docker hub account follow these instructions to create a Docker account, and use the username of the account you created
repo-name with the name of the container, you can name it anything you want
tag this is not required but you can use the latest tag
In our case
Next, upload the image to the registry. This can be done by using the Docker hub username, repo name or tag.
In our case
We will transcribe the moon landing video, which can be found here: https://www.nasa.gov/multimedia/hd/apollo11_hdpage.html
Since the downloaded video is in mov format we convert the video to mp4 format and then upload it to our public storage in this case IPFS. We will be using NFT.Storage (Recommended Option). To upload your dataset using NFTup just drag and drop your directory it will upload it to IPFS
After the dataset has been uploaded, copy the CID:
bafybeielf6z4cd2nuey5arckect5bjmelhouvn5rhbjlvpvhp7erkrc4nu
To submit a job, run the following Bacalhau command:
Let's look closely at the command above:
-i ipfs://bafybeielf6z4cd2nuey5arckect5bjmelhouvn5r
: flag to mount the CID which contains our file to the container at the path /inputs
-p inputs/Apollo_11_moonwalk_montage_720p.mp4
: the input path of our file
-o outputs
: the path where to store the outputs
--gpu
: here we request 1 GPU
jsacex/whisper
: the name and the tag of the docker image we are using
Job status: You can check the status of the job using bacalhau list
.
When it says Published
or Completed
, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using bacalhau describe
.
Job download: You can download your job results directly by using bacalhau get
. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.
To view the file, run the following command: