Parallel Video Resizing via File Sharding
Expanso (2024). All Rights Reserved.
Many data engineering workloads are embarrassingly parallel: you want to run a simple operation over a large number of files. In this tutorial, we will run a simple video filter on a large number of video files.
Resizing video files with Bacalhau
To get started, install the Bacalhau client (see the Bacalhau installation documentation for details).
To submit a workload to Bacalhau, we will use the `bacalhau docker run` command.
Once the job is submitted, Bacalhau prints its job ID. We store the ID in an environment variable so that we can reuse it later.
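If you drive Bacalhau from Python instead of a shell, capturing the job ID works the same way. The sketch below assumes the `bacalhau` CLI is on `PATH` and uses the `--wait` and `--id-only` flags so that standard output contains only the job ID; the helper names are hypothetical, and the `run` parameter exists only so the function can be exercised without a live cluster.

```python
import os
import subprocess
from typing import Callable, Sequence


def build_submit_command(job_args: Sequence[str]) -> list[str]:
    """Prefix the job arguments with the bacalhau CLI call."""
    # --wait: block until the job finishes; --id-only: print only the job ID.
    return ["bacalhau", "docker", "run", "--wait", "--id-only", *job_args]


def submit_and_store_job_id(
    job_args: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Submit the job, then save its ID in JOB_ID for later commands."""
    result = run(build_submit_command(job_args), capture_output=True, text=True, check=True)
    job_id = result.stdout.strip()
    # Only this process and its children see this variable, not the parent shell.
    os.environ["JOB_ID"] = job_id
    return job_id
```

Setting `os.environ` here mirrors `export JOB_ID=...` in a shell, but it is scoped to the Python process, so later `bacalhau` calls should be launched from the same process.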
The `bacalhau docker run` command lets you pass an input data volume with a `-i ipfs://CID:path` argument, similar to Docker's `-v` volume flag, except that the left-hand side of the argument is a content identifier (CID) rather than a host path. Bacalhau then mounts that data as a volume inside the container. By default, Bacalhau mounts the input volume at `/inputs` inside the container.
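To make the shape of the `-i` argument concrete, here is a small parser that splits `ipfs://CID:path` into its two halves. It is an illustrative sketch, not Bacalhau's own parser; in particular, it assumes the `/inputs` default applies when the `:path` part is omitted, and `QmExampleCid` in the usage below is a placeholder, not a real dataset.

```python
DEFAULT_MOUNT = "/inputs"


def parse_ipfs_input(spec: str) -> tuple[str, str]:
    """Split an `ipfs://CID:path` input spec into (CID, mount path)."""
    prefix = "ipfs://"
    if not spec.startswith(prefix):
        raise ValueError(f"expected an ipfs:// input, got {spec!r}")
    # The CID itself contains no ':', so the first ':' separates it from the mount path.
    cid, _, path = spec[len(prefix):].partition(":")
    if not cid:
        raise ValueError(f"missing CID in {spec!r}")
    # Assumption: an omitted path falls back to the default mount point.
    return cid, path or DEFAULT_MOUNT
```

For example, `parse_ipfs_input("ipfs://QmExampleCid:/data")` yields `("QmExampleCid", "/data")`, while `parse_ipfs_input("ipfs://QmExampleCid")` yields `("QmExampleCid", "/inputs")`.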
We will create 72px-wide thumbnail versions of all the videos in the `/inputs` directory. The `/outputs` directory will contain the resized copy of each video. We will shard by one video per job and use the `linuxserver/ffmpeg` container to resize the videos.
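Sharding by one video per job means splitting the list of input files into batches of size 1, with each batch handled independently. The sketch below shows only that partitioning step in plain Python; it is a conceptual illustration, not how Bacalhau assigns shards internally, and `shard` is a hypothetical helper name.

```python
from typing import Iterator, Sequence


def shard(files: Sequence[str], batch_size: int = 1) -> Iterator[list[str]]:
    """Yield consecutive batches of at most batch_size files."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # Sort first so the same input always produces the same shards.
    ordered = sorted(files)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

With the default `batch_size=1`, three videos become three single-file shards, so each can be resized in parallel without coordinating with the others.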
:::tip
Bacalhau overwrites the container's default entrypoint, so we must pass the full command after the `--` argument. The command lists all of the `.mp4` files in the `/inputs` directory and runs `ffmpeg` against each one.
:::
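The exact command from the original tutorial is not reproduced here, so the following is our own reconstruction of what the tip describes: a `bash -c` loop over `/inputs/*.mp4` that writes `scaled_<name>` files to `/outputs`, matching the output file names shown at the end of this page. It assumes `bash` is available in `linuxserver/ffmpeg` and interprets "72px wide" as the `ffmpeg` filter `scale=72:-2` (width 72, even height that keeps the aspect ratio). The function names are hypothetical.

```python
import shlex


def build_container_command(width: int = 72) -> list[str]:
    """Command run inside linuxserver/ffmpeg, passed after `--` since the entrypoint is replaced."""
    script = (
        "for f in /inputs/*.mp4; do "
        '[ -e "$f" ] || continue; '  # skip the literal pattern when nothing matches
        'name=$(basename "$f"); '
        # scale=W:-2 sets the width to W and picks an even height that keeps the aspect ratio
        f'ffmpeg -y -i "$f" -vf "scale={width}:-2" "/outputs/scaled_$name"; '
        "done"
    )
    return ["bash", "-c", script]


def build_job_args(input_spec: str, width: int = 72) -> list[str]:
    """Arguments that would follow `bacalhau docker run` for this job."""
    return ["-i", input_spec, "linuxserver/ffmpeg", "--", *build_container_command(width)]


if __name__ == "__main__":
    # "ipfs://<CID>" is a placeholder; substitute the CID of your video dataset.
    print(shlex.join(["bacalhau", "docker", "run", *build_job_args("ipfs://<CID>")]))
```

Everything after `--` is handed to the container as its command, which is why the loop is wrapped in `bash -c` rather than relying on the image's own `ffmpeg` entrypoint.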
Job status: You can check the status of the job using `bacalhau list`.
When it says `Published` or `Completed`, the job is done and we can get the results.
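If you want to script the wait instead of re-running `bacalhau list` by hand, a polling loop that stops on `Published` or `Completed` looks like the sketch below. The `get_state` callable is a hypothetical hook: how you extract the state (for example, by parsing `bacalhau list` output) depends on your CLI version and is deliberately left out.

```python
import time
from typing import Callable

DONE_STATES = {"Published", "Completed"}


def wait_until_done(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_state() until it reports a finished state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in DONE_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout}s")
        time.sleep(poll_seconds)
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while the job runs.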
Job information: You can find out more information about your job by using `bacalhau describe`.
Job download: You can download your job results directly by using `bacalhau get`. Alternatively, you can create a directory to store your results. Here, we create a `results` directory and download the job output into it.
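A minimal sketch of the "create a directory, then download into it" step is shown below. It assumes `bacalhau get` accepts an `--output-dir` flag for the target directory; check `bacalhau get --help` for your version before relying on it. `prepare_download` is a hypothetical helper.

```python
from pathlib import Path


def prepare_download(job_id: str, output_dir: str = "results") -> list[str]:
    """Create the download directory and return the `bacalhau get` command for it."""
    # exist_ok=True makes re-running the download step safe.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # Assumption: --output-dir is the flag your Bacalhau version uses for the target directory.
    return ["bacalhau", "get", job_id, "--output-dir", output_dir]


# Usage (requires the bacalhau CLI on PATH and JOB_ID set earlier):
#   import os, subprocess
#   subprocess.run(prepare_download(os.environ["JOB_ID"]), check=True)
```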
After the download has finished, you should see the following contents in the `results` directory.
To view the file, run the following command:
To view the videos, we will use glob to return all file paths that match a specific pattern.
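A sketch of the `glob` step follows. It assumes the resized videos land under `results/outputs/` with a `.mp4` extension, which matches the `scaled_*.mp4` files embedded below but is not stated explicitly on this page; `find_videos` is a hypothetical helper.

```python
import glob
import os


def find_videos(results_dir: str = "results", pattern: str = "outputs/*.mp4") -> list[str]:
    """Return the sorted paths of files under results_dir that match pattern."""
    # glob.glob expands shell-style wildcards; sorting makes the order stable.
    return sorted(glob.glob(os.path.join(results_dir, pattern)))
```

In a Jupyter notebook, each returned path could then be passed to `IPython.display.Video` to play it inline.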
<video src={require('./scaled_Bird_flying_over_the_lake.mp4').default} controls>
  Your browser does not support the video element.
</video>
<video src={require('./scaled_Calm_waves_on_a_rocky_sea_gulf.mp4').default} controls>
  Your browser does not support the video element.
</video>
<video src={require('./scaled_Prominent_Late_Gothic_styled_architecture.mp4').default} controls>
  Your browser does not support the video element.
</video>
For questions and feedback, please reach out in our forum.