Simple Image Processing
How to process images stored in IPFS with Bacalhau
In this example tutorial, we will show you how to use Bacalhau to process images on a Landsat dataset.
Bacalhau has the unique capability of operating at a massive scale in a distributed environment. This is made possible because data is naturally sharded across the IPFS network amongst many providers. We can take advantage of this to process images in parallel.
TD;LR
Processing of images from a dataset using Bacalhau
Prerequisite
To get started, you need to install the Bacalhau client, see more information here
Running a Bacalhau Job
To submit a workload to Bacalhau, we will use the bacalhau docker run
command.
The job has been submitted and Bacalhau has printed out the related job id. We store that in an environment variable so that we can reuse it later on.
The bacalhau docker run
command allows to pass input data volume with a -i ipfs://CID:path
argument just like Docker, except the left-hand side of the argument is a content identifier (CID). This results in Bacalhau mounting a data volume inside the container. By default, Bacalhau mounts the input volume at the path /inputs
inside the container.
Bacalhau also mounts a data volume to store output data. The bacalhau docker run
command creates an output data volume mounted at /outputs
. This is a convenient location to store the results of your job.
Checking the State of your Jobs
Job status: You can check the status of the job using
bacalhau list
.
When it says Published
or Completed
, that means the job is done, and we can get the results.
Job information: You can find out more information about your job by using
bacalhau describe
.
Job download: You can download your job results directly by using
bacalhau get
. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.
After the download has finished you should see the following contents in results directory.
Viewing your Job Output
To view the file, run the following command:
Display the image
To view the images, we will use glob to return all file paths that match a specific pattern.
Need Support?
For questions, feedback, please reach out in our forum