
Stable Diffusion on a GPU


Stable Diffusion is a state-of-the-art text-to-image model, developed as an open-source alternative to DALL·E 2. It is based on a diffusion probabilistic model and uses a Transformer to generate images from text.

This example demonstrates how to use Stable Diffusion on a GPU and run it on the Bacalhau network. The first section describes the development of the code and the container. The second section demonstrates how to run the job using Bacalhau.

The following image is an example generated by this model.


bacalhau docker run --gpu 1 -- python --o ./outputs --p "cod swimming through data"

1. Development

This Stable Diffusion example is based on the Keras/TensorFlow implementation of the model available here. You might also be interested in the PyTorch-oriented diffusers library.


In order to run this example you need:

  • A Debian-flavoured Linux (although you might be able to get it working on M1 Macs)
  • Docker
  • A GPU -- this was developed against a Tesla T4
pip install git+ --upgrade --quiet
pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
pip install tqdm
apt install --allow-change-held-packages libcudnn8=
Looking in indexes:,
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (4.64.1)
Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
Use 'apt autoremove' to remove it.
The following packages will be REMOVED:
The following held packages will be changed:
The following packages will be DOWNGRADED:
0 upgraded, 0 newly installed, 1 downgraded, 1 to remove and 20 not upgraded.
Need to get 430 MB of archives.
After this operation, 1,392 MB disk space will be freed.
Get:1 libcudnn8 [430 MB]
Fetched 430 MB in 6s (66.2 MB/s)
(Reading database ... 123941 files and directories currently installed.)
Removing libcudnn8-dev ( ...
update-alternatives: removing manually selected alternative - switching libcudnn to auto mode
dpkg: warning: downgrading libcudnn8 from to
(Reading database ... 123918 files and directories currently installed.)
Preparing to unpack .../libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb ...
Unpacking libcudnn8 ( over ( ...
Setting up libcudnn8 ( ...

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Testing the Code

Quite often libraries aren't pinned, or code gets updated and breaks the docs, so even the simplest examples don't work. To de-risk this, I will first try the code in the README to double-check that it works as the author expected.


When you run this code for the first time, it will download the pretrained weights, which may add a short delay.

from stable_diffusion_tf.stable_diffusion import Text2Image
from PIL import Image

generator = Text2Image(
    img_height=512,
    img_width=512,
    jit_compile=False,  # You can try True as well (different performance profile)
)
img = generator.generate(
    "DSLR photograph of an astronaut riding a horse",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
pil_img = Image.fromarray(img[0])
Downloading data from
1356917/1356917 [==============================] - 0s 0us/step
Downloading data from
492456896/492456896 [==============================] - 3s 0us/step
Downloading data from
3439035312/3439035312 [==============================] - 36s 0us/step
Downloading data from
198152112/198152112 [==============================] - 1s 0us/step

0 1: 100%|██████████| 50/50 [01:10<00:00, 1.42s/it]


That's great, it works! But it's used up all the GPU RAM.


If you're interested, check the GPU RAM usage now that you've run the code. You'll see that it has consumed many GBs, and depending on which GPU you're running, it may OOM if you run it again.

You can try to reduce RAM usage by playing with batch sizes (although it is only set to 1 above!) or by more carefully controlling the TensorFlow session.

For now, let's ignore this and clear the GPU memory with numba so it works again next time. This won't be required when running in a single-shot manner.

pip install numba
Looking in indexes:,
Requirement already satisfied: numba in /usr/local/lib/python3.7/dist-packages (0.56.3)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from numba) (4.13.0)
Requirement already satisfied: numpy<1.24,>=1.18 in /usr/local/lib/python3.7/dist-packages (from numba) (1.21.6)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba) (57.4.0)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.7/dist-packages (from numba) (0.39.1)
Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->numba) (4.1.1)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->numba) (3.9.0)
# Clear the GPU memory so the model can be loaded again
from numba import cuda
device = cuda.get_current_device()
device.reset()

Prepare a Script

We need a script to execute when we submit the job. The code below is functionally the same as above, but it also includes argument parsing so that the generator can be customized. This code is a slightly modified version of the script in the original repository.

import argparse
import os
from stable_diffusion_tf.stable_diffusion import Text2Image
from PIL import Image

parser = argparse.ArgumentParser(description="Stable Diffusion")
parser.add_argument("--h", dest="height", type=int, help="Height of the image", default=512)
parser.add_argument("--w", dest="width", type=int, help="Width of the image", default=512)
parser.add_argument("--p", dest="prompt", type=str, help="Description of the image you want to generate", default="cat")
parser.add_argument("--n", dest="numSteps", type=int, help="Number of diffusion steps", default=50)
parser.add_argument("--u", dest="unconditionalGuidanceScale", type=float, help="Unconditional guidance scale", default=7.5)
parser.add_argument("--t", dest="temperature", type=int, help="Sampling temperature", default=1)
parser.add_argument("--b", dest="batchSize", type=int, help="Number of images to generate", default=1)
parser.add_argument("--o", dest="output", type=str, help="Output folder where the images are stored", default="./")
args = parser.parse_args()

generator = Text2Image(
    img_height=args.height,
    img_width=args.width,
    jit_compile=False,  # You can try True as well (different performance profile)
)
img = generator.generate(
    args.prompt,
    num_steps=args.numSteps,
    unconditional_guidance_scale=args.unconditionalGuidanceScale,
    temperature=args.temperature,
    batch_size=args.batchSize,
)

# Save one PNG per batch element into the output folder
os.makedirs(args.output, exist_ok=True)
for i in range(args.batchSize):
    pil_img = Image.fromarray(img[i])"{args.output}/image{i}.png")
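You can sanity-check the flag handling without launching the model by exercising a parser like the one above on a synthetic argv. This is a minimal sketch that mirrors a few of the script's flags (trimmed for brevity):

```python
import argparse

# Mirror of the script's CLI (same flag names and dests, trimmed to four flags)
parser = argparse.ArgumentParser(description="Stable Diffusion")
parser.add_argument("--p", dest="prompt", type=str, default="cat")
parser.add_argument("--n", dest="numSteps", type=int, default=50)
parser.add_argument("--b", dest="batchSize", type=int, default=1)
parser.add_argument("--o", dest="output", type=str, default="./")

# Parse a synthetic command line instead of sys.argv
args = parser.parse_args(["--p", "cat with three eyes", "--n", "100"])
print(args.prompt, args.numSteps, args.batchSize)  # cat with three eyes 100 1
```

Flags not supplied fall back to their defaults, which is why `batchSize` is still 1 here.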


You should test that your script works! Let's run it again.

Process is interrupted.

Viewing the output image

import IPython.display as display

# The script saves images as image{i}.png in the output folder (default "./")
display.Image("image0.png")


For reference, here is a full list of arguments that you can pass to the script.

optional arguments:
  -h, --help            show this help message and exit
  --h HEIGHT            Height of the image
  --w WIDTH             Width of the image
  --p PROMPT            Description of the image you want to generate
  --n NUMSTEPS          Number of diffusion steps
  --u UNCONDITIONALGUIDANCESCALE
                        Unconditional guidance scale
  --t TEMPERATURE       Sampling temperature
  --b BATCHSIZE         Number of images to generate
  --o OUTPUT            Output folder where the images are stored

Further Examples

The following presents some examples that you can try.


python --p "cat with three eyes"

Number of iterations

python --p "cat with three eyes" --n 100

Batch size (number of images to generate)

python --p "cat with three eyes" --b 2
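With a batch size of 2, the save loop writes one PNG per batch element. A quick sketch of the naming scheme, using the script's `--o ./outputs` convention:

```python
# Sketch of the filenames the script's save loop produces for a given batch size
batch_size = 2
output = "./outputs"
paths = [f"{output}/image{i}.png" for i in range(batch_size)]
print(paths)  # ['./outputs/image0.png', './outputs/image1.png']
```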

2. Running Stable Diffusion on Bacalhau with a GPU

Now that we have a working example, we can convert it into a format that allows us to perform inference in a distributed environment.

First we will create a Dockerfile to containerize the inference code.

FROM tensorflow/tensorflow:2.10.0-gpu

RUN apt-get -y update

RUN apt-get -y install --allow-change-held-packages libcudnn8= git

RUN python3 -m pip install --upgrade pip

RUN python -m pip install regex tqdm Pillow tensorflow tensorflow_addons ftfy --upgrade --quiet

RUN pip install git+ --upgrade --quiet


# Run once so it downloads and caches the pre-trained weights
RUN python --n 1

The Dockerfile leverages the latest official TensorFlow GPU image and then installs other dependencies like git, CUDA packages, and other image-related necessities. See the original repository for the expected requirements.


Note the last line, which runs the script once to download and cache the pretrained weights inside the image. This is necessary for Bacalhau, because Bacalhau jobs do not have access to the internet.

Build the container in the usual way. Replace the org/repo with your own if you are pushing to a custom registry.

docker buildx build --platform linux/amd64 --push -t .
!command -v bacalhau >/dev/null 2>&1 || (export BACALHAU_INSTALL_DIR=.; curl -sL | bash)
path=!echo $PATH
%env PATH=./:{path[0]}
Your system is linux_amd64
No BACALHAU detected. Installing fresh BACALHAU CLI...
Getting the latest BACALHAU CLI...
Installing v0.3.3 BACALHAU CLI...
Downloading ...
Downloading sig file ...
Verified OK
Extracting tarball ...
NOT verifying Bin
bacalhau installed into . successfully.
Client Version: v0.3.3
Server Version: v0.3.3
env: PATH=./:/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin

Generating an Image Using Stable Diffusion on a GPU with Bacalhau

Bacalhau is a distributed computing platform that allows you to run jobs on a network of computers. It is designed to be easy to use and to run on a variety of hardware. In this example, we will use it to run the stable diffusion model on a GPU.

To submit a job, you can use the Bacalhau CLI. The following command passes a prompt to the model and generates an image in the outputs directory.

The main difference in the example below compared to all the other examples is the addition of the --gpu X flag, which tells Bacalhau to only schedule the job on nodes that have X GPUs free. You can read more about GPU support in the documentation.


This will take about 5 minutes to complete and is mainly due to the cold-start GPU setup time. This is faster than the CPU version, but you might still want to grab some fruit or plan your lunchtime run.

Furthermore, the container itself is about 10GB, so it might take a while to download on the node if it isn't cached.

bacalhau docker run --id-only --gpu 1 -- python --o ./outputs --p "meme about tensorflow"
%env JOB_ID={job_id}
env: JOB_ID=f126c9a5-0fd6-41c5-88e2-2d66a64a1317

Running the command will output a UUID that represents the job that was created. You can check the status of the job with the following command:

bacalhau list --id-filter ${JOB_ID}
 CREATED   ID        JOB                      STATE      VERIFIED  PUBLISHED               
 10:36:05  f126c9a5  Docker  Completed   /ipfs/QmatWywziRqxTh... 

Where it says Completed, the job is done and we can get the results.

To find out more information about your job, run the following command:

bacalhau describe ${JOB_ID}
APIVersion: V1alpha1
ClientID: 2e9bed59a71d1334f6576e314fa1e3e0fdb6a309396f33381fc5bf9ae1bcbf51
CreatedAt: "2022-10-19T10:36:05.655553494Z"
Concurrency: 1
ShardsTotal: 1
ID: f126c9a5-0fd6-41c5-88e2-2d66a64a1317
NodeId: QmRjLYuFU1wAhWh3u94cm7DgbLRBTkUhCTAx77VyXBDgr4
CID: QmatWywziRqxThuovctYRcPXPXJpcAWmJB56WyNwtRorWq
Name: job-f126c9a5-0fd6-41c5-88e2-2d66a64a1317-shard-0-host-QmRjLYuFU1wAhWh3u94cm7DgbLRBTkUhCTAx77VyXBDgr4
StorageSource: IPFS
exitCode: 0
runnerError: ""
stderr: "0:13, 1.07s/it]\r 12 241: 74%|███████▍ | 37/50 [01:07<00:13,
\ 1.07s/it]\r 12 241: 76%|███████▌ | 38/50 [01:08<00:12, 1.07s/it]\r
11 221: 76%|███████▌ | 38/50 [01:08<00:12, 1.07s/it]\r 11 221: 78%|███████▊
\ | 39/50 [01:09<00:11, 1.07s/it]\r 10 201: 78%|███████▊ | 39/50
[01:09<00:11, 1.07s/it]\r 10 201: 80%|████████ | 40/50 [01:10<00:10,
\ 1.07s/it]\r 9 181: 80%|████████ | 40/50 [01:10<00:10, 1.07s/it]\r
\ 9 181: 82%|████████▏ | 41/50 [01:11<00:09, 1.07s/it]\r 8 161: 82%|████████▏
| 41/50 [01:11<00:09, 1.07s/it]\r 8 161: 84%|████████▍ | 42/50 [01:12<00:08,
\ 1.07s/it]\r 7 141: 84%|████████▍ | 42/50 [01:12<00:08, 1.07s/it]\r
\ 7 141: 86%|████████▌ | 43/50 [01:13<00:07, 1.07s/it]\r 6 121: 86%|████████▌
| 43/50 [01:13<00:07, 1.07s/it]\r 6 121: 88%|████████▊ | 44/50 [01:14<00:06,
\ 1.07s/it]\r 5 101: 88%|████████▊ | 44/50 [01:14<00:06, 1.07s/it]\r
\ 5 101: 90%|█████████ | 45/50 [01:15<00:05, 1.07s/it]\r 4 81: 90%|█████████
| 45/50 [01:15<00:05, 1.07s/it]\r 4 81: 92%|█████████▏| 46/50 [01:16<00:04,
\ 1.08s/it]\r 3 61: 92%|█████████▏| 46/50 [01:16<00:04, 1.08s/it]\r
\ 3 61: 94%|█████████▍| 47/50 [01:17<00:03, 1.08s/it]\r 2 41: 94%|█████████▍|
47/50 [01:17<00:03, 1.08s/it]\r 2 41: 96%|█████████▌| 48/50 [01:19<00:02,
\ 1.08s/it]\r 1 21: 96%|█████████▌| 48/50 [01:19<00:02, 1.08s/it]\r
\ 1 21: 98%|█████████▊| 49/50 [01:20<00:01, 1.08s/it]\r 0 1: 98%|█████████▊|
49/50 [01:20<00:01, 1.08s/it]\r 0 1: 100%|██████████| 50/50 [01:21<00:00,
\ 1.08s/it]\r 0 1: 100%|██████████| 50/50 [01:21<00:00, 1.62s/it]"
stderrtruncated: true
stdout: ""
stdouttruncated: false
State: Completed
Status: 'Got results proposal of length: 0'
Complete: true
Result: true
NodeId: QmdMDhqqpkw2cAY1dk45cwL8PsKDexYKewN7thrF2TZeUe
PublishedResults: {}
State: Cancelled
VerificationResult: {}
RequesterNodeID: QmXaXu9N5GNetatsvwnTfQqNtSeKAD6uCmarbh3LMRYAcF
RequesterPublicKey: CAASpgIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCehDIWl72XKJi1tsrYM9JjAWt3n6hNzrCA+IVRXixK1sJVTLMpsxEP8UKJI+koAWkAUuY8yi6DMzot0owK4VpM3PYp34HdKi2hTjzM8pjCVb70XVXt6k9bzj4KmbiQTuEkQfvwIRmgxb2jrkRdTpZmhMb1Q7StR/nrGa/bx75Vpupx1EYH6+LixYnnV5WbCUK/kjpBW8SF5v+f9ZO61KHd9DMpdhJnzocTGq17tAjHh3birke0xlP98JjxlMkzzvIAuFsnH0zBIgjmHDA1Yi5DcOPWgE0jUfGlSDC1t2xITVoofHQcXDjkHZE6OhxswNYPd7cnTf9OppLddFdQnga5AgMBAAE=
- python
- --o
- ./outputs
- --p
- meme about tensorflow
Engine: Docker
JobContext: {}
Publisher: Estuary
GPU: "1"
BatchSize: 1
GlobPatternBasePath: /inputs
Verifier: Noop
- Name: outputs
StorageSource: IPFS
path: /outputs

If you see that the job has completed and there are no errors, then you can download the results with the following command:

rm -rf results && mkdir -p results
bacalhau get $JOB_ID --output-dir results
Fetching results of job 'f126c9a5-0fd6-41c5-88e2-2d66a64a1317'...

2022/10/19 10:40:30 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See for details.

After the download has finished, you should see the following contents in the results directory.

ls results/volumes/outputs
import IPython.display as display

# Display the first generated image from the downloaded results
display.Image("results/volumes/outputs/image0.png")