Stable Diffusion has revolutionized text-to-image models by producing high-quality images from a text prompt. Dreambooth is an approach for personalizing text-to-image diffusion models: given a few images of a subject, we can fine-tune a pretrained text-to-image model to generate new images of that subject.
Although the Dreambooth paper fine-tuned Imagen, both the Imagen model and the original Dreambooth code are closed source, so several open-source projects using Stable Diffusion have emerged.
Dreambooth makes Stable Diffusion even more powerful, with the ability to generate realistic-looking pictures of humans, animals, or any other subject after training on just 20-30 images.
In this tutorial, we will fine-tune a pretrained Stable Diffusion model using images of a human subject, then generate images of him drinking coffee.
Building this container requires a supported GPU with 16GB+ of memory, since the process is resource intensive.
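To confirm your GPU meets this requirement, you can check its total memory before starting (assuming NVIDIA drivers are installed):

```bash
# Print the GPU name and total memory to verify 16GB+ is available
nvidia-smi --query-gpu=name,memory.total --format=csv
```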
We will create a Dockerfile and add the desired configuration to it. The following commands specify how the image will be built and what extra requirements will be included:
This container uses the pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel image as its base, and the working directory is set to the root. Next, we add our custom code and pull the dependent repositories.
A shell script, finetune.sh, keeps things simple: the training command needs many parameters, and the model weights must later be converted to a checkpoint. You can edit this script to add your own parameters.
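As a rough illustration only, such a wrapper might look like the sketch below. The real finetune.sh is copied into the image by the Dockerfile that follows and its exact contents may differ; the parameter order here simply mirrors how the script is invoked later in this guide.

```bash
#!/bin/bash
# Hypothetical sketch of a finetune.sh wrapper script.
INSTANCE_DIR=$1    # path to the subject images, e.g. /inputs
OUTPUT_DIR=$2      # path to save the generated outputs, e.g. /outputs
INSTANCE_PROMPT=$3 # e.g. "a photo of jfk man"
CLASS_PROMPT=$4    # e.g. "a photo of man"
MAX_STEPS=$5       # number of training iterations
CLASS_DIR=$6       # regularization images, e.g. /man
MODEL_DIR=$7       # pretrained model directory, e.g. /model

# Train Dreambooth with the same flags used in the Dockerfile below
accelerate launch train_dreambooth.py \
  --image_captions_filename \
  --train_text_encoder \
  --pretrained_model_name_or_path="$MODEL_DIR" \
  --instance_data_dir="$INSTANCE_DIR" \
  --class_data_dir="$CLASS_DIR" \
  --output_dir="$OUTPUT_DIR" \
  --instance_prompt="$INSTANCE_PROMPT" \
  --class_prompt="$CLASS_PROMPT" \
  --resolution=512 \
  --mixed_precision="fp16" \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --use_8bit_adam \
  --learning_rate=2e-6 \
  --lr_scheduler="polynomial" \
  --lr_warmup_steps=0 \
  --center_crop \
  --max_train_steps="$MAX_STEPS"

# Convert the diffusers output into a single Stable Diffusion checkpoint
python convert_diffusers_to_original_stable_diffusion.py \
  --model_path "$OUTPUT_DIR" \
  --checkpoint_path "$OUTPUT_DIR/model.ckpt" \
  --half
```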
To download the models and run a test job during the image build, add the following to the Dockerfile:
```docker
FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel

WORKDIR /

# Install requirements
# RUN git clone https://github.com/TheLastBen/diffusers
RUN apt update && apt install wget git unzip -y
RUN wget -q https://gist.githubusercontent.com/js-ts/28684a7e6217214ec944a9224584e9af/raw/d7492bc8f36700b75d51e3346259d1a466b99a40/train_dreambooth.py
RUN wget -q https://github.com/TheLastBen/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py
# RUN cp /content/convert_diffusers_to_original_stable_diffusion.py /content/diffusers/scripts/convert_diffusers_to_original_stable_diffusion.py
RUN pip install -qq git+https://github.com/TheLastBen/diffusers
RUN pip install -q accelerate==0.12.0 transformers ftfy bitsandbytes gradio natsort

# Install xformers
RUN pip install -q https://github.com/metrolobo/xformers_wheels/releases/download/1d31a3ac_various_6/xformers-0.0.14.dev0-cp37-cp37m-linux_x86_64.whl

# You need to accept the model license before downloading or using the Stable Diffusion weights.
# Please visit the model card (https://huggingface.co/runwayml/stable-diffusion-v1-5), read the
# license, and tick the checkbox if you agree. You have to be a registered user on the Hugging Face
# Hub, and you'll also need an access token for the code to work:
# https://huggingface.co/settings/tokens
RUN mkdir -p ~/.huggingface
RUN echo -n "<your-hugging-face-token>" > ~/.huggingface/token

# Copy the test dataset from a local file
# COPY jfk /jfk

# Download and extract the test dataset
RUN wget https://github.com/js-ts/test-images/raw/main/jfk.zip
RUN unzip -j jfk.zip -d jfk
RUN mkdir model

# Download and extract the regularization images
RUN wget 'https://github.com/TheLastBen/fast-stable-diffusion/raw/main/Dreambooth/Regularization/Women' -O woman.zip
RUN wget 'https://github.com/TheLastBen/fast-stable-diffusion/raw/main/Dreambooth/Regularization/Men' -O man.zip
RUN wget 'https://github.com/TheLastBen/fast-stable-diffusion/raw/main/Dreambooth/Regularization/Mix' -O mix.zip
RUN unzip -j woman.zip -d woman
RUN unzip -j man.zip -d man
RUN unzip -j mix.zip -d mix

# Run a short test training job at build time
RUN accelerate launch train_dreambooth.py \
    --image_captions_filename \
    --train_text_encoder \
    --save_starting_step=5 \
    --stop_text_encoder_training=31 \
    --class_data_dir=/man \
    --save_n_steps=5 \
    --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
    --instance_data_dir="/jfk" \
    --output_dir="/model" \
    --instance_prompt="a photo of jfk man" \
    --class_prompt="a photo of man" \
    --seed=96576 \
    --resolution=512 \
    --mixed_precision="fp16" \
    --train_batch_size=1 \
    --gradient_accumulation_steps=1 \
    --use_8bit_adam \
    --learning_rate=2e-6 \
    --lr_scheduler="polynomial" \
    --center_crop \
    --lr_warmup_steps=0 \
    --max_train_steps=30

COPY finetune.sh /finetune.sh
RUN wget -q https://gist.githubusercontent.com/js-ts/624fecc3fff807d4948688cb28993a94/raw/fd69ac084debe26a815485c1f363b8a45566f1ba/clear_mem.py

# Remove your token
RUN rm -rf ~/.huggingface/token
```
Next, we run the docker build command to build the container:
```bash
docker build -t <hub-user>/<repo-name>:<tag> .
```
Before running the command, replace:
hub-user with your Docker Hub username. If you don't have a Docker Hub account, follow these instructions to create one, and use the username of the account you create.
repo-name with the name of the container; you can name it anything you want.
tag this is optional; you can use the latest tag.
Now you can push this repository to the registry designated by its name or tag:
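For example:

```bash
docker push <hub-user>/<repo-name>:<tag>
```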
The optimal dataset size is between 20 and 30 images. Choose images of the subject in different poses: full-body images, half-body shots, pictures of the face, etc.
Only the subject should appear in each image, so crop the images to fit the subject. Make sure the images are 512x512 pixels and are named in the following pattern:
Subject Name.jpg, Subject Name (2).jpg ... Subject Name (n).jpg
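If your source photos are not already square, one way to resize and center-crop them to 512x512 is with ImageMagick. This is a sketch, assuming the images are JPEGs in a hypothetical ./subject directory:

```bash
# Resize each image to fill 512x512, then center-crop to exactly 512x512
mogrify -resize 512x512^ -gravity center -extent 512x512 ./subject/*.jpg
```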
Approaches to Run a Bacalhau Job on a Fine-Tuned Model
Since there are many parameter combinations you can try, fine-tuning a model can take over an hour to complete. Here are a few approaches you can try based on your requirements:
The --gpu 1 flag specifies the hardware requirements: a GPU is needed to run such a job.
-i ipfs://bafybeidqbuphwkqwgrobv2vakwsh3l6b4q2mx7xspgh4l7lhulhc3dfa7a mounts the data from IPFS via its CID.
jsacex/dreambooth:full is the name and tag of the Docker image we are using.
-- bash finetune.sh /inputs /outputs "a photo of David Aronchick man" "a photo of man" 3000 "/man" executes the script with the following parameters:
/inputs: the path to the subject images
/outputs: the path where the generated outputs are saved
"a photo of <name of the subject> <class>" -> "a photo of David Aronchick man": the subject name along with its class
"a photo of <class>" -> "a photo of man": the name of the class
```bash
bacalhau docker run \
  --gpu 1 \
  --timeout 3600 \
  --wait-timeout-secs 3600 \
  -i <CID-OF-THE-SUBJECT> \
  jsacex/dreambooth:full \
  -- bash finetune.sh /inputs /outputs "a photo of <name-of-the-subject> man" "a photo of man" 3000 "/man" "/model"
```
The number of iterations here is 3000. This number should be the number of subject images x 100, so with 30 images it would be 3000. Training takes around 32 minutes on a V100 for 3000 iterations, but you can increase or decrease the number based on your requirements.
Here is our command with our parameters replaced:
```bash
export JOB_ID=$(bacalhau docker run \
  --gpu 1 \
  --timeout 3600 \
  --wait-timeout-secs 3600 \
  -i ipfs://bafybeidqbuphwkqwgrobv2vakwsh3l6b4q2mx7xspgh4l7lhulhc3dfa7a \
  --wait \
  --id-only \
  jsacex/dreambooth:full \
  -- bash finetune.sh /inputs /outputs "a photo of David Aronchick man" "a photo of man" 3000 "/man" "/model")
```
If your subject fits the above class but has a different name, you just need to replace the input CID and the subject name.
Here you can provide your own regularization images or use the mix class.
Use the /mix class images if the class of the subject is mix.
```bash
bacalhau docker run \
  --gpu 1 \
  --timeout 3600 \
  --wait-timeout-secs 3600 \
  -i <CID-OF-THE-SUBJECT> \
  jsacex/dreambooth:full \
  -- bash finetune.sh /inputs /outputs "a photo of <name-of-the-subject> mix" "a photo of mix" 3000 "/mix" "/model"
```
Case 4: If you want a different tokenizer, a different model, or a different shell script with custom parameters
You can upload the model to IPFS and create a gist for the script, then mount both the model and the script into the lightweight container.
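For instance, one way to upload the fine-tuned model directory to IPFS, assuming you run a local IPFS node with the CLI installed, is:

```bash
# Recursively add the model directory; note the root CID it prints
ipfs add -r ./model
```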
```bash
export JOB_ID=$(bacalhau docker run \
  --gpu 1 \
  --timeout 3600 \
  --wait-timeout-secs 3600 \
  -i ipfs://bafybeidqbuphwkqwgrobv2vakwsh3l6b4q2mx7xspgh4l7lhulhc3dfa7a:/aronchick \
  -i ipfs://<CID-OF-THE-MODEL>:/model \
  -i https://gist.githubusercontent.com/js-ts/54b270a36aa3c35fdc270640680b3bd4/raw/7d8e8fa47bc3811ef63772f7fc7f4360aa9d51a8/finetune.sh \
  --wait \
  --id-only \
  jsacex/dreambooth:lite \
  -- bash /inputs/finetune.sh /aronchick /outputs "a photo of aronchick man" "a photo of man" 3000 "/man" "/model")
```
When a job is submitted, Bacalhau prints out the related job_id. Use the export JOB_ID=$(bacalhau docker run ...) wrapper to store it in an environment variable so that we can reuse it later on.
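For example, with the ID stored you can check on the job (assuming a Bacalhau CLI version that supports the job subcommands used elsewhere in this guide):

```bash
# Inspect the status of the submitted job
bacalhau job describe ${JOB_ID}
```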
The same job can be presented in the declarative format. In this case, the description will look like this; change the command in the Parameters section and the CID to suit your goals.
```yaml
name: Stable Diffusion Dreambooth Finetuning
type: batch
count: 1
tasks:
  - name: My main task
    Engine:
      type: docker
      params:
        Image: "jsacex/dreambooth:full"
        Parameters:
          - bash finetune.sh /inputs /outputs "a photo of aronchick man" "a photo of man" 3000 "/man" "/model"
    InputSources:
      - Target: "/inputs/data"
        Source:
          Type: "ipfs"
          Params:
            CID: "QmRKnvqvpFzLjEoeeNNGHtc7H8fCn9TvNWHFnbBHkK8Mhy"
    Resources:
      GPU: "1"
```
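Assuming the spec above is saved as a hypothetical dreambooth.yaml, and that your Bacalhau CLI supports declarative job submission, you could then submit it with:

```bash
bacalhau job run dreambooth.yaml
```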
You can download your job results directly by using bacalhau job get. Alternatively, you can choose to create a directory to store your results. In the command below, we create a directory and download our job output into it.
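A sketch of that command, assuming the job ID is stored in JOB_ID and your CLI supports the --output-dir flag:

```bash
# Create a directory for the results and download the job output into it
mkdir -p results
bacalhau job get ${JOB_ID} --output-dir results
```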
Refer to our guide on the CKPT model for more details on how to build a Stable Diffusion inference container.
Bacalhau currently doesn't support mounting subpaths of a CID, so instead of mounting just the model.ckpt file we would have to mount the whole output CID, which is 6.4GB and might result in errors like FAILED TO COPY /inputs. Instead, manually copy out the CID of the model.ckpt file itself, which is about 2GB.
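If you have the IPFS CLI installed, one way to find that file's CID is to list the outputs directory (a sketch; replace the placeholder with your output CID):

```bash
# List the output directory; the entry next to model.ckpt is its CID
ipfs ls /ipfs/<YOUR-OUTPUT-CID>/outputs
```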
Alternatively, to get the CID of the model.ckpt file via a gateway, go to https://gateway.ipfs.io/ipfs/<YOUR-OUTPUT-CID>/outputs/. For example: