Changing Clothes Using A.I. (ComfyUI)
In the previous post I showed how to generate character sheets for game development. But some folks asked me “what if I want specific clothes?”.
Hold my Beer
There’s a way. It’s a separate workflow that you can later add to the previous workflow if you want (homework assignment). Just take the character generated in the first group of nodes and pass it through this other workflow that swaps clothes. (Or do the extra work of training LoRAs and activating them with keywords in the prompt; there isn’t only one way.)
Unlike Mick’s workflow - which is paid - this other one is open. I grabbed it from some Reddit post and dropped it into the workflows folder of my Docker project as “garmente-replacement-idm-vton.json”. Just load it in ComfyUI and play around. Let’s see the result. Here are two images. The result isn’t perfect, but see if you can figure out which of the two is the original and which one was generated by A.I.:
Answer at the end of the post lol
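By the way, once the JSON is in the workflows folder you don’t even need the browser to run it: ComfyUI exposes a small HTTP API. A minimal sketch, assuming the default local port (8188) and that the workflow was exported with “Save (API Format)” rather than the regular graph save:

```python
import json
import urllib.request

def build_payload(prompt):
    """Wrap a workflow (API-format dict) the way ComfyUI's /prompt endpoint expects."""
    return json.dumps({"prompt": prompt}).encode()

def queue_workflow(path, host="127.0.0.1:8188"):
    """POST a saved workflow JSON to a locally running ComfyUI instance."""
    with open(path) as f:
        prompt = json.load(f)
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Handy for batch runs, but for playing around the browser UI is the way to go.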
Anyway, the complete workflow looks like this:
In the current state of my setup there’s a glitch I haven’t been able to solve. I think it’s the JNodes extension that’s opening an infernal popup at the top left of the interface, which won’t close no matter what. For now I’m opening the Inspector and deleting the Node from the HTML directly lol it’s a hack but it works:
If anyone knows how to fix it, I’d really appreciate it. There’s an alert from an extension that seems to be outdated, this front-end something, but just ignore it. Of course it had to be front-end-something…
Anyway. This first group is simple: just load the cleanest, clearest image you can of the new garment you want to use, and then the original photo where you want this clothing to be applied. It doesn’t perform miracles. If the photo is of someone in a blazer and you want to change it to a tank top, sometimes it manages, but don’t expect the arms and torso to look good. It’s going to redraw everything as it sees fit, or it’ll end up putting the tank top on top of the blazer. Anyway, it’s A.I.: the clearer the goal, the better the result. If the new garment is a blazer, ideally the original was another blazer too, it’ll fit better.
Heads up that the IDM-VTON models really weigh a TON. It alone downloads no less than 30GB of models (already included in my setup) and will demand more than 20GB of VRAM, so anything below an RTX 3090 with 24GB just won’t run. You can run it online, on the Hugging Face site. Watch this tutorial that shows how:
This is the Aiconomist channel. He shows how to run it online and how to set it up on your machine in this other video which I used as reference to build my workflow. Subscribe to his channel, it’s worth it. The tutorials are very well explained and have more details that I won’t cover in this post.
That said, there’s a bug I haven’t solved yet. After running the workflow, the model stays hanging in the GPU’s VRAM and if I try to run again, it FREEZES. Then I have to shut down the Docker container and restart it to recover the memory. I tried a Clear VRAM GPU node but it didn’t work. If anyone knows how to fix it, I’d appreciate that too.
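In the meantime, here’s a sketch of a workaround you could run inside ComfyUI’s Python environment, doing roughly what the Clear VRAM nodes try to do. The `comfy.model_management` call is ComfyUI-specific and everything is guarded, so nothing blows up if a module isn’t there. Whether this frees enough memory for IDM-VTON to run twice I can’t promise: `empty_cache` only releases cached blocks, not tensors something still holds a reference to.

```python
import gc

def free_vram():
    """Best-effort cleanup after a heavy run: drop dead references,
    then ask PyTorch to hand cached CUDA blocks back to the driver."""
    done = ["gc"]
    gc.collect()
    try:
        # torch is only guaranteed where ComfyUI actually runs
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()   # release cached allocator blocks
            torch.cuda.ipc_collect()   # clean up shared CUDA memory handles
            done.append("cuda")
    except ImportError:
        pass
    try:
        # ComfyUI's own model unloader; skip silently outside ComfyUI
        from comfy import model_management
        model_management.unload_all_models()
        done.append("comfy")
    except ImportError:
        pass
    return done
```

If the freeze persists even after this, the leak is probably a reference held inside the extension itself, and then only a restart helps.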
The next step is recurring across many workflows: extracting layers of information from the original photo. Getting a mask of the garment we want to replace and information about the original pose:
This is why ComfyUI workflows are powerful once you start understanding how the process is broken down. These parts are reusable and different models give different results depending on which controlnets you add.
Then we get to the main part: wiring all this information into the IDM-VTON node, which will reposition the new garment on top of the original photo:
Notice that the result isn’t very good, it looks pretty “fake”. And there are parameters you can tweak to improve it. But the use case isn’t to swap for a specific garment but rather for a similar one. I’ll explain in a bit, but first, a tangent.
The repository for the original extension of this Node is this one. But if you try to install it, it breaks. It seems to be a bit out of date with ComfyUI and needs to run on an older version. But that’s a pain. So I went ahead and made a FORK with the necessary fixes to run on the newer version (the newer versions of the Python packages diffusers, transformers, huggingface_hub, break IDM-VTON).
By the way, Python dependency management is pretty BAD. But ComfyUI’s is even worse, because different extensions (which install Python dependencies globally) can be at different stages of development. Every time ComfyUI itself changes, everyone needs to update fast. That’s why the ideal is not to install the master branch version of ComfyUI, always a stable version (marked with a tag) or two versions back, just to be safe. But then there are new extensions that require you to be on the newest version. As I said, it’s a nightmare to manage. Being a programmer, I can “hack around” when errors happen, but normal users will suffer a bit.
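For illustration, a quick way to check whether your environment matches a set of known-good pins before blaming the extension. The version numbers below are made up for the example; check the fork’s requirements.txt for the real ones:

```python
from importlib.metadata import version, PackageNotFoundError

# Illustrative pins only -- the real known-good versions
# live in the fork's requirements.txt.
PINS = {
    "diffusers": "0.25.0",
    "transformers": "4.36.2",
    "huggingface_hub": "0.19.4",
}

def check_pins(pins=PINS):
    """Report which pinned packages are missing or at a different version."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = {
            "wanted": wanted,
            "installed": installed,
            "ok": installed == wanted,
        }
    return report

for name, info in check_pins().items():
    print(name, info)
```

Running this before and after installing a new extension makes it obvious which install silently upgraded something out from under IDM-VTON.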
Rant over. The first part of the flow ends in the previous step. When it hits this Image Sender node, the process stops (I don’t know why) and you have to click the “RUN” button a second time. Then the other node, Image Receiver, picks up the new image. I think I’ll just delete those two nodes and connect the “image” output of the Run IDM-VTON Inference node directly to the next nodes, Image Composite Masked and VAE Encode.
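That rewiring can even be scripted against the workflow JSON (API format: a dict of node id → class_type/inputs, where links are `[node_id, output_index]` pairs). The node names in the toy example below are mine, not necessarily the exact class names in the file:

```python
def bypass_pair(workflow, sender_id, receiver_id, input_name="image"):
    """Rewire an API-format ComfyUI workflow so links that go through
    a Sender/Receiver pair connect directly, then drop both nodes."""
    source = workflow[sender_id]["inputs"][input_name]  # e.g. ["12", 0]
    for node in workflow.values():
        for key, val in node["inputs"].items():
            # links are [node_id, output_index] pairs
            if isinstance(val, list) and val and val[0] == receiver_id:
                node["inputs"][key] = source
    workflow.pop(sender_id)
    workflow.pop(receiver_id)
    return workflow

# toy example: node 1 -> sender(2) ... receiver(3) -> node 4
demo = {
    "1": {"class_type": "RunIDMVTONInference", "inputs": {}},
    "2": {"class_type": "ImageSender", "inputs": {"image": ["1", 0]}},
    "3": {"class_type": "ImageReceiver", "inputs": {}},
    "4": {"class_type": "ImageCompositeMasked", "inputs": {"source": ["3", 0]}},
}
bypass_pair(demo, "2", "3")
print(demo["4"]["inputs"]["source"])  # now points at node 1 directly
```

Same effect as dragging the noodle in the UI, but repeatable if you reload the original JSON later.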
Let’s go back a step:
In this next step, on the second “RUN”, it takes the low-quality image with the new garment and loads a new checkpoint (in this case JuggernautXL, but it could be any other, like SDXL itself, FLUX or DreamShaper). This checkpoint determines the look of the new photo, which is what gets generated at the end, on the right.
Notice that the new photo has much better garment quality, but the face is completely different. It used the information from the original photo, kept the pose, the body proportions, even the background, but it can’t preserve the face. No problem, because we can help.
Remember that in the previous step we got a black-and-white mask of the exact position of the garment? We can enlarge that mask and apply a Gaussian Blur (so it later blends nicely into the original photo) and use it to crop the garment out of the generated photo with the different face, using the IPAdapter model (which alone is another 30GB of models, I told you it’s heavy).
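The grow-and-blur step is simple enough to sketch outside ComfyUI. Here’s roughly what those mask nodes do, with NumPy, approximating the Gaussian with repeated box blurs (the parameter values are placeholders to tune, not the workflow’s actual settings):

```python
import numpy as np

def grow_and_feather(mask, grow=8, blur=5, passes=3):
    """Enlarge a binary garment mask and soften its edge.

    mask: 2-D float array, 1 inside the garment, 0 outside.
    grow: pixels to dilate outward.
    blur/passes: box-blur radius and repetitions (three box blurs
    approximate a Gaussian), so the later composite blends smoothly.
    """
    m = (np.asarray(mask, dtype=float) > 0.5).astype(float)
    # dilation: max over all shifted copies within the radius
    padded = np.pad(m, grow)
    out = np.zeros_like(m)
    for dy in range(-grow, grow + 1):
        for dx in range(-grow, grow + 1):
            out = np.maximum(out, padded[grow + dy:grow + dy + m.shape[0],
                                         grow + dx:grow + dx + m.shape[1]])
    # feather: separable moving-average blur, repeated
    kernel = np.ones(2 * blur + 1) / (2 * blur + 1)
    for _ in range(passes):
        out = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 0, out)
        out = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 1, out)
    return np.clip(out, 0.0, 1.0)
```

The growth gives the sampler room to redraw around the garment’s silhouette; the feather is what stops the hard “sticker edge” in the final composite.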
If I understood correctly, IP Adapter is what makes it possible to generate new images that are COHERENT with each other, with the same face, or same clothes, but in other positions, for example: a kind of conditioned generation. With it you can control the coherence of new images A LOT, generating a character in different positions and different backgrounds while always preserving the characteristics of the original character.
IP Adapter will take the original photo, keep the face, use the masks to blend in the photo with the different face, taking only the garment, and then assemble everything in the Image Composite Masked, which is the final node of the process.
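Under the hood, a masked composite like that final node boils down to a per-pixel alpha blend. A minimal sketch of the math (not the node’s actual implementation):

```python
import numpy as np

def composite_masked(original, generated, mask):
    """Alpha-blend generated pixels over the original ones,
    weighted by the (feathered) garment mask.

    original, generated: HxWx3 float arrays in [0, 1]
    mask: HxW float array, 1 = take the generated pixel
    """
    alpha = np.clip(mask, 0.0, 1.0)[..., None]  # broadcast over RGB channels
    return generated * alpha + original * (1.0 - alpha)
```

Since the mask was feathered, alpha ramps smoothly from 0 to 1 at the garment’s edge, which is exactly what hides the seam between the two photos.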
In the end, we can compare the first attempt at putting on the new garment (which came out OK but ugly), the regenerated garment (which came out different but prettier), and everything integrated into the original photo, preserving the same face.
As I said, the qualities are affected by the choice of model, the KSampler settings, number of steps, cfg and everything else. You have to experiment and generate several until you reach the result you like best. That’s also how you learn more. You don’t just click once and magically have the result. A good artist experiments dozens of times.
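If you want to be systematic about the experimenting, it’s easy to script the grid of settings and queue one run per combination. The values below are just plausible starting points, not recommendations from the workflow:

```python
from itertools import product

# Illustrative grid -- tune to taste
STEPS = [20, 30, 40]
CFGS = [4.0, 6.0, 8.0]
SAMPLERS = ["euler", "dpmpp_2m"]

def sweep():
    """Yield one KSampler configuration per grid point."""
    for steps, cfg, sampler in product(STEPS, CFGS, SAMPLERS):
        yield {"steps": steps, "cfg": cfg, "sampler_name": sampler}

configs = list(sweep())
print(len(configs))  # 3 * 3 * 2 = 18 runs
```

Feed each dict into the KSampler inputs of the workflow JSON and queue them; comparing the outputs side by side beats tweaking one knob at a time.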
The final image might not be suitable for a Vogue cover (although I’ve already seen plenty of official covers that were VERY questionable). It depends on the need. In particular it’s good for prototyping. “I wonder how this garment looks on my model?” Instead of calling the model in, having her change clothes, or spending an hour Photoshopping, in minutes you can try several different garments and see whether the model fits the garment, without wasting time. So a photographer can just show it to the client before they spend money on studio, assistants, equipment, etc. You can visualize quickly and discuss details much faster.
This is the kind of thing where tools like this can help a lot and differentiate one artist from another, not by replacing the artist, but by creating more options. And the main thing: all these workflows are modifiable, you reprogram them and combine them with others however you want. The limit is your creativity.