
AI Image Generation - Getting Started - So you wanna make some pretties?

Just an update on the cpu-only situation.
Awesome to see that you found something that works for you.
Speaking of FLUX, I went a bit crazy and tried that out as well. Well, I installed it and did three prompts with it, all of which were kind of garbage. But it's the thought that counts(?) Turns out, figuring FLUX out is much easier than all the complicated ComfyUI stuff for Stable Diffusion. The quantised models are also smaller and less of a resource hog than SDXL, which is great. Took my computer pretty much exactly 30 minutes for a simple prompt at basic settings, and 60 minutes for twice as many steps. Given how much more powerful FLUX seems to be at similar minimum system requirements, I wonder if it will largely replace SDXL/Pony once the community has had a few months to build an ecosystem for it.
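For anyone curious what a bare-bones FLUX run looks like outside of ComfyUI, here's a rough Python sketch using the diffusers library. To be clear, this is just an illustration, not my actual workflow; the model name and settings are the standard FLUX.1-schnell ones and would need adjusting for a quantised .gguf checkpoint (those go through the ComfyUI-GGUF route instead).

Code:
# Minimal FLUX sketch with diffusers; illustration only, not a ComfyUI workflow.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,   # try torch.float32 if bf16 is slow on your CPU
)
# On a pure CPU box the pipeline already runs on the CPU; with a small GPU you
# could call pipe.enable_sequential_cpu_offload() instead of pipe.to("cuda").

image = pipe(
    "a photo of a red fox in the snow",
    height=512,
    width=512,
    num_inference_steps=8,    # schnell is distilled for very few steps
    guidance_scale=0.0,       # schnell ignores classifier-free guidance
    max_sequence_length=256,
).images[0]
image.save("flux_test.png")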
Flux is interesting. There are some things it is really good at, and some things it is just fucking terrible at. Before my AI SSD failed, Flux made great hands but was just fucking garbage for NSFW stuff. I used it in combination with SDXL: try them both to see which one generates the base image best, then refine with the other one to fix hands or clear up other detail problems as needed. When I'm back up and running, I'll be interested to see if anyone has cracked the NSFW problem.
Bad news: For some reason, using any base models other than LUSTIFY! causes ComfyUI to stall out after ~10 minutes, then freeze my computer for ~20 minutes, before crashing and freeing my computer again.
That's odd, there shouldn't be anything special about LUSTIFY! that would make it work when other SDXL models fail.
 
That's odd, there shouldn't be anything special about LUSTIFY! that would make it work when other SDXL models fail.
The only special thing it has I can think of is this blurb:
"This isn't a lightning, but DMD2 model. For those who don't know, it's purpose is the same as Lightning/Hyper, but with a different look."

Based on other models taking ages to load (10+ minutes) and then stalling out shortly afterwards, my first thought would be a memory issue. Given that it is the same size as all the other SDXL models, I don’t see why it would be different, though. That said, when the OnlyForNSFW one did work, it was when I had very few other programs asking for RAM at the same time. (Only Firefox with a few tabs and my notes app.)

IDK how to do that in ComfyUI, but maybe someone else here knows how. If Comfy is too confusing, you might want to switch to A1111.

Awesome to see that you found something that works for you.
Just to be clear, I’m using ComfyUI because the installer worked, not because I’m particularly attached to it. I had issues installing both InvokeAI and Forge, though I’m pretty sure I know what went wrong with Forge and could probably get that going.

ComfyUI works okay once it is set up. However, everything feels very needlessly inconvenient. All these dozens upon dozens (hundreds, really) of tiny nodes for every little thing. Why the heck do you need half a dozen nodes for the KSampler? Why does (almost) every single one need to be loaded separately instead of having one big node for the basic things? It feels incredibly bloated.

Of course there are more convenient nodes. After all, every user can program their own nodes. This just confuses the situation. Take the "Load LoRa" node.

The basic node lets you load one (1) LoRa. Want two LoRas? Make more nodes. Suddenly, you have 10+ nodes just for the LoRas.
So someone made a LoRa stacker that lets you load three (3) LoRas at once. Why three? Who knows. You still need to fully unload a LoRa to turn it off, though, so have fun clicking around a lot if you want to test things!
Then there’s an expanding node that can fit as many LoRas as you want, but it only lets you adjust the strength, not the clip strength. (Shouldn’t matter, though, since most don’t affect clip anyway.)
Then there’s a LoRa node stacker that has an on-and-off button. Convenient. But it doesn’t work alone and instead requires you to load another node as well to get it to work. (I think so, at least.)
And so on…

For basically every single thing you might want to do, there are different implementations, often with different settings and different requirements. And given that the whole ComfyUI ecosystem is based around sharing workflows with other people, it’s one huge mess where everyone does the same things differently.
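For comparison, outside of ComfyUI the whole "stack several LoRas and switch them on and off" thing is a couple of function calls in the diffusers library. A rough sketch (checkpoint path, LoRa filenames and adapter names are all placeholders; needs the peft backend installed):

Code:
# Sketch: stacking several LoRAs on one SDXL pipeline with diffusers + peft.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "models/my_sdxl_checkpoint.safetensors",   # placeholder path
    torch_dtype=torch.float32,                 # float16 on a GPU
)

# Each LoRA gets a name so it can be toggled later without reloading it.
pipe.load_lora_weights("loras", weight_name="style_a.safetensors", adapter_name="style_a")
pipe.load_lora_weights("loras", weight_name="pose_b.safetensors", adapter_name="pose_b")

# Use both at chosen strengths; to "turn off" pose_b, just leave it out of the
# list (or set its weight to 0.0). No unloading required.
pipe.set_adapters(["style_a", "pose_b"], adapter_weights=[0.8, 0.6])

image = pipe("your prompt here", num_inference_steps=30).images[0]
image.save("out.png")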


Long story short, I think have a correct implementation for the Face/Lips/Hands/etc Detailer now, but it also seems wrong because it didn’t work. Or to be more precise: It took 48 minutes to attempt to fix the hands in an image that took ~4 minutes to generate in the first place, and it just made them worse instead. I have about a dozen decent pictures with just one awful detail*, and my motivation for figuring out how to fix it is very low. I’ll probably try to install Forge again tomorrow, or maybe go for A1111 and use the baked-in Detailer that one is supposed to have.
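For reference, what a Detailer boils down to under the hood is masked img2img over the problem region at a low denoise (the real thing also detects and crops the region first, which this skips). A stripped-down sketch of just the inpainting step with diffusers, assuming you already painted a mask over the hands yourself; file paths are placeholders:

Code:
# Sketch of the "detailer" idea: repaint only the masked region at low denoise.
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from PIL import Image

pipe = StableDiffusionXLInpaintPipeline.from_single_file(
    "models/my_sdxl_checkpoint.safetensors",   # placeholder path
    torch_dtype=torch.float32,                 # float16 on a GPU
)

image = Image.open("generation.png").convert("RGB")
mask = Image.open("hands_mask.png").convert("L")   # white = area to repaint

fixed = pipe(
    prompt="detailed hands, five fingers",
    image=image,
    mask_image=mask,
    strength=0.4,              # low denoise so only the masked detail changes
    num_inference_steps=30,
).images[0]
fixed.save("generation_fixed.png")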

It’s very annoying because I feel like I have everything else essential figured out by now. I’m even pretty sure I could do the ControlNet stuff if I really wanted to for specific poses. The only other thing I failed at is making the suspended spitroast LoRa work, and I think that might just be an issue of it being too cartoonish for photorealism.

*Well really, I also have a couple of nightmare fuel images I accidentally made that I want to make presentable. Probably something to post myself once I have P4, though. As a hint: If you start your prompt with "evidence photo of", the scenes sometimes look like places the police might be called to…
 
The only special thing it has I can think of is this blurb:
"This isn't a lightning, but DMD2 model. For those who don't know, it's purpose is the same as Lightning/Hyper, but with a different look."
Ahh interesting. I missed that when I read the model description. Are you using a special DMD2 node to run it in Comfy? That node probably wouldn't work with a non-DMD2 model.
ComfyUI works okay once it is set up. However, everything feels very needlessly inconvenient. All these dozens upon dozens (hundreds, really) of tiny nodes for every little thing. Why the heck do you need half a dozen nodes for the KSampler? Why does (almost) every single one need to be loaded separately instead of having one big node for the basic things? It feels incredibly bloated.
This is the main reason I didn't use it. Unless you're the 0.0001% who needs that level of granularity in designing your generation pipeline, Comfy's UI is just unnecessarily complex.
I’ll probably try to install Forge again tomorrow, or maybe go for A1111 and use the baked-in Detailer that one is supposed to have.
Forge is more memory efficient, and they say that there are detailers that are now compatible.
The only other thing I failed at is making the suspended spitroast LoRa work, and I think that might just be an issue of it being too cartoonish for photorealism.
Sometimes if you decrease the LoRA strength you can get the effect without the art style the LoRA was trained on.
 
Awesome, thanks for the info.
 
Ahh interesting. I missed that when I read the model description. Are you using a special DMD2 node to run it in Comfy? That node probably wouldn't work with a non-DMD2 model.
It’s baked in, so no extra node for that.

My setup was just replacing LUSTIFY! with another model and adding the SDXL Hyper LoRa afterward. (Which is also probably why the one pic I did when it worked was garbage, ’cause different models/LoRas need different Samplers etc.)
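To illustrate the sampler point: a speed LoRa like Hyper-SD only behaves if the step count and cfg roughly match what it was distilled for. A hedged diffusers sketch (the repo/file name is the commonly used one from the Hyper-SD model card and should be double-checked there, along with the exact scheduler/cfg it expects; the checkpoint path is a placeholder):

Code:
# Sketch: some SDXL checkpoint + an 8-step Hyper-SD LoRA.
# The point: swapping models/LoRAs without matching steps/cfg gives garbage.
import torch
from diffusers import StableDiffusionXLPipeline
from huggingface_hub import hf_hub_download

pipe = StableDiffusionXLPipeline.from_single_file(
    "models/some_other_sdxl_model.safetensors",   # placeholder path
    torch_dtype=torch.float32,                    # float16 on a GPU
)

lora_path = hf_hub_download("ByteDance/Hyper-SD", "Hyper-SDXL-8steps-lora.safetensors")
pipe.load_lora_weights(lora_path)
pipe.fuse_lora()

image = pipe(
    "your prompt here",
    num_inference_steps=8,   # must match the LoRA's distillation target
    guidance_scale=1.0,      # speed LoRAs generally want cfg at or near 1
).images[0]
image.save("hyper_test.png")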
Forge is more memory efficient, and they say that there are detailers that are now compatible.
I’ll look into it. I usually spend 1-2 hours on generation-related tasks. (Usually generating some stuff in the background while browsing CivitAI or guides, or something else.) I’ll add Forge to the list.

Edit: I failed to get Forge working so far, but I’ve tried the same FLUX setup again that I used a few days ago. Last time, a 512x512 generation with 8 steps took an hour. Earlier, a 1024x512 generation with 8 steps took only half an hour. That’s four times as fast (twice the pixels in half the time). I’m not sure if it’s just that using a smaller model is faster (the previous test used a ~7GB model, this one a ~5GB model), or if my attempt at getting Forge to work with my AMD iGPU somehow led to ComfyUI being faster. It’s pretty neat, though.

Really, I don’t understand in general why per-step times with similar prompts using the same workflow and models are often so wildly different. Sure, more LoRas means slower generation, but I usually use a similar number of LoRas. With several LoRas, my SD 1.5 setup takes between ~10 seconds and ~35 seconds per step, and a bit longer per step for the HighResFix that uses the same model. (Even the ratio between the HighResFix time and the generation of the initial image doesn’t stay the same.)

The only thing I can think of is that the computer sometimes draws resources to other applications, slowing ComfyUI down. However, even if I close other programs and don’t actively do anything else, generations sometimes take much longer.
 
After reading this I think I'll never try AI rendering myself, but the "ComfyUI-CPU" triggered my interest:
I have a small crypto mining farm waiting for new tasks. There are 8 CPUs with 8 cores each and 32GB RAM in total. (IIRC)
I used to use them with Docker-images.
Has anyone heard of a way to use such hardware for AI-rendering?
I'd gladly sell it for a few bucks...
 
After reading this I think I'll never try AI rendering myself, but the "ComfyUI-CPU" triggered my interest:
I have a small crypto mining farm waiting for new tasks. There are 8 CPUs with 8 cores each and 32GB RAM in total. (IIRC)
I used to use them with Docker-images.
Has anyone heard of a way to use such hardware for AI-rendering?
I'd gladly sell it for a few bucks...
The software that almost everyone uses for AI rendering offline is not written for distributed computing. If that is a single 8-socket server, it could work, but 8-socket servers are rare and sell for many many thousands of dollars on the used market; anyone with that kind of money for an AI rig has much better options. If it's 8 separate systems with 4GB of RAM each, it's basically worthless for CPU rendering using any of the standard software options. The CPUs can do the rendering, it's the small amount of RAM that's the real killer.
 
No matter how big of a CPU setup you manage to get, its performance will be dwarfed by a proper GPU. Go for Nvidia; AMD is currently no match.
 
I came across this thread a while back, and have been dabbling with Fooocus. It was easy enough to install and get up and running. Before this, I never worked with any sort of AI and had no idea where to even start. So thanks for the guide.

I was more curious than anything, just been playing around seeing what I could create. Probably won't spend much more time with it, but it was fun to see some of the inner workings of AI generation.

No matter how big of a CPU setup you manage to get, its performance will be dwarfed by a proper GPU. Go for Nvidia; AMD is currently no match.
I have an AMD and don't seem to have any problems.

I am a little confused by some of you guys' comments. Using an AMD, I can create a 1024x1024 with 30 steps in about 5 minutes. An 8 step takes about 30 seconds. Am I missing a big part of something, or did I just interpret your comments wrong?

Again, I have just been curious about AI and how it works. It certainly will become more and more a part of our lives.
 
I am a little confused by some of you guys' comments. Using an AMD, I can create a 1024x1024 with 30 steps in about 5 minutes. An 8 step takes about 30 seconds. Am I missing a big part of something, or did I just interpret your comments wrong?
I think it’s just that NVIDIA GPUs generate faster than AMD.

Good news, I’ve found an extremely lightweight fast PONY model that even my machine can run. Specifically, this 4-step PONY, which is only 2GB in size due to crazy quantization:
Since it’s quantized as a .gguf, you need to use a unet setup, and you need to load the vae and clip-l and clip-g separately. That’s trivially easy, though.


Bad news, several days later:

Something fucked up my ComfyUI installation, again. This time, I have much less idea of what it could be. It just happened between one generation and the next. The only thing I can think of to have caused it is that I opened a very big img2img workflow and installed the missing nodes somewhere close to that timeframe…

It still works, but generations take almost exactly 10x as long as before, most of the time:

Code:
100%|██████████████████████████| 5/5 [01:25<00:00, 17.18s/it]
100%|██████████████████████████| 8/8 [00:36<00:00,  4.52s/it]
Prompt executed in 227.08 seconds
got prompt
100%|██████████████████████████| 8/8 [01:39<00:00, 12.42s/it]
Prompt executed in 110.48 seconds
got prompt
Requested to load SD1ClipModel
loaded completely 9.5367431640625e+25 235.84423828125 True
Requested to load BaseModel
loaded completely 9.5367431640625e+25 3278.812271118164 True
100%|███████████████████████████| 6/6 [17:22<00:00, 173.81s/it]
100%|███████████████████████████| 11/11 [16:21<00:00, 89.18s/it]
Prompt executed in 2452.12 seconds

And now, after uninstalling a bunch of unneeded custom nodes:

Code:
100%|███████████████████████████| 8/8 [23:54<00:00, 179.31s/it]
Requested to load BaseModel
loaded completely 9.5367431640625e+25 3278.812271118164 True
100%|███████████████████████████| 8/8 [01:36<00:00, 12.08s/it]
100%|███████████████████████████| 8/8 [24:44<00:00, 185.58s/it]
Requested to load BaseModel
loaded completely 9.5367431640625e+25 3278.812271118164 True
100%|███████████████████████████| 8/8 [13:21<00:00, 100.18s/it]

Notice how it suddenly jumps back to 12s/it in the middle? No idea what is going on here. I didn’t change the setup between these generations in any way that should impact it.

I was just using my normal workflow, converted into an img2img workflow by loading an image instead of an empty image and setting denoise to 0.6. (There’s two loading bars per generation because I’m reusing the HighResFix.) I have absolutely no idea where to even look to fix it. I’ve removed any Custom Nodes that I don’t need and rebooted my computer in case some process was running and slowing it down; next I’ll try updating ComfyUI-Manager, I guess.

Edit: Now it’s 90.88s/it and 58.80s/it respectively. What the heck is even going on?

While the new custom nodes were throwing up errors, and something was for some reason causing enough instability to crash Firefox tabs several times, the main culprit was sitting in front of the keyboard, as usual.
1) The easiest way to speed up generation is to set cfg to 1. It doubles your speed, at the cost of completely removing the negative prompt and making prompt adherence generally worse. (Not that big of an issue if you use a picture to direct the creation already.)
2) You know how bigger images take longer to load? Well, loading an image does not automatically tell you how big it is, so you can switch from a tiny image to one that is literally six times the size or more without noticing.

Now, if you multiply it together, 6x2=12, and suddenly there’s a reason why my generations took more than ten times as long between one picture and the next…

This does not, however, explain why it suddenly went back to 12.08s/it in the middle, there. That was the exact same image as the rest. I have no idea what is going on there. I’m just happy that generations are at a more reasonable speed again for me.
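For next time, checking point 2) before queueing an img2img run is only a couple of lines. A quick sketch (file names are made up):

Code:
# Quick sanity check: how much bigger is the new img2img input than the old one?
from PIL import Image

new = Image.open("new_input.png")         # placeholder file names
old = Image.open("previous_input.png")

ratio = (new.width * new.height) / (old.width * old.height)
print(f"{old.size} -> {new.size}: ~{ratio:.1f}x the pixels")
# Per-step time scales roughly with pixel count, and running with cfg > 1 costs
# roughly 2x versus cfg 1 (two model passes per step instead of one) -- which
# is where the 6x2=12 above comes from.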
 
Sorry for the double post, but that’s the issue with slow threads…

Since I didn’t find any other small quantized models, I’ve made my own. I’ve also uploaded the gguf versions of CyberRealistic Pony v7 to CivitAI, including the required clip_l and clip_g, in case anyone feels like trying it out:


Basically, using the Q5_K_S gguf allows you to load everything within ~3.2 GB of VRAM, allowing fast generation on small GPUs. (Or in my case, using very little RAM…)
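For anyone wondering how the numbers add up, here's a rough back-of-the-envelope calculation (all the parameter counts and bits-per-weight figures below are ballpark approximations, not measured values):

Code:
# Rough weight-size estimate for the quantized Pony setup; numbers are ballpark.
GB = 1024 ** 3

components = {
    "unet, Q5_K_S (~2.6B params @ ~5.5 bits/weight)": 2.6e9 * 5.5 / 8,
    "clip_g, fp16 (~0.7B params @ 16 bits/weight)":   0.7e9 * 16 / 8,
    "clip_l, fp16 (~0.12B params @ 16 bits/weight)":  0.12e9 * 16 / 8,
    "vae, fp16 (~0.08B params @ 16 bits/weight)":     0.08e9 * 16 / 8,
}

total = sum(components.values())
for name, size in components.items():
    print(f"{name}: {size / GB:.2f} GB")
print(f"total: ~{total / GB:.1f} GB of weights, plus activations/overhead")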
 