AI Image Generation - Getting Started - So you wanna make some pretties?

Getting started making AI images for personal use, and maybe sharing with your friends on your favorite forum.
I don't profess to be an AI master, but it would have helped me to have a guide to follow and a place to ask questions when I first started, so I am trying to give that to you.
There is a steep learning curve if you want to generate your own images at home. However, it comes with the benefit of feeling safer to experiment with prompting, free from logging and arbitrary censorship.
If you think the learning will be too much, pick an online generator and skip my section(s) below about the software/hardware.
------------------------------------------
Hardware
Step 1 - Assessment - What computer hardware do you have?
Your options will be different and your experience with AI image generation will be different depending on the computer hardware you have.
Video RAM | Software | 1.5 Models | SDXL/Pony Models
24 GB | SD Forge, SD Automatic 1111, ComfyUI, Fooocus | Yes, very fast | Yes
18 GB | SD Forge, SD Automatic 1111, ComfyUI, Fooocus | Yes, very fast | Yes
12 GB | SD Forge, SD Automatic 1111, ComfyUI, Fooocus (yes, but slow) | Yes | Yes
8 GB | SD Forge, ComfyUI(?) | Yes | Yes, but slower - recommend Forge because its memory management is better
6 GB | SD Forge | Yes | No - you could try Forge with "Never OOM" turned on and small dimensions
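Not sure how much video RAM you have? Apart from checking Task Manager, a quick way is to ask PyTorch (all of the UIs above install it, so it is usually available in their Python environment). A minimal sketch:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA-capable GPU detected (AMD, Mac, and CPU-only setups land here)")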
Note: A notebook with any of the above graphics cards is not ideal but OK, as long as you have a good cooling solution. Get yourself a cooling fan and keep the machine dust free, or you're gonna have a bad time.
Generation on a PC with AMD graphics is possible, although there are hoops to jump through that I can't help with any better than a Google search can.
From what I have seen, most recommendations are to move to Linux or replace your graphics card.
If it is a newer AMD card with plenty of video RAM, you could try Fooocus: it auto-detects your graphics card and sets the optimal settings for you. There is also software called "Stability Matrix" which attempts to set everything up for you. It seems to have options for AMD, so it might be a good way to find something that works.
MacBooks with 64 GB of shared memory are apparently very good for text-to-text generation models, as they can hold a much bigger model in memory and therefore give much better output (like the difference between ChatGPT 3 and ChatGPT 4).
I don't know much about Macs, other than that the close-window buttons look different and the menus are all in slightly different places. Commercial software like Adobe's might be something to look at.
------------------------------------------------------
Software
Step 2 - Choose your level of difficulty
Comfy UI.
  • I don't like Dark Souls games.
  • I have put the same 6 or so hours into this that I put into Elden Ring, before I gave up in disgust.
  • Apparently it is the way to go if you have the time to learn workflows, and it will ultimately give you the best results in the long run.
  • It also seems to be the one most tutorials on making AI videos/GIFs use.
  • Here is an install guide:
Automatic 1111
  • It is difficult to install if you don't know Git and how cloning repositories and installing dependencies works.
  • Once installed, it does have a lot of extensions you can try, but most of them have very poorly written documentation, if any at all.
  • It seems like a lot of hassle for very little gain.
  • If installing on a hidden drive you will need to add the drive to your path - ask a question below and I will explain.
  • When I installed it I used this guide: Installing A1111. It is fairly straightforward, but I still struggled and had to learn some things to make it work.
Forge
  • It is basically the same as Automatic 1111 and looks very similar, except it has better memory management and is much, much easier to install.
  • The installer takes a while but it should add python and all the dependencies automatically.
  • The first time you run it it will also download a 6gb model file (juggernaut - it makes pretty and realistic photos but it's dumb and a bit of a prude)
  • You can download other models and put them in the folder: //webui/models/Stable-diffusion
  • You can download other loras and put them in the folder: //webui/models/loras
  • If installing on a hidden drive you will need to add the drive to your path - ask a question below and I will explain.
This is a direct link to download the last stable version:
After you download it, go back here and follow the install instructions as if you downloaded the installer package:

The latest version of Forge on Git is a developer branch. It is not 100% stable, but it is faster and runs Flux models. If you install it you are likely to find things that don't work, or things that suddenly stop working. That said, I am using the dev branch right now, and it has been pretty stable for weeks.

If you click the update.bat file in the stable version, it will update to the dev branch...
Fooocus
  • easy to use
  • easy to install
  • makes fantastic pictures.
  • you don't need to know much to get good pics.
  • slower than Forge
  • a lot less ability to customize functions (which may be good or bad - it's like Apple vs Android)
  • requires a fairly good graphics card because it only uses SDXL and Pony models.
  • Download from here: Look for the ">>> Click here to download <<<" link

  • The installer takes a while but it should add python and all the dependencies automatically.
  • The first time you run it it will also download a 6gb model file (juggernaut - it's pretty but a bit of a prude)
  • You can download other models and put them in the folder: fooocus/models/checkpoints
  • You can download other loras and put them in the folder: fooocus/models/loras
  • It is possible to change the directory your models/loras are stored in. Without going into detail, update config.txt following the instructions in config_modification_tutorial.txt. If you're doing this and need a hand, ask below; there is also a short scripted sketch after this list.
  • If installing on a hidden drive you will need to add the drive to your path - ask a question below and I will explain.
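For reference, config.txt is plain JSON, so you can also change the model/lora paths with a short script instead of editing by hand. A minimal sketch: the key names are my assumption based on config_modification_tutorial.txt (check that file for the exact names your version uses), and the paths are placeholders.

import json
from pathlib import Path

cfg_path = Path("Fooocus/config.txt")  # adjust to wherever you installed Fooocus
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}

# key names assumed from config_modification_tutorial.txt - verify before running
cfg["path_checkpoints"] = "D:/ai/models/checkpoints"  # placeholder path
cfg["path_loras"] = "D:/ai/models/loras"              # placeholder path

cfg_path.write_text(json.dumps(cfg, indent=4))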
In either Forge or Automatic 1111 (they look almost identical), go to the Extensions tab, click "Available", then "Load from:"

Search for and install:
  • CIVITAI helper - for model management,
  • adetailer - for auto inpainting, and
  • reactor - for easy face swap (can build a little library of faces you like and swap them in easily)
There are plenty of others to try, but these I find are the most valuable.
-------------------------------------
Online Generators
Most online generators will log what you are requesting. If you think there is risk, don't forget your VPN.
  • Needs Google ID: If I ever see a generator that asks for this I just close it. It's not worth the trouble of faking it.
  • Free:
  • Free:
------------------------------------
Models
  • When you download any of the software above it will generally also download a model.
  • The ones that come with the software make pretty pictures, and maybe some topless women, but not much more
  • find other models on civitai.com; you need to sign up to see NSFW content
  • use a temporary email from here
  • Put them in the folder /Models/Stable-Diffusion or in Fooocus/models/checkpoints
  • SDXL models are big (6GB+) but if you can run them they are much easier to prompt because a lot of them use more natural language. And IMHO they make better pics.
Some good SDXL models include:
  • Cyberealistic Pony V5 - Might need to get from a site called huggingface
Loras are another kind of model used to fine tune the big model you are using to give it more information on a specific subject. Find loras on civitai.com, sign up with a temporary email so you can see NSFW.

You can download loras into your models/loras folder. Then you use them in a prompt with syntax like this:
<lora:StS_age_slider_v1_initial_release:-1>

There is a lora section in Forge and Automatic 1111 that will add them to your prompt if you click on them; then you can adjust the strength. Depending on how the lora was created, you might also need to add a trigger word to make it work. Click the picture of the crossed hammer and spanner at the top right of the lora and you will probably see a list of trigger words.

Most loras that aren't sliders work best at a strength somewhere between 0.5 and 1. Read the description on civitai; the creator will often tell you what works best.

Some of my favorites include:
(I might update these to links one day, but you can just search on civitai)
  • StS_age_slider_v1_initial_release - Use at a strength of -1 to make the subject slightly younger. Use -5 if you want the FBI to come knocking.
  • XL_Body_ahxl_v1 - helps make skinny women with flat chests
  • MissionaryVaginal_v1_SDXL - what it says on the box
  • reverse_cowgirl_xl_2_0
  • XL_Sex_Cowgirl
  • SquattingAnal
  • XL_Sex_Doggystyle
  • XL_Sex_Blowjob_POV_Deepthroat
  • tutelage - can give blowjob with second woman encouraging (e.g. mom and daughter)
  • sdxl-creampie-v02-e99
  • XL_Sex_Doggystyle_With_Feet
  • SideSexWithFeetv4
  • XL_Sex_Therresom_FFM
  • sex_in_between_the_lines - it's rough sex.
  • Shaped_Uterus - if you like x-ray type pics, this makes the best ones.
  • lactation_XL_0_
  • preggoXL_v1.0
  • XL_Face_Cockshock
  • XL_Body_Fairywings - like redbull for AI
  • XL_Body_Tits_Conical_Nipples
  • XL_Body_Tits_Bumpy_Nipples
  • longnipsxl
  • SDXXL_V30_puffy_nipples_v3
  • HardNipples - Gives pokie/headlights/THOs nipples through shirts
  • God Pussy
  • Rear pussy
  • large_insertion_1_CivitAI
Fairies and Minigirls
  • Miniature_People_-_By_DICE-000006 - can help with size difference like small fairy big person images
  • Extreme_tiny - can also help with minigirls
  • Very_Small_Women - also minigirls... are you sensing a theme...
  • shrunk_xl_v11 - Guess
  • tinkerbellPonyXL_character-10
  • TinkerBell_NSFW_for_Pony-000003
  • MacroFairy
Monstergirls:
  • lamia_0.27_universal_sdxl_pub
  • RPGCentaurXL
  • Centaurs
  • Horse_Pussy
  • XL_Body_Monstergirl_Centaur
  • humantaurs_02_20_merge_sdxl
  • Enchantress_dota2_v1_ponyxl
  • RPGPixieXL
  • Sexy_Girls_With_Foxtail_SDXL-000003
  • mermaid_xl_v1
  • werecat-ponyxl-v1
  • Lapicentaur_SDXL
  • Werewolf_Sex2_SDXL
  • Werewolf_and_Weregirl
  • Tailgrab
  • cat_girl_pony
  • Realistic_Feline_Hybrids
  • Feline_Pussy-CatXL
  • Jessie_the_Yodeling_Cowgirl_-_Toy_Story_PonyXL
  • Kiri_-_Avatar_the_Way_of_Water_PonyXL
  • zelda_v2_pony
  • onarmor_pony_xl - to make big monsters wearing girls as Armour
For Flaming1
  • Goblin
  • Gobla
  • Gaby_the_goblin_V1.2
  • Shortstack_ANY
  • Shortstacky_Bukkake-000005
------------------------------------
Generating Images
1. For SDXL, simple is often better, a prompt like this usually gives pretty decent results with well trained realistic models:

"Main subject, action, inclusion 1, inclusion 2, inclusion 3, features of subject, background, things the AI should focus on a bit more, camera angle, lighting, long shot/midshot/portrait/closeup"​

Something like this:​
Short teenage girl, sitting on chair, skirt lifted, spread legs, knee high socks, redhead, flat chest, desk in empty classroom, perfect eyes, perfect face, perfect pussy, studio lighting, portrait​

2. Don't worry about negative prompts for SDXL unless it keeps giving you something you don't want to see.
3. If you want to make sure the AI focuses on the face, or gets the eyes, hands, or other parts right, mention them in the positive prompt. For example, "pale skin", "dark skin", or "detailed skin" will give much better results than just "pale", "dark", or "detailed". Just adding something like "nipples" to the end of your prompt will make the AI add more detail there than pushing up the resolution or steps would.
4. The further you zoom out the more likely it is that the AI will mess up the face, hands or something else. If the shot is very tight you will get a lot more fine detail.
5. Using (brackets) around a word makes the AI pay more attention to it, while [brackets] make it pay less attention (useful if a lora has a trigger word and you don't want it dominating your prompt).
6. In A1111/Forge each extra pair of round brackets multiplies the attention by about 1.1, so ((word)) is roughly (word:1.21) rather than double. You can also write the weight explicitly, e.g. (word:1.2), which allows finer tuning ((word:1.25) and so on) and keeps your prompts a lot cleaner than ((((((((word)))))))) (see the small example after this list).
7. If you really want negative prompts, I found a wildcard with all the negative prompt words used on civitai (hundreds of thousands of words). Through the magic of excel formulas I counted them all and ranked them by the most commonly used. This should work very well as a negative prompt:

acne, age spot, mole, skin blemishes, skin spots, ugly, normal quality, poorly drawn, bad quality, bad artist, cropped, out of frame, drawing, grayscale, illustration, cartoon, cgi, painting, sketch, render, 3d, anime, signature, text, username, logo, artist name, error, amputation, bad hands, bad proportions, disconnected limbs, disfigured, disgusting, distorted, extra digit, extra limbs, fewer digits, fewer fingers, floating limbs, fused fingers, missing limb, morbid, mutated, mutilated, wrong anatomy,
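If you want to sanity-check the bracket maths from point 6, here is a tiny sketch of the A1111/Forge-style weighting (the function name is just for illustration):

def bracket_weight(depth, base=1.1):
    # each extra pair of () multiplies attention by ~1.1; [] divides by it
    return base ** depth

print(bracket_weight(1))   # (word)   -> 1.1
print(bracket_weight(2))   # ((word)) -> ~1.21, i.e. (word:1.21)
print(bracket_weight(-1))  # [word]   -> ~0.91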
To generate your first image:
  1. Open Fooocus
  2. Type the prompt: "Teenage vampire, lifting skirt, showing pussy, vampire fangs, under a streetlamp, perfect eyes, perfect face, perfect pussy,"
  3. Click advanced
  4. Select quality
  5. Select aspect ratio: 1024 X 1024
  6. Click style, turn off everything and select Volumetric Lighting
  7. Click model
  8. Select CyberealisticXL_v20
  9. Select a lora: Age slider, weight -2
  10. Click generate
If you generally liked it but want to fix the face or hands a bit:
  1. Select input image,
  2. Add the generated pic
  3. Click "Vary (subtle)"
  4. Click generate
  5. When you get an image you like add the pic to the input image section
  6. Select Upscale 2X
  7. Click generate
----------------------------------------
Resolution is by far the thing that makes your image generations take the most time.

256 X 256 = 65536
1024 X 1024 = 1048576
1048576 / 65536 = 16
A 1024x1024 image will take roughly 16 times longer to generate than a 256x256 image.
If you have a slow graphics card, try generating at low resolution then using high-res fix to increase it. You will find it takes less time.

Models are often trained with a fairly narrow range of resolution ratios. If you are generating and people seem to have very long limbs or extra heads, the chances are that you're using a ratio that the model you're using doesn't like.
  • Square usually gives the best results.
  • Most SD 1.5 models are trained at 512 x 512.
  • Most SDXL models are trained at 1024 x 1024.
To make higher resolution images, generate at one of these two resolutions, then use highres fix to upscale your pic to the desired size.
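If you want a non-square image without upsetting the model, one approach is to keep the total pixel count near the training resolution and round both sides to a multiple of 64. A small sketch (the helper and the rounding convention are mine, not something the UIs require):

def pick_dims(aspect_w, aspect_h, base=1024, multiple=64):
    # keep roughly base*base total pixels (what the model was trained on)
    ratio = aspect_w / aspect_h
    height = (base * base / ratio) ** 0.5
    width = height * ratio
    # round both sides to a multiple of 64 (a safe choice for SD 1.5 and SDXL)
    return round(width / multiple) * multiple, round(height / multiple) * multiple

print(pick_dims(16, 9))        # SDXL landscape  -> (1344, 768)
print(pick_dims(2, 3, 512))    # SD 1.5 portrait -> (448, 640)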
-----------------------------------------
CFG: 1-2 for turbo/lightning models, 5-8 for non-turbo models.
As you increase CFG, the generation usually follows your prompt more closely. However, if you increase CFG you also need to increase your number of steps, which slows down the generation.
As a general rule (this differs between models), 5 CFG needs about 20 steps, 7 needs about 30, and 8 needs about 45.
-----------------------------------------
Steps: model dependent; I usually start at 30 and go up or down depending on quality and detail.
Read the model info on civitai; the creator usually tells you what seems to work best.
---------------------------------------------


1. Forge has a ton of settings to play with; most are not worth bothering with if you have less than an NVIDIA 4060. On anything slower, the trial and error for a lot of the settings becomes tedious - just use the basic settings, or something you know works.
2. If your graphics card has less than 8 GB of RAM, don't bother - use an online service. At 8 GB you want to use a 1.5 model; at 11 GB+ look for an SDXL or Pony model.
3. Start with something you know works, then increment changes until you get what you want. I would suggest using civitai.com to find an image you like, setting up everything exactly the same then make changes.
4. The pictures from a 1.5 model aren't usually as good: the failure rate seems higher, and the models require a lot more positive and negative prompting to get a good result. They will also need many more loras and embeddings to understand your prompts, because their datasets are smaller.
5. Prompting is very different for different models, when you download models from civitai, read what the creator says and look at the prompts they used to make the initial images they shared.
6. As a rule of thumb, when working with an SDXL model, don't use any negative prompts to start with, just add things to the prompt as you see undesirable things the model is including.
7. There is a ton of advice out there if you google it, however, because this is such a new tech, the advice gets old really quickly, check the dates on posts and videos.
8. You are never going to use all the loras and models you download. Keep them organized, there is no standard naming convention everyone uses, so you are going to have a ton that you don't know what they do. - The same goes for checkpoints.
9. Keep a document where you add the pics you really like along with everything you used to make them: prompts, models, loras, embeddings, seeds, steps, clip skip, resolution, sampler, hiresfix %, upscaler (see the metadata sketch after these tips).
10. Don't bother inpainting a pic you just created unless you really, really want to kill hours on a single pic only to not get exactly what you want. Re-use the same seed and prompt and change a few things like steps, or add a lora to teach the model a concept it doesn't know.
11. If you get extra limbs, stretched bodies, or fused people, it is very possibly the resolution you're using. For SD 1.5 make sure one of the dimensions is 512, and for SDXL make one 1024 (512x512 or 1024x1024 will solve a whole lot of issues). To make the image bigger, use highres fix or an upscaler after it is created; to fix a face or hands, try the extension called adetailer; to swap in another face you like, try reactor.

Just want to add some things to this 2 months later...
12. The highres fix button below your generated pic will re-run the generation, and everything that runs after it in the pipeline (adetailer, reactor, etc.) will run as well, even if those things weren't turned on when the image was originally generated.
13. You can run generations without highres fix, adetailer, or reactor until you find an image you like, then turn on adetailer/reactor, check the settings in highres fix are what you want, then click the button.
14. Highres fix can also fix hands and faces a bit, so you can run it a couple of times and the details might improve. Each time you run it you need to increase the dimensions slightly; if you don't, the second or third pass will become weirdly over-detailed and start to look bad. Just be careful to use the save button on an image you like, because highres fix might overwrite files (mine does, at least). You can do this on pics individually, even if you created a batch of several, just by selecting the pic you want to re-run.
15. When you click the save button manually, it copies the generation text to a log file inside your save folder.
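Related to tips 9 and 15: A1111/Forge also embed the prompt and settings inside each PNG they save, so you can recover them later even if you forgot to write them down. A minimal sketch, assuming the image was saved by A1111/Forge with PNG info enabled (the filename is just an example):

from PIL import Image

def read_generation_info(path):
    # A1111/Forge store prompt, seed, sampler, etc. in a "parameters" text chunk
    return Image.open(path).info.get("parameters")

print(read_generation_info("00042-1234567890.png"))  # example filename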

Also see Alego's AI Question thread: Ask your Stable Diffusion questions here
The thread is in the public area, so if you want to ask a question there with an image that shouldn't be there, post the pic in P4 and link to it; don't endanger this site. Or DM it to me and I will see if I can help.
Alego makes some of the most amazing image gens I have ever seen. It is definitely worth a look if you are getting into image generation.
 
Just a bit I want to share about using ComfyUI and "group nodes"

Workflows in ComfyUI let you control every aspect of AI generation and get repeatable, reproducible results. It is much better suited than Automatic1111 for experimenting and learning how everything works to get the best results. Workflows should be treated the same way as code: you want them clean and readable. To help with this goal, I present the little-used feature of "group nodes".

If you have repeated blocks of similar nodes, you can select such a group of nodes, right-click, and convert it to a single node. This hides all internal connections and lets you define which widgets should be visible, and the same for inputs and outputs. I had trouble finding proper documentation for this feature; the best resources are some YouTube videos, such as this one:

A quick example: the following workflow, which I found on civitai, does text-to-animation with four keyframes. You can see the four similar groups of nodes, one for each keyframe, but it is a huge mess.


The first thing I do is untangle the flow, keeping a strict direction from left (input) to right (output). This makes the nodes to be grouped easily identifiable. In this case, such a bundle looks like this:

which I then convert to a group node. One tip: I like to propagate inputs to outputs to allow clean chaining with no additional connections. In the example above, I did this with the IPAdapters, width, height, etc. It is not strictly necessary to have those as outputs, as the whole group actually only changes the model and the conditionings, but with this propagation the chain is much simpler, with no wild connections all over the place.

Defining a group node that way makes the workflow much cleaner and more readable:


It has some caveats, though.
  • First, group nodes only exist in a single workflow. To make them truly reusable you have to convert them to components.
  • Second, refining a group node until it looks that clean and only exposes the necessary inputs, widgets and outputs is a lot of tedious work.
  • Last but not least, I also ran into many inexplicable bugs, especially when using reroute nodes, so I try to avoid those as much as possible.

Hope this helps someone. Maybe we can start sharing more tips like this here.
 
thx for sharing!
 
wow, great guide. if you don't learn AI from this guide i don't think you ever can learn AI (y)
 
How can I control the number of people in the image? I've tried various prompts but they don't seem to work. I often get more characters than I want, some of them just copies of one another.

I use Forge.
 
I usually use prompts like "1girl", "3girls", "1boy", "3boys", etc. to get the required number of people. Most models also respect forms like "2+boys" or "a group of girls". The higher the number, the more unreliable it gets.
And yes, Stable Diffusion tends to produce similar faces. Getting several different characters consistently in one picture is brutally difficult. I experimented with regional prompting and other approaches, but in the end, manually inpainting each person is the only practical way right now. I would be happy if someone could prove me wrong.
 
Time for another drop of personal knowledge about:
✨Pipelining in ComfyUI✨

The reason ComfyUI is overwhelming for newcomers and frustrating even for experienced users is the amount of noodles (connections) between nodes. To illustrate this little tutorial, I created the following workflow. It consists of 3 parallel KSamplers that generate 3 images (imagine you are testing different settings) and does an upscaling pass on each picture at the end. This is BAD :sick:, but don't worry, we will improve it.

The amount of noodles is crazy. The workflow is neither intuitive, understandable, nor maintainable that way. The more parameters you want to reuse, the worse it gets. So how can we improve?

The first method is to use the native Reroute nodes. Reroutes are great for leading noodles out of the way and cleaning up flows. There are many ways to do it; the one below is just one example.

The effect is immediately visible. The number of connections didn't go down, but overall it's less confusing. There is still the downside that each parameter is treated individually. What if we could group parameters and settings that belong together? Well, meet pipes.
There are a lot of custom nodes for ComfyUI that do exactly that: they group parameters so that a single pipe noodle is all that's needed to connect them. A few examples are below, together with the titles of the custom node packs:


I especially like the last one, the context nodes from rgthree ( ), as they group a lot of common parameters. These context nodes also let you change data in the flow. For example, each context node can overwrite the incoming basic context with (for example) a generated image. The example workflow could look like this:

The bigger the workflow, the greater the effect of this pipelining. You can also collapse these nodes so that they look like reroutes, just with many inputs and outputs; see the Sampler2 and Sampler3 groups in the example above. Neat!

You might have noticed that these pipelining or context nodes are fixed in which parameters you can group together. What if you have a lot of other parameters to bundle? In one of my workflows that tests different upscaling methods, I have a group of parameters that I want to reuse everywhere. Again, there are some generic pipe nodes, but most are limited in the number of inputs:


A relatively unknown but extremely useful custom node is the highway node ( ). I didn't test whether there are limits, but I use the following setup without any issues:


The input field of the node takes a semicolon-separated list of inputs and outputs; you click update, and the node offers the contact points for the noodles. For inputs, you add a ">" in front of the name; for outputs, a "<". So the _query parameter for the input highway node looks like this: ">BatchSize;>DesiredUpscaleFactor;>UpscaleModel;>ModelUpscaleFactor;>UpscalerCfg;>UpscalerDenoise;>NrOfIterations;>RescaleFactor;>TilingStrength". Super useful!

That's it. I hope this helps someone either overcome their fear of ComfyUI or structure their workflows better, reducing headaches and pain. Oh, and maybe it inspires others to share their tips here too ;)
 
very interesting post thanks
 
Awesome! I have scant experience working with nodes but enough to get the gist of your workflow. Good stuff, lad.
 
Hey, all,

If, like me, you thought that the lack of a GPU (or even a good CPU) would keep worthwhile local generation out of your reach, think again! I have just made my very first pictures thanks to ComfyUI-cpu.

ComfyUI-cpu comes with a trimmed model, a LoRA, and a workflow that you just have to load. You can ignore the model, though; the pre-packed model is pretty much garbage, sacrificing quality for speed.
After looking around for quick, powerful models, I decided on LUSTIFY! [SDXL NSFW checkpoint], specifically the ENDGAME DKD2 version, which I think is the latest and quickest.

Prepackaged model: 2 minutes per image at low CPU load, completely useless output.
LUSTIFY!: ~20 minutes per image for a 512x512 image of good quality. (Can probably be improved: it recommends 4-8 steps and I chose 8, so lowering that alone should reduce the time.)
SDXL models are generally much bigger/slower than SD 1.5 models, so a good SD 1.5 model would probably speed things up as well.

Note: These times are with a cheap ~€500 laptop from several years ago.


After the installation, I couldn't find the LoRA, trained model, or workflow. However, the creator also shares them separately here:


Next step: figuring out how upscalers work, to turn that into at least a 1024x1024 image, maybe bigger. I tried to understand that earlier today, but there are so many options. @Guz, can you help me out here? I'm a bit lost, and the one workflow that I think was supposed to upscale the output didn't really work for me. (It somehow made small images that were just colourful lines.)

Optional step: Figuring out if adding LoRas is worthwhile, or if that just eats resources.


Edit:

Apparently there are dozens of different ways to upscale, with many more combinations. Testing them all would take ages, and many are probably straight-up impossible for me with the ~14 GB of free RAM allocated to ComfyUI. So this goes out to anyone knowledgeable:

1) Can you recommend any powerful but very lightweight SD1.5 models? I’m currently using an SDXL model that, while only taking ~6 steps for these 512x512 images, presumably still takes longer than fast SD1.5 models.
2) Can you recommend a stupid-simple workflow for upscaling/enhancing images? The first ~10 recommendations were all completely different from one another, often warned about high computing cost, and seemed very confusing. Then there are the ~4 different upscaling-related nodes that come with ComfyUI, none of which I could easily figure out. Specifically, I'm interested in doing the following task recommended by the SDXL model:
"Highres fix: upscale by 1.4, 2-3 highres steps, 0.45 denoising."
(Or alternatively, if there’s a better SD1.5 model than anything that works with that. I’d really like to have at least 1024x1024 images at the end of it.)

3) Pretty much all the models are trained to generate squares, but the vast majority of AI images are portrait or landscape… how does that happen? How can I do that?
 
1) Can you recommend any powerful but very lightweight SD1.5 models? I’m currently using an SDXL model that, while only taking ~6 steps for these 512x512 images, presumably still takes longer than fast SD1.5 models.
All models of the same type will render at the same speed. The "fast" models are just using tricks to improve the image quality per step count; there are a handful of them - you might see them called LCM, Turbo, Hyper, Lightning, or Schnell. That speedup comes at a cost: the image quality and prompt adherence (how well the AI follows your prompt) aren't as good. You can generate a decent image in 6-8 steps instead of a better image in 30-40 steps. That's a tradeoff you'd probably find worthwhile.

SDXL and SD 1.5 have LCM and/or Lightning LoRAs available that you can use with any model to give it that speedup. Personally, I prefer Lightning over LCM; the image quality is a bit better and you don't need to use the special LCM sampler.
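Not something you need inside the UIs, but if you ever script this with the diffusers library, the LCM speed-up looks roughly like the sketch below. The checkpoint path is a placeholder, "latent-consistency/lcm-lora-sdv1-5" is the published LCM LoRA for SD 1.5, and LCM wants a low guidance scale plus its own scheduler:

import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

# placeholder path - point this at any SD 1.5 checkpoint you already have
pipe = StableDiffusionPipeline.from_single_file(
    "models/your_sd15_model.safetensors", torch_dtype=torch.float32)  # float32 for CPU

pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")      # the LCM speed-up LoRA
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)  # LCM needs its own sampler

image = pipe("a lighthouse on a rocky coast at sunset",
             num_inference_steps=6, guidance_scale=1.5).images[0]
image.save("lcm_test.png")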
2) Can you recommend a stupid-simple workflow for upscaling/enhancing images? The first ~10 recommendations were all completely different from another, often warned about high computing cost, and seemed very confusing. Then there’s the ~4 different upscaling-related nodes that come with ComfyUI, none of which I could easily figure out. Specifically, I’m interested in doing the following recommended task by the SDXL model:
"Highres fix: upscale by 1.4, 2-3 highres steps, 0.45 denoising."
That is referring to a feature in the A1111 family of UIs (A1111, Forge, SD.Next, maybe a few others). It renders an image and then immediately upscales it and passes it through an image-to-image process. If the base image is crap - and it's usually crap - this just wastes a whole bunch of processing time on something you're going to throw out anyway. I suggest rendering at low res first to find an image that you like and want to refine, then upscaling that and passing it through an image-to-image process. IDK how to do that in ComfyUI, but maybe someone else here knows how. If Comfy is too confusing, you might want to switch to A1111.

When you do the image-to-image processing step, be aware that the higher you set denoising, the more the image is going to change. More detail will be filled in, but you'll also be more susceptible to high-resolution artifacting (the double heads effect).
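And since the question was about doing the lowres-then-refine step by hand, here is the same render, upscale, image-to-image idea as a diffusers sketch rather than a ComfyUI workflow. Paths and the prompt are placeholders, 0.45 matches the denoising suggested above, and the step count is generic (adjust to whatever your model recommends):

import torch
from PIL import Image
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

MODEL = "models/your_sdxl_model.safetensors"  # placeholder path
PROMPT = "a lighthouse on a rocky coast at sunset"

txt2img = StableDiffusionXLPipeline.from_single_file(MODEL, torch_dtype=torch.float32)
base = txt2img(PROMPT, width=1024, height=1024, num_inference_steps=30).images[0]

# upscale by 1.4x, keeping the dimensions divisible by 8 for the VAE
side = int(1024 * 1.4) // 8 * 8
hires = base.resize((side, side), Image.LANCZOS)

# the image-to-image pass re-adds detail; higher strength = bigger changes
img2img = StableDiffusionXLImg2ImgPipeline.from_single_file(MODEL, torch_dtype=torch.float32)
final = img2img(PROMPT, image=hires, strength=0.45, num_inference_steps=30).images[0]
final.save("manual_highres_fix.png")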
(Or alternatively, if there’s a better SD1.5 model than anything that works with that. I’d really like to have at least 1024x1024 images at the end of it.)

3) Pretty much all the models are trained to generate squares, but the vast majority of AI images are portrait or landscape… how does that happen? How can I do that?
They'll render other aspect ratios too. I recommend keeping the total pixel count close to or below what the model was trained on for the base image (before upscaling). SD 1.5 is 512x512; SDXL is 1024x1024. With SDXL, you can render 1280x720 (16:9) just fine. Smaller pixel counts render faster, so pick your desired aspect ratio, then figure out an image size that is small enough to render in a reasonable time but large enough that the features are clear enough to upscale and process through image-to-image into a viable final image. This will require experimentation to figure out what is optimal for you.
 
That is referring to a feature in the A1111-family of UIs (A1111, Forge, SD.Next, maybe a few others). It renders an image and then immediately upscales it and passes it through an image-to-image process.
Ah, good to know.
SDXL and SD 1.5 have LCM and/or Lightning LoRAs available that you can use with any model to give them that speedup. Personally, I prefer Lightning over LCM, image quality is a bit better and you don't need to use the special LCM sampler.
I completely misunderstood what LCM and Lightning were! Based on one base model referring to itself as what I thought was a new version of Lightning/LCM, I assumed it was something model creators could do to make faster versions of their models.
This is especially embarrassing, because ComfyUI came with an LCM LoRA pre-added to the workflow. It also explains why I got extremely burnt garbage output at some point: I had switched the base model and changed the settings to what that base model recommended, so I was running an LCM model at 50 steps.

Thank you very very much! I’ve just spent some time testing the included LCM, and it sped up generations from ~10 minutes to 3m10s at similar quality. I’ll switch over to a lightning LoRa later to compare the two. You saved me a lot of time. (♥ω♥*)

They'll render other aspect ratios too. I recommend keeping the total pixel count close to or below what the model was trained on for the base image (before upscaling). SD 1.5 is 512x512.
Thanks. Rendering at 768x512 or 512x768 seems to work just fine, but I’ll be sure to figure out upscaling when I want bigger images.
 
OK, I need help. Or better, I'd like to understand a couple of things, so I implore the help of more tech-savvy people.

Out of curiosity I ran a speed test on my faptop (VND™) with 8 GB of VRAM after my debacle with 1111.
It was able to deliver an 832x1216 picture (no upscale) of a single character, 30 steps, Euler a, Pony model, adetailer on the face, in 1m3s. A similar picture with DPM++ SDE took less than 1m30s.

So, when trying 1111 I was expecting poorer performance, not a full debacle. The only low-res, poor-quality image I was able to get, after a long battle against a CUDA out-of-memory error, took more than half an hour.
Can someone explain why it failed so badly?
Many thanks in advance.
 
A1111 has a lot of performance settings and command-line flags that you may need to set. I'd look up guides for setting it up with your specific GPU. What UI did you use that worked?

You may want to give Forge a try. It has the same UI as A1111 and a lot of the same features, but a more performant backend. Be aware that Forge made some UI upgrades that A1111 didn't, which break a lot of the extensions that they used to share in common.
 
If you're using Pony, Stable Diffusion Forge along with my thread below also helps with Pony use:
https://lewdcorner.com/threads/best-prompts-and-negative-prompt-to-use-for-fun.13621/
 
  • Love
Reactions: 1 user
If you're using Pony, Stable Diffusion Forge along with my thread below also helps with Pony use:
https://lewdcorner.com/threads/best-prompts-and-negative-prompt-to-use-for-fun.13621/

Thanks bookmarked for when I get P4.

A1111 has a lot of performance settings and command line flags that you may need to set. I'd look up guides for setting it up with your specific GPU. What UI did you use that worked?

Thanks, I will look into it.

Be aware that Forge made some UI upgrades that A1111 didn't which break a lot of the extensions that they used to share in common.

That's exactly the reason why I wanted to switch to 1111. I had to downgrade to an older commit because I wanted to try (and now want to keep using) Regional Prompter (broken in the post-July updates), but at the cost of losing a couple of features from newer versions.
 
Just an update on the cpu-only situation.

The fastest usable image generation has been 40s per image, using a 4-step Hyper-SD 1.5 node at the base 4 steps. Each additional step, and at least some LoRAs, tended to add ~10 seconds per step.
For some reason I switched to the 8-step Hyper node (probably better quality?), and with a couple of LoRAs and 8-12 steps depending on prompt complexity, it takes a few minutes.

I've also tried out ELLA for a bit. That system basically adds a small LLM in a layer parallel to CLIP. (CLIP is where you write your positive and negative prompt.) As I understand it, the LLM automatically adjusts each generation step to be more in line with the prompt. As an LLM, it also has a much better understanding of natural-language prompts and better coherence (stuff like "the person wears a red hat, holds a green apple in her left hand, and stands beside person two, who…"). It also has a longer token limit. Apparently, the issue Stable Diffusion has with merging/melting different parts of a prompt together often stems from CLIP running out of tokens, forcing it to cut the prompt into chunks in the middle.

The apparent downside of ELLA for our use case is that, reading between the lines of their posts, the team was very worried about people "misusing" the tool. As such, plans to implement ELLA for SDXL were scrapped, and the release for SD 1.5 was censored to hell and back. If it thinks part of a prompt is NSFW, it will just ignore it. It is dumb enough that "wearing no clothes" instead of "nude" seems to have worked, and you still have your parallel CLIP channel to add the NSFW stuff back in, but I don't have the patience to figure out a new way of prompting that works for this.

As I understand it, the "correct" way to do this would be to write a casual paragraph about the scene in the ELLA prompt (characters, items, where in the image they are supposed to be, what they are wearing, etc.) and then write all the typical tags and simple NSFW interactions in the CLIP prompt (1960s candid photograph, soft lighting, masturbating, etc.).

Given that SDXL already has much higher prompt adherence than SD 1.5, it would probably be much easier to use that instead. (Or go straight to FLUX.)

Speaking of FLUX, I went a bit crazy and tried that out as well. Well, I installed it and did three prompts, all of which were kind of garbage, but it's the thought that counts(?). It turns out that figuring out FLUX is much easier than all the complicated ComfyUI stuff for Stable Diffusion. The quantised models are also smaller and less of a resource hog than SDXL, which is great. It took my computer pretty much exactly 30 minutes for a simple prompt, and 60 minutes for twice as many steps. Given how much more powerful FLUX seems to be at similar minimum system requirements, I wonder if it will largely replace SDXL/Pony once the community has had a few months to build an ecosystem for it.

After those excursions, I decided to go back to SDXL/Pony and see if I could speed it up from the ~20 minutes per generation I had at the beginning. The good news: for some reason, it now only takes ~9 minutes, up to ~14 minutes with more steps and an upscaler. The 20 minutes I recorded previously probably coincided with the completely overbaked images I made when I was using several times as many steps as I should have.

Bad news: for some reason, using any base model other than LUSTIFY! causes ComfyUI to stall out after ~10 minutes, then freeze my computer for ~20 minutes, before crashing and freeing it again. I decided against testing this extensively, though loading the model OnlyForNsfw118 did work once for some reason. (Sadly, I accidentally closed ComfyUI at that point, so I don't have more data on that…)


Anyway, one thing I would have loved to know getting into this is that someone made prompting guides for CyberRealistic and some other SD 1.5 models:

Knowing the limitations, and how to phrase things in a way the model will hopefully understand, would have saved me some time. Based on my time on perchance.org/pretty-ai, I was used to just spamming related keywords in order to get something somewhat close to what I was going for.

Another useful resource/inspiration has also been . It seems to be the same pretty-ai model, but it adds a ton of drop-down menus for adjusting the camera settings, view, sex position and more. These can be good inspiration for your own prompt, though I've mostly consulted the list of view positions.
 