I would highly recommend Fooocus to anyone who hasn't tried it: https://github.com/lllyasviel/Fooocus
There are a bajillion local SD pipelines, but this one is, by far, the one with the highest quality output out-of-the-box, with short prompts. It's remarkable.
And that's because it integrates a bajillion SDXL augmentations that other UIs do not implement or enable by default. I've been using Stable Diffusion since 1.5 came out, and even having followed the space extensively, setting up an equivalent pipeline in ComfyUI (much less diffusers) would be a pain. It's like a "greatest hits and best defaults" for SDXL.
Looks like a complete contraption to set up and, at first glance, very unpleasant to use compared against Noiselith.
The hundreds of Python scripts and the need for the user to touch the terminal show why something like Noiselith should exist for normal users rather than developers or programmers.
I would rather take a packaged solution that just works over a bunch of scripts requiring a terminal.
Installation/setup is dead simple. Up and running in under 3 minutes:
git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus
pip3 install -r requirements_versions.txt
python3 entry_with_update.py
Let's see...
Okay. I'll need to install it? What package might that be in, hmm. Moving on, I already know it's Python.
Guess I'll use sudo...
= = =
Obviously I know better than to do this, but very few people would. This is not 'dead simple'! It's only simple for Python programmers who are already familiar with the ecosystem.
Now, fortunately the actual documentation does say to use venv. That's still not 'dead simple'; you still need to understand the commands involved. There's definitely space for a prepackaged binary.
The people that make software that does useful things, and the people that understand system security live on different planets. One day they'll meet each other and have a religious war.
This said, it's nice when developers attempt to detect the executable they need and warn which package is missing.
It depends on your platform, but if you read the readme, there's a pre-packaged release with Python embedded.
There are projects that set up "fat" Python executables or portable installs, but the problem in PyTorch ML projects is that the download would be many gigabytes.
Additionally, some package choices depend on hardware.
In the end, a lot of the more popular projects have "one click" scripts for auto installs, and there are some for Fooocus specifically, but the issue there is that it's not as visible as the main repo, and not necessarily something the dev wants to endorse.
Or you can use Stability Matrix package manager.
Yeah, VoltaML is another excellent choice in Stability Matrix.
You have to make trade-offs in software development. Fooocus trades toward the best picture and simplicity of use rather than the most beautiful interface. I think it is a good trade-off given the technology is improving at breakneck speed.
Look, DiffusionBee is still maintained but still has no SDXL support.
Anyone who bet that the technology is done and it is time to focus on the UI is making the wrong bet.
This project is really cool and I like the stated philosophy in the README. I think it's making the right trade-off in terms of setting useful defaults and not showing you 100 arcane settings. However, the installation is too hard. It's a student project and free, so I'm not criticizing the author at all, but I think it's a pretty fair and useful criticism of the software and likely a significant bottleneck to adoption.
Huh? It has a really simple interface, much much much simpler than anything else that uses SD/SDXL locally. Installation is also simple for Windows/Linux; I don't know about macOS.
I was afraid of the Python setup (even though I'm a Python developer), but yep: Make the virtualenv, install the dependencies, done. This is amazing, the images it generates are immediately beautiful.
It does look bad that it bundles GTM, though, as a sibling commenter says.
Samples:
https://imgz.org/i9oicVqo/
https://imgz.org/i8Ur3WjW/
https://imgz.org/i5j6r6TZ/
Be sure to try the styles as well. That's actually a separate input from the prompt for SDXL, and most other UIs don't implement the style prompting.
No, it's not.
There are two text encoders, but they aren't really “prompt” and “style” inputs.
Most UIs' default mode of operation sends the same input to both text encoders, but at least Comfy has nodes that support sending separate text to them. OTOH, while there may be some cases where sending different text to the two encoders helps in a predictable way, AFAIK most of the testing people have done has shown that optimal prompt adherence usually comes from sending the same text to both.
Hmm well that was a massive misunderstanding on my part.
I'm not sure of the origin, but using ViT-L (the encoder shared with SD 1.x) for what you might call the main prompt and ViT-G (the new SDXL encoder, and a successor to the one used as the sole encoder in SD 2.x) for a style prompt was a common idea shortly after SDXL launched, so it's understandable.
What are the two text encoders?
OpenCLIP ViT-G and CLIP ViT-L. The latter is the same encoder used in SD 1.x, OpenCLIP ViT-H was used as the encoder in SD 2.x, and ViT-G is, as I understand it, a successor and improvement on ViT-H.
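For anyone who wants to experiment with that split, here's a rough sketch of how sending different text to the two encoders looks in diffusers (assuming a recent release of StableDiffusionXLPipeline, where prompt feeds CLIP ViT-L and prompt_2 feeds OpenCLIP ViT-G; if prompt_2 is omitted, the same text goes to both, which is what most UIs do):

import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL base; fp16 keeps VRAM usage manageable on consumer GPUs.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a lighthouse on a cliff at dawn",              # goes to CLIP ViT-L
    prompt_2="impressionist oil painting, muted palette",  # goes to OpenCLIP ViT-G
).images[0]
image.save("lighthouse.png")

The prompts and filename here are just placeholders; as noted above, sending identical text to both encoders is usually what gives the best prompt adherence.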
Have to build it yourself on Mac, and we all know how "fun" building Python projects is
Just spent about 10 minutes building it on a MacBook Pro M1. I come with significant bias against Python projects, but getting Fooocus to run was very, very easy.
I finally had a chance to set it up and yes, it works great!
Did you get Fooocus to run on an Apple Silicon Mac with MPS support? I can't get MPS support running for the love of god. Any help would be much appreciated from me (as well as the 20 or so people currently looking for a solution on GitHub to get normal-speed generation instead of 15 minutes per image :)). Thank you!
I think so… it's an M1 Max MacBook Pro and it takes about 2-3 minutes per image. I followed all the prerequisites on their install-for-Mac page.
Mine takes about 3 minutes per image, and I didn't do anything special. Left everything at default settings after my initial install (about 5 days ago). Not speedy, but certainly not 15 minutes. Running on an M1 with 32 GB RAM.
That's good to know!
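For anyone still stuck on the MPS issue above, a quick sanity check (a generic PyTorch sketch, not anything Fooocus-specific) is to ask PyTorch whether the Metal backend is built and available; if it isn't, generation falls back to the CPU and gets very slow:

import torch

# Both should print True on a working Apple Silicon setup.
print(torch.backends.mps.is_built())      # was this PyTorch build compiled with MPS support?
print(torch.backends.mps.is_available())  # is the MPS device usable right now?

if torch.backends.mps.is_available():
    x = torch.ones(1, device="mps")  # tiny tensor op to confirm the device actually works
    print(x)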
Looks like the Web UI of the self-hosted install of Fooocus sells out the user to Google Tag Manager.
Can our entire field please realize that running this surveillance is a bad move, and just stop doing it.
I think that's coming from gradio?
Probably, auto1111 does the same - but I agree with GP it shouldn’t be there.
If it isn’t explicitly surveillance, it could effectively be.
Yeah, it is; you just need to set the env var GRADIO_ANALYTICS_ENABLED=False to stop it. That should probably be added to launch.py along with the other env vars being set at launch.
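Something along these lines near the top of launch.py would do it (a sketch; where exactly Fooocus sets its other env vars may differ), as long as it runs before gradio is imported:

import os

# Opt out of Gradio's analytics/telemetry before gradio gets imported.
os.environ.setdefault("GRADIO_ANALYTICS_ENABLED", "False")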
Is this at all usable on a CPU-only system with a ton of RAM?
Not really. There is a very fast LCM model preset now, but it's still going to be painful.
SDXL in particular isn't one of those "compute-light, bandwidth-bound" models like llama (or Fooocus's own mini prompt-expansion LLM, which in fact runs on the CPU).
There is a repo focused on CPU-only SD 1.5.
Yeah, llama runs acceptably on my server, but buying a GPU and setting it all up seems really unfun. Also much more expensive than my hobby budget allows.
You don't need a big one; even an old 4 GB GPU will massively accelerate prompt ingestion.
Yeah, Fooocus is much better if you are going for the best locally generated result. Lvmin puts all his energy into making beautiful pictures. Also, it is GPL licensed, which is a plus in my book.
Eh, I messed around with it for a while. It's okay and good for beginners, but with only a bit more effort you can get better results out of A1111 or ComfyUI.