What's the best Apache or MIT-licensed python library for Diffusion Transformers?
The train loop is wrong, no? Neither x0s nor eps is used in the expression for xts, so it looks like you're training the network to predict random noise.
Yes, should be the same as the equation before. Like this:
xts = alpha_bar[t].sqrt() * x0s + (1.-alpha_bar[t]).sqrt() * eps
Additionally, the code isn't consistent: in the sampling code a time embedding is used, while in training it isn't.
Oops, you're right. Fixed, thanks.
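To make the fix concrete, here is a minimal sketch (in numpy, with an illustrative linear beta schedule; the model itself is omitted) of how both x0s and eps enter the forward process during a training step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise schedule: linear betas as in the DDPM paper (values illustrative).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Data prep for one training step on a toy batch of flattened "images":
x0s = rng.standard_normal((4, 8))     # clean samples
t = rng.integers(0, T, size=4)        # a random timestep per sample
eps = rng.standard_normal(x0s.shape)  # the noise the network must predict

# The corrected forward process: both x0s and eps appear in xts.
xts = (np.sqrt(alpha_bar[t])[:, None] * x0s
       + np.sqrt(1.0 - alpha_bar[t])[:, None] * eps)

# The training loss would then be MSE(model(xts, t), eps).
```

With the original bug, xts would contain neither term, so the regression target carries no signal about the data.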
Not sure which eq you refer to, but from what I understand, the network never "sees" the correct images. Rather, it must learn to infer that information indirectly through the loss function.
The loss function encodes information about the noise and, because the network sees the noised-up image exactly, this is equivalent to learning about the actual sample images. It's worth noting that you could design a loss function measuring the difference between the output and the real images. This contains equivalent information, but it turns out that the properties of Gaussian noise make it much more conducive to estimating the gradient.
But the point being: the information about the true images is in the loop, albeit only through the lens of some noise.
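The claim that the noise target and the image target carry equivalent information can be checked directly: given the noised sample and a perfect noise estimate, the clean image is recovered exactly by inverting the forward process. A tiny numpy sketch (alpha_bar_t is an illustrative schedule value):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_bar_t = 0.7  # illustrative value of the cumulative schedule at step t

x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Invert the forward process: a perfect eps estimate recovers x0 exactly,
# so predicting noise and predicting the image are informationally equivalent.
x0_hat = (xt - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
```

The two losses differ only in how errors are weighted across timesteps, which is why noise prediction tends to give better-behaved gradients in practice.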
I spent 2022 learning to draw and was blindsided by the rise of AI art models like Stable Diffusion. Suddenly, the computer was a better artist than I could ever hope to be.
I hope the author stuck with it anyway. The more AI encroaches on creative work, the more I want to tear it all down.
Conversely, I've become more motivated to draw things and try my hand at digital art since being exposed to Stable Diffusion, Midjourney, et al. I take the output from these tools and then attempt to recreate or trace over them.
People who do art for art's sake will do it regardless of AI.
After all, photography didn't stop people from drawing or painting.
This is a great post
I really like how readable the layout is. (Why do I waste so much time making hard-to-read layouts?)
The only disappointment came when I hit Reader View—almost to prove “this page is semantically perfect!!”—and alas, the nav list has a line height less than one in that medium, and scrunches up all crazy. I’ll let it slide ;)
Good post, I always thought diffusion originated from score matching, today I realized diffusion came before score matching theory, so when OpenAI trained on 250 million images, they didn’t even have great theory explaining why they were modeling the underlying distribution. Gutsy move
The original Sohl-Dickstein 2015 paper [1] formulated diffusion as maximizing (a lower bound of) the log-likelihood of generating the distribution, so there was some theory. But my understanding is that the breakthrough was the empirical results from Ho [2] and Nichol [3] showing diffusion could produce not only high-quality samples but better ones than GANs in some cases.
[1] https://arxiv.org/abs/1503.03585 [2] https://arxiv.org/abs/2006.11239 [3] https://arxiv.org/abs/2105.05233
Thanks for sharing. This has given me much more insight into how and why diffusion models work. Randomness is oddly powerful. Time to try and code one up in some suitably unsuitable language.
Not much to TL;DR for the comment lurkers. This post is the TL;DR of stable diffusion.
HuggingFace Diffusers is Apache and supports Diffusion Transformers: https://huggingface.co/docs/diffusers/en/api/pipelines/dit
They are actually based on the Attribution - Non-commercial Facebook code and have the same license.
Are you sure about that? https://github.com/huggingface/diffusers lists the Apache 2 license.
https://github.com/huggingface/diffusers/blob/v0.27.2/src/di...
Pretty sure that license header ended up in the codebase from a clever guy making a PR or it was just a mistake.
Ok, I would like to believe that. That's great then thanks.
If it worries you, maybe open an issue? No sane man would allow a weird license that's an API call away from screwing up your own products.
Unfortunately this statement would not offer sufficient legal protection, so the original authors would have to be convinced to give up their previous rights and change the upstream copyright (and huggingface should update their repo license statement). Of course, these days it is typically easy enough to reimplement the code from a paper in plain pytorch, so I'm not sure one needs this whole huggingface repo with the extra framework and risk, but to me it doesn't fit the requirement of the OP question.
That's sort of confusing (to me at least) because that particular header also lists MIT and Apache licenses.
Besides huggingface there's also the denoising-diffusion-pytorch repo: https://github.com/lucidrains/denoising-diffusion-pytorch/
Nice. But is that a diffusion transformer?
I found this:
https://paperswithcode.com/paper/scalable-diffusion-models-w...
https://github.com/mindspore-lab/mindone/tree/master/example...