Diffusion Models

ilaksh
12 replies
1d17h

What's the best Apache or MIT-licensed python library for Diffusion Transformers?

ilaksh
7 replies
1d12h

They are actually based on the Attribution-NonCommercial-licensed Facebook code and have the same license.

mrbungie
3 replies
1d7h

Pretty sure that license header ended up in the codebase via a clever guy's PR, or it was just a mistake.

ilaksh
1 replies
1d2h

Ok, I would like to believe that. That's great then, thanks.

mrbungie
0 replies
1d

If it worries you, maybe open an issue? No sane maintainer would allow a weird license that's an API call away from screwing up your products.

pama
0 replies
22h9m

Unfortunately this statement would not offer sufficient legal protection, so the original authors would have to be convinced to give up their previous rights and change the upstream copyright (and huggingface should update their repo license statement). Of course, these days it is typically easy enough to reimplement the code from a paper in plain pytorch, so I'm not sure one needs this whole huggingface repo with its extra framework and risk, but to me it doesn't fit the requirement of the OP's question.

bitvoid
0 replies
1d8h

That's sort of confusing (to me at least) because that particular header also lists MIT and Apache licenses.

ilaksh
0 replies
1d12h

Nice. But is that a diffusion transformer?

davidguetta
3 replies
23h55m

The train loop is wrong, no? Neither x0s nor eps is used in the expression for xts, so it looks like you're training to predict random noise.

fisian
1 replies
23h34m

Yes, it should be the same as the equation before. Like this:

  xts = alpha_bar[t].sqrt() * x0s + (1. - alpha_bar[t]).sqrt() * eps

Additionally, the code isn't consistent: the sampling code uses a time embedding, while the training code doesn't.
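
If it helps to see it in context, here's a minimal sketch of how that corrected line might sit in a DDPM training step, with the timestep also passed to the network so training matches the sampling code. The schedule constants, `model`, and `optimizer` here are my own placeholders, not the post's actual code:

  import torch
  import torch.nn.functional as F

  # Placeholder noise schedule (linear betas, as in the DDPM paper);
  # alpha_bar is the cumulative product of (1 - beta_t).
  T = 1000
  beta = torch.linspace(1e-4, 0.02, T)
  alpha_bar = torch.cumprod(1.0 - beta, dim=0)

  def train_step(model, optimizer, x0s):
      t = torch.randint(0, T, (x0s.shape[0],))             # random timestep per sample
      eps = torch.randn_like(x0s)                          # the noise the net must predict
      ab = alpha_bar[t].view(-1, *([1] * (x0s.dim() - 1))) # broadcast over image dims
      xts = ab.sqrt() * x0s + (1.0 - ab).sqrt() * eps      # corrected forward process
      loss = F.mse_loss(model(xts, t), eps)                # t goes into the model too
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      return loss.item()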

reasonableklout
0 replies
21h22m

Oops, you're right. Fixed, thanks.

kmacdough
0 replies
22h47m

Not sure which eq you refer to, but from what I understand, the network never "sees" the correct images. Rather, it must learn to infer that information indirectly through the loss function.

The loss function encodes information about the noise, and because the network sees the noised-up image exactly, this is equivalent to learning about the actual sample images. It's worth noting that you could design a loss function measuring the difference between the output and the real images. This contains equivalent information, but it turns out that the properties of Gaussian noise make it much more conducive to estimating the gradient.

But the point being: the information about the true images is in the loop, albeit only through the lens of some noise.
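
To make that concrete, here's a rough sketch of the two objectives (my own toy illustration, not code from the post): predicting the noise versus reconstructing the image. Because the forward equation ties x0, eps, and the noised image together deterministically, one loss can be rewritten as the other:

  import torch.nn.functional as F

  def eps_loss(model, xts, t, eps):
      # Standard DDPM objective: predict the noise that was added.
      return F.mse_loss(model(xts, t), eps)

  def x0_loss(model, xts, t, x0s, ab):
      # Equivalent alternative: invert the forward equation to recover a
      # predicted x0 from the predicted noise, then compare to the real image.
      # ab is alpha_bar[t], broadcast to x0s's shape. This is the eps loss
      # rescaled by (1 - ab) / ab per timestep, which tends to make the
      # gradients less well-behaved than plain noise prediction.
      x0_pred = (xts - (1.0 - ab).sqrt() * model(xts, t)) / ab.sqrt()
      return F.mse_loss(x0_pred, x0s)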

Tao3300
2 replies
1d1h

> I spent 2022 learning to draw and was blindsided by the rise of AI art models like Stable Diffusion. Suddenly, the computer was a better artist than I could ever hope to be.

I hope the author stuck with it anyway. The more AI encroaches on creative work, the more I want to tear it all down.

ctippett
1 replies
20h34m

Conversely, I've become more motivated to draw things and try my hand at digital art since being exposed to Stable Diffusion, Midjourney, et al. I take the output from these tools and then attempt to recreate or trace over them.

arvinsim
0 replies
11h9m

People who do art for art's sake will do it regardless of AI.

After all, photography didn't stop people from drawing or painting.

sidcool
1 replies
1d10h

This is a great post

b33j0r
0 replies
19h6m

I really like how readable the layout is. (Why do I waste so much time making hard-to-read layouts?)

The only disappointment came when I hit Reader View—almost to prove “this page is semantically perfect!!”—and alas, the nav list has a line height less than one in that medium, and scrunches up all crazy. I’ll let it slide ;)

sashank_1509
1 replies
1d2h

Good post. I always thought diffusion originated from score matching; today I realized diffusion came before score-matching theory. So when OpenAI trained on 250 million images, they didn't even have a great theory explaining why they were modeling the underlying distribution. Gutsy move.

reasonableklout
0 replies
20h55m

The original Sohl-Dickstein 2015 paper [1] formulated diffusion as maximizing (a lower bound of) the log-likelihood of generating the distribution, so there was some theory. But my understanding is that the breakthrough was the empirical results from Ho et al. [2] and Dhariwal & Nichol [3] showing that diffusion could produce not only high-quality samples, but in some cases better ones than GANs.

[1] https://arxiv.org/abs/1503.03585
[2] https://arxiv.org/abs/2006.11239
[3] https://arxiv.org/abs/2105.05233

kmacdough
0 replies
23h50m

Thanks for sharing. This has given me much more insight into how and why diffusion models work. Randomness is oddly powerful. Time to try and code one up in some suitably unsuitable language.

Not much to TL;DR for the comment lurkers. This post is the TL;DR of stable diffusion.