What's the best Apache or MIT-licensed python library for Diffusion Transformers?
The train loop is wrong, no? Neither x0s nor eps is used in the expression for xts, so it looks like you're training the network to predict random noise.
Yes, should be the same as the equation before. Like this:
xts = alpha_bar[t].sqrt() * x0s + (1.-alpha_bar[t]).sqrt() * eps
Additionally, the code isn't consistent: in the sampling code a time embedding is used, while in training it isn't.
Oops, you're right. Fixed, thanks.
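To make the fix concrete, here is a minimal sketch (in numpy, with an illustrative linear beta schedule; the model itself is omitted) of how both x0s and eps enter the forward process during a training step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise schedule: linear betas as in the DDPM paper (values illustrative).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Data prep for one training step on a toy batch of flattened "images":
x0s = rng.standard_normal((4, 8))     # clean samples
t = rng.integers(0, T, size=4)        # a random timestep per sample
eps = rng.standard_normal(x0s.shape)  # the noise the network must predict

# The corrected forward process: both x0s and eps appear in xts.
xts = (np.sqrt(alpha_bar[t])[:, None] * x0s
       + np.sqrt(1.0 - alpha_bar[t])[:, None] * eps)

# The training loss would then be MSE(model(xts, t), eps).
```

With the original bug, xts would contain neither term, so the regression target carries no signal about the data.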
Not sure which eq you refer to, but from what I understand, the network never "sees" the correct images. Rather, it must learn to infer that information indirectly through the loss function.
The loss function encodes information about the noise and, because the network sees the noised-up image exactly, this is equivalent to learning about the actual sample images. It's worth noting that you could design a loss function measuring the difference between the output and the real images. This contains equivalent information, but it turns out that the properties of Gaussian noise make it much more conducive to estimating the gradient.
But the point being: the information about the true images is in the loop, albeit only through the lens of some noise.
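The claim that the noise target and the image target carry equivalent information can be checked directly: given the noised sample and a perfect noise estimate, the clean image is recovered exactly by inverting the forward process. A tiny numpy sketch (alpha_bar_t is an illustrative schedule value):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_bar_t = 0.7  # illustrative value of the cumulative schedule at step t

x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Invert the forward process: a perfect eps estimate recovers x0 exactly,
# so predicting noise and predicting the image are informationally equivalent.
x0_hat = (xt - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
```

The two losses differ only in how errors are weighted across timesteps, which is why noise prediction tends to give better-behaved gradients in practice.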
I spent 2022 learning to draw and was blindsided by the rise of AI art models like Stable Diffusion. Suddenly, the computer was a better artist than I could ever hope to be.
I hope the author stuck with it anyway. The more AI encroaches on creative work, the more I want to tear it all down.
Conversely, I've become more motivated to draw things and try my hand at digital art since being exposed to Stable Diffusion, Midjourney, et al. I take the output from these tools and then attempt to recreate or trace over them.
People who do art for art's sake will do it regardless of AI.
After all, photography didn't stop people from drawing or painting.
This is a great post
I really like how readable the layout is. (Why do I waste so much time making hard-to-read layouts?)
The only disappointment came when I hit Reader View—almost to prove “this page is semantically perfect!!”—and alas, the nav list has a line height less than one in that medium, and scrunches up all crazy. I’ll let it slide ;)
Good post, I always thought diffusion originated from score matching, today I realized diffusion came before score matching theory, so when OpenAI trained on 250 million images, they didn’t even have great theory explaining why they were modeling the underlying distribution. Gutsy move
The original Sohl-Dickstein 2015 paper [1] formulated diffusion as maximizing (a lower bound of) the log-likelihood of generating the distribution, so there was some theory. But my understanding is that the breakthrough was the empirical results from Ho [2] and Nichol [3] showing diffusion could produce not only high-quality samples but better ones than GANs in some cases.
[1] https://arxiv.org/abs/1503.03585 [2] https://arxiv.org/abs/2006.11239 [3] https://arxiv.org/abs/2105.05233
Thanks for sharing. This has given me much more insight into how and why diffusion models work. Randomness is oddly powerful. Time to try and code one up in some suitably unsuitable language.
Not much to TL;DR for the comment lurkers. This post is the TL;DR of stable diffusion.
HuggingFace Diffusers is Apache and supports Diffusion Transformers: https://huggingface.co/docs/diffusers/en/api/pipelines/dit
They are actually based on the Attribution - Non-commercial Facebook code and have the same license.
Are you sure about that? https://github.com/huggingface/diffusers lists the Apache 2 license.
https://github.com/huggingface/diffusers/blob/v0.27.2/src/di...
Pretty sure that license header ended up in the codebase from a clever guy making a PR or it was just a mistake.
Ok, I would like to believe that. That's great then thanks.
If it worries you, maybe open an issue? No sane man would allow a weird license that's an API call away from screwing up your own products.
Unfortunately this statement would not offer sufficient legal protection, so the original authors would have to be convinced to give up their previous rights and change the upstream copyright (and huggingface should update their repo license statement). Of course, these days it is typically easy enough to reimplement the code from a paper in plain pytorch, so I'm not sure one needs this whole huggingface repo with the extra framework and risk, but to me it doesn't fit the requirement of the OP question.
That's sort of confusing (to me at least) because that particular header also lists MIT and Apache licenses.
Besides huggingface there's also the denoising-diffusion-pytorch repo: https://github.com/lucidrains/denoising-diffusion-pytorch/
Nice. But is that a diffusion transformer?
I found this:
https://paperswithcode.com/paper/scalable-diffusion-models-w...
https://github.com/mindspore-lab/mindone/tree/master/example...