Stable Diffusion 3

patates
95 replies
4h44m

Half of the announcement talks about safety. The next step will be these control mechanisms being built into all sorts of software I suppose.

It's "safe" for them, not for the users, at least they should make that clear.

s1k3s
35 replies
4h23m

I truly wonder in what "unsafe" scenarios an image generator could be used. Don't we already have software that can do pretty much anything if a professional human is using it?

t_von_doom
29 replies
4h14m

I would say the barrier to entry is stopping a lot of ‘candid’ unsafe behaviour. I think you allude to it yourself in implying currently it requires a professional to achieve the same results.

But giving that ability to _everyone_ will lead to a huge increase in undesirable and targeted/local behaviour.

Presumably it enables any creep to generate what they want by virtue of being able to imagine it and type it, rather than learn a niche skill set or employ someone to do it (who is then also complicit in the act)

hypocrticalCons
28 replies
4h8m

"undesirable local behavior"

Why don't you just say you believe thought crime should be punishable?

wongarsu
21 replies
3h53m

I imagine they might talk about things like students making nudes of their classmates and distributing them.

Or maybe not. It's hard to tell when nobody seems to want to spell out what behaviors we want to prevent.

4bpp
13 replies
3h41m

Would it be illegal for a student who is good at drawing to paint a nude picture of an unknowing classmate and distribute it?

If yes, why doesn't the same law apply to AI? If no, why are we only concerned about it when AI is involved?

Cthulhu_
11 replies
3h20m

Because AI lowers the barrier to entry; using your example, few people have the drawing skills (or the patience to learn them) or take the effort to make a picture like that, but the barrier is much lower when it takes five seconds of typing out a prompt.

Second, the tool will become available to anyone, anywhere, not just a localised school. If generating naughty nudes is frowned upon in one place, another will have no qualms about it. And that's just the things that are about decency; then there's the discussion about legality.

Finally, when person A draws a picture, they are responsible for it - they produced it. Not the party that made the pencil or the paper. But when AI is used to generate it, is all of the responsibility still with the person that entered the prompt? I'm sure the T's and C's say so, but there may still be lawsuits.

4bpp
6 replies
3h13m

Right, these are the same arguments against uncontrolled empowerment that I imagine mass literacy and the printing press faced. I would prefer to live in a society where individual freedom, at least in the cognitive domain, is protected by a more robust principle than "we have reviewed the pros and cons of giving you the freedom to do this, and determined the former to outweigh the latter for the time being".

pixl97
4 replies
2h46m

You seem to be very confused about civil versus criminal penalties....

Feel free to make an AI model that does almost anything, though I'd probably suggest that it doesn't make porn of minors as that is criminal in most jurisdictions; short of that, it's probably not a criminal offense.

Most companies are only very slightly worried about criminal offenses; they are far more concerned about civil trials. There is a far lower requirement for evidence. An AI creator writing "Hmm, this could be dangerous" in an email: that's all you need to lose a civil trial.

Sohcahtoa82
1 replies
1h16m

You seem to be very confused about civil versus criminal penalties....

Nah, I think it's a disagreement over whether the blame for a tool's evil use falls on the tool's maker or on the tool's user.

It's similar to the argument over whether or not gun manufacturers should have any liability for their products being used for murder.

pixl97
0 replies
10m

It's a similar argument over whether or not gun manufacturers

This is really only a debate in the US and only because it's directly written in the constitution. Pretty much no other product works that way.

4bpp
1 replies
1h28m

Why do you figure I would be confused? Whether any liability for drawing porn of classmates is civil or criminal is orthogonal to the AI comparison. The question is if we would hold manufacturers of drawing tools or software, or purveyors of drawing knowledge (such as learn-to-draw books), liable, because they are playing the same role as the generative AI does here.

pixl97
0 replies
9m

Because you seem to be very confused about civil liability for most products. Manufacturers are commonly held liable for users' use of their products; for example, look at any number of products that have caused injury.

darkwater
0 replies
2h23m

Are we on the same HN that bashes Facebook/Twitter/X/TikTok/ads because they manipulate people, spread fake news or destroyed attention span?

sssilver
2 replies
2h9m

Photoshop also lowers that barrier of entry compared to pen and pencil. Paper also lowers the barrier compared to oil on canvas.

Affordable drawing classes and YouTube drawing tutorials lower the barrier of entry as well.

Why on earth would manufacturers of pencils, papers, drawing classes, and drawing software feel responsible for censoring the result of combining their tool with the brain of their customer?

A sharp kitchen knife significantly lowers the barrier of entry to murder someone. Many murders are committed every day using a kitchen knife. Should kitchen knife manufacturers blog about this every week?

freedomben
1 replies
1h43m

I agree with your point, but I would be willing to bet that if knives were invented today rather than having been around a while, they would absolutely be regulated and restricted to law enforcement, if not military, use. Hell, even printers, maybe not if invented today but perhaps in a couple of years if we stay on the same trajectory, would probably require some sort of ML to refuse to print or "reproduce" unsafe content.

I guess my point is that I don't think we're as inconsistent as a society as it seems when considering things like knives. It's not even strictly limited to thought crimes/information crimes. If alcohol were discovered today, I have no doubt that it would be banned and made Schedule I.

Sohcahtoa82
0 replies
1h9m

Hell, even printers, maybe not if invented today but perhaps in a couple years if we stay on the same trajectory, would probably require some sort of ML to refuse to print or "reproduce" unsafe content.

Fun fact: Many scanners and photocopiers will detect that you're trying to scan/copy a banknote and will refuse to complete the scan. One way they do this is by detecting the EURion constellation.

https://en.wikipedia.org/wiki/EURion_constellation
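
Conceptually, the check is just pattern matching on a known arrangement of tiny circles. A toy Python/OpenCV sketch of the idea (thresholds and the reference pattern here are invented; real scanner firmware is proprietary and far more robust than this):

    import cv2
    import numpy as np

    def find_candidate_circles(path, min_r=3, max_r=15):
        # Look for small circles of roughly banknote-symbol size.
        img = cv2.medianBlur(cv2.imread(path, cv2.IMREAD_GRAYSCALE), 5)
        found = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                                 param1=100, param2=20, minRadius=min_r, maxRadius=max_r)
        return [] if found is None else [(x, y) for x, y, r in found[0]]

    def matches_reference(points, reference, tol=0.05):
        # Compare scale-invariant pairwise distance ratios of the detected circle
        # centres against the reference constellation; a match triggers a refusal.
        def ratios(pts):
            d = sorted(float(np.linalg.norm(np.subtract(a, b)))
                       for i, a in enumerate(pts) for b in pts[i + 1:])
            return [x / d[-1] for x in d]
        r1, r2 = ratios(points), ratios(reference)
        return len(r1) == len(r2) and all(abs(a - b) < tol for a, b in zip(r1, r2))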

SV_BubbleTime
0 replies
3h0m

Can you point to other crimes that are based on skill or effort?

hospadar
0 replies
2h40m

IANAL, but that sounds like harassment. I assume the legality of that depends on the context (did the artist previously date the subject? lots of states have laws against harassment and revenge porn that seem applicable here [1]; are you coworkers? etc), but I don't see why such laws wouldn't apply to AI generated art as well. It's the distribution that's really the issue in most cases. If you paint secret nudes and keep them in your bedroom and never show them to anyone it's creepy, but I imagine not illegal.

I'd guess that Stability is concerned with their legal liability; also perhaps they are decent humans who don't want to make a product that is primarily used for harassment (whether they are decent humans or not, I imagine it would affect the bottom line eventually if they develop a really bad rep, or a bunch of politicians and rich people are targeted by deepfake harassment).

[1] https://www.cagoldberglaw.com/states-with-revenge-porn-laws/...

^ a lot of, but not all of those laws seem pretty specific to photographs/videos that were shared with the expectation of privacy and I'm not sure how they would apply to a painting/drawing, and I certainly don't know how the courts would handle deepfakes that are indistinguishable from genuine photographs. I imagine juries might tend to side with the harassed rather than a bully who says "it's not illegal cause it's actually a deepfake but yeah i obviously intended to harass the victim"

polski-g
3 replies
2h30m

Such activity is legal per Ashcroft v. Free Speech Coalition (2002). Artwork cannot be criminalized because of its contents.

nickthegreek
2 replies
1h49m

Artwork is currently criminalized because of its contents. You cannot paint nude children engaged in sex acts.

polski-g
1 replies
1h27m

The case I literally just referenced allows you to paint nude children engaged in sex acts.

The Ninth Circuit reversed, reasoning that the government could not prohibit speech merely because of its tendency to persuade its viewers to engage in illegal activity.[6] It ruled that the CPPA was substantially overbroad because it prohibited material that was neither obscene nor produced by exploiting real children, as Ferber prohibited.[6] The court declined to reconsider the case en banc.[7] The government asked the Supreme Court to review the case, and it agreed, noting that the Ninth Circuit's decision conflicted with the decisions of four other circuit courts of appeals. Ultimately, the Supreme Court agreed with the Ninth Circuit.

nickthegreek
0 replies
19m

I appreciate you taking the time to lay that out, I was under the opposite impression for US law.

hypocrticalCons
1 replies
3h51m

Students already share nudes every day.

Where are the Americans asking about Snapchat? If I were a developer at Snapchat I could prolly open a few Blob Storage accounts and feed a darknet account big enough to live off of. You people are so manipulatable.

jncfhnb
0 replies
3h42m

Students don’t share photorealistic renders of nude classmates getting gangbanged though

AuryGlenz
0 replies
3h22m

That's not even necessarily a bad thing (as a whole - individually it can be). Now, any leaked nudes can be claimed to be AI. That'll probably save far more grief than it causes.

pixl97
2 replies
3h3m

What do you mean should be... it 100% is.

In a large number of countries if you create an image that represents a minor in a sexual situation you will find yourself on the receiving side of the long arm of the law.

If you are the maker of an AI model that allows this, you will find yourself on the receiving side of the long arm of the law.

Moreover, many of these companies operate in countries where thought crime is illegal. Now, you can argue that said companies should not operate in those countries, but companies will follow money every time.

hypocrticalCons
1 replies
2h21m

I think it's pretty important to specify that you have to willingly seek and share all of these illegal items. That's why this is so sketch. These things are being baked with moral codes that'll _share_ the information, incriminating everyone. Like why? Why not just let it work and leave it up to the criminal to share their crimes? People are such authoritarian shit-stains, and acting like their existence is enough to justify their stance is disgusting.

pixl97
0 replies
1h25m

I think it's pretty important to specify that you have to willingly seek and share all of these illegal items.

This is not obvious at all when it comes to AI models.

People are such authoritarian shit-stains

Yes, but this is a different conversation altogether.

mempko
1 replies
2h39m

Once it is outside your mind and in a physical form, is it still just a thought sir?

hypocrticalCons
0 replies
2h27m

In my country there is legal precedent establishing that private, unshared documents are tantamount to thought.

KittenInABox
0 replies
3h56m

[Edited: I'm realizing the person I'm responding to is kinda unhinged, so I'm retracting out of the convo.]

martiuk
1 replies
3h59m

Similar to why Google's latest image generator refuses to produce a correct image of a 'Realistic, historically accurate, Medieval English King'. They have guard rails and system prompts set up to align the output of the generator with the company's values, or else someone would produce Nazi propaganda or worse. It (for some reason) would be attributed to Google and their AI, rather than the user who found the magic prompt words.

s1k3s
0 replies
1h7m

Yeah this is probably the most realistic reason

fragmede
0 replies
3h16m

For some scenarios, it's not the image itself but the associations that the model might possibly make from being fed a diet of 4chan and Stormfront's unofficial YouTube channel. The worry is over horrible racist shit, like if you ask it for a picture of a black person, and it outputs a picture of a gorilla. Or if you ask it for a picture of a bad driver, and it only manages to output pictures of Asian women. I'm sure you can think up other horrible stereotypes that would result in a PR disaster.

Sharlin
0 replies
4h3m

Eh, a professional human could easily lockpick the majority of front doors out there. Nevertheless I don't think we're going to give up on locking our doors any time soon.

PeterisP
0 replies
3h12m

The major risky use cases for image generators are (a) sexual imagery of kids and (b) public personalities in various contexts usable for propaganda.

BryanLegend
21 replies
3h58m

From George Hotz on Twitter (https://twitter.com/realGeorgeHotz/status/176060391883954211...)

"It's not the models they want to align, it's you."

jtr1
18 replies
3h48m

What specific cases are being prevented by safety controls that you think should be allowed?

bonton89
5 replies
3h32m

Well for starters, ChatGPT shouldn't balk at creating something "in Tim Burton's style" just because Tim Burton complained about AI. I guess it's fair use unless a select rich person who owns the data complains. Seems like it isn't fair use at all then, just theft from those who cannot legally defend themselves.

archontes
4 replies
3h9m

Fair use is an exception to copyright. The issue here is that it's not fair use, because copyright simply does not apply. Copyright explicitly does not, has never, and will never protect style.

SamBam
2 replies
3h0m

Didn't Tom Waits successfully sue Frito Lay when the company found an artist that could closely replicate his style and signature voice, who sang a song for a commercial that sounded very Tom Waits-y?

dangrossman
0 replies
2h34m

Yes, though explicitly not for copyright infringement. Quoting the court's opinion, "A voice is not copyrightable. The sounds are not 'fixed'." The case was won under the theory of "voice misappropriation", which California case law (Midler v Ford Motor Co) establishes as a violation of the common law right of publicity.

aimor
0 replies
2h33m

Yes but that was not a copyright or trademark violation. This article explained it to me:

https://grr.com/publications/hey-thats-my-voice-can-i-sue-th...

bonton89
0 replies
2h9m

That makes it even more ridiculous, as it means they are granting rich complainers rights that no one actually has.

Examples:

Prompt: "Can you create an image of a cat in Tim Burton's style?" Response: "Oops! Try another prompt. Looks like there are some words that may be automatically blocked at this time. Sometimes even safe content can be blocked by mistake. Check our content policy to see how you can improve your prompt."

Prompt: "Can you create an image of a cat in Wes Anderson's style?" Response: "Certainly! Wes Anderson's distinctive style is characterized by meticulous attention to detail, symmetrical compositions, pastel color palettes, and whimsical storytelling. Let's imagine a feline friend in the world of Wes Anderson..."

AuryGlenz
4 replies
3h26m

As far as Stable Diffusion goes - when they released SD 2.1/XL/Stable Cascade, you couldn't even make a (woman's) nipple.

I don't use them for porn like a lot of people seem to, but it seems weird to me that something that's kind of made to generate art can't generate one of the most common subjects in all of art history: nude humans.

araes
1 replies
1h13m

I seem to have the opposite problem a lot of the time. I tried using Meta's image gen tool, and had such a hard time trying to get it to make art that was not "kind of" sexual. It felt like Facebook's entire learning chain must have been built on people's sexy images of their girlfriends, all of which is now hidden in the art.

The results were not super blatant, like a tree landscape that just happens to have a human figure with a cave at its crotch. Examples:

https://i.imgur.com/RlH4NNy.jpg - Art is very focused on the monster's crotch

https://i.imgur.com/0M8RZYN.jpg - The comparison should hopefully be obvious

Fischgericht
0 replies
59m

Not meant in a rude way, but please consider that your brain is making these up and you might need to see a therapist. I can see absolutely nothing "kind of sexual" in those two pictures.

b33j0r
0 replies
3h16m

For some reason its training treats them as decorative; I guess it's a pretty funny elucidation of how it works.

I have seen a lot of “pasties” that look like Sorry! game pieces, coat buttons, and especially hell-forged cybernetic plumbuses. Did they train it at an alien strip club?

The LoRAs and VAEs work (see civit.ai), but do you really want something named NSFWonly in your pipeline just for nipples? Haha

Aeolun
0 replies
3h13m

I’m not sure if they updated them to rectify those “bugs” but you certainly can now.

rmi_
2 replies
3h28m

Tell me what they mean by "safety controls" first. It's very vaguely worded.

DALL-E, for example, wrongly denied several requests of mine.

bergen
0 replies
2h27m

You are using someone else's proprietary technology; you have to deal with their limitations. If you don't like it, there are endless alternatives.

"Wrongly denied" in this case depends on your point of view; clearly DALL-E didn't want this combination of words created, but you have no right to the creation of these prompts.

I'm the last one to defend large monolithic corps, but if you go to one and expect to be free to do whatever you want, you are already starting from a very warped expectation.

Aeolun
0 replies
3h6m

I don’t feel like it truly matters since they’ll release it and people will happily fine-tune/train all that safety right back out.

It sounds like a reputation/ethics thing to me. You probably don’t want to be known as the company that freely released a model that gleefully provides images of dismembered bodies (or worse).

stale2002
0 replies
53m

Oh, the big one would be model weights being released for anyone to use or fine-tune themselves.

Sure, the safety people lost that battle for Stable Diffusion and LLaMA. And because they lost, entire industries were created by startups that could now use models themselves, without it being locked behind someone else's AI.

But it wasn't guaranteed to go that way. Maybe the safetyists could have won.

I don't think we'd be having our current AI revolution if Facebook or SD hadn't been the first to release models for anyone to use.

slily
0 replies
2h48m

Parody and pastiche

Tomte
0 replies
3h37m

Not specifically SD, but DALL-E: I wanted to get an image of a pure white British shorthair cat on the arm of a brunette middle-aged woman by the balcony door, both looking outside.

It wasn't important, just something I saw in the moment and wanted to see what DALL-E makes of it.

Generation denied. No explanation given; I can only imagine that it triggered some detector of sexual requests?

(It wasn't the phrase "pure white", as far as I can tell, because I have lots of generated pics of my cat in other contexts.)

thefourthchime
0 replies
1h38m

No, it's the cacophony of zealous point-scoring on X they want to avoid.

dang
0 replies
46m

We detached this subthread from https://news.ycombinator.com/item?id=39466910.

beefield
11 replies
3h16m

I get a slightly uncomfortable feeling with this talk about AI safety. Not in the sense that there is anything wrong with that (maybe there is, maybe there isn't), but in the sense that I don't understand what people are talking about when they talk about safety in this context. Could someone explain like I have Asperger (ELIA?) what's this about? What are the "bad actors" possibly going to do? Generate (child) porn / images with violence etc. and sell them? Pollute the training data so that racist images pop up when someone wants an image of a white pussycat? Or produce images that contain vulnerabilities so that when you open them in your browser you get compromised? Or what?

Tadpole9181
5 replies
3h10m

Could someone explain like I have Asperger (ELIA?)

Excuse me?

beefield
4 replies
2h36m

You sound offended. My apologies. I had no intention whatsoever to offend anyone. Even if I am not diagnosed, I think I am at least borderline somewhere in the spectrum, and thought that would be a good way to ask people explain without assuming I can read between the lines.

Tadpole9181
3 replies
2h29m

Let's just stick with the widely understood "Explain Like I'm 5" (ELI5). Nobody knows you personally, so this comes off quite poorly.

beefield
2 replies
2h18m

I think ELI5 means that you simplify a complex issue so that even a small kid understands it. In this case there is no need to simplify anything, just explain what a term actually means without assuming the reader understands the nuances of the terms used. And I still do not quite get how ELIA can be considered hostile, but given the feedback, maybe I'll avoid it in the future.

Tadpole9181
1 replies
2h4m

Saying "explain like I have <specific disability>" is blatantly inappropriate. As a gauge: Would you say this to your coworkers? Giving a presentation? Would you say this in front of (a caretaker for) someone with Autism? Especially since Asperger's hasn't even been used in practice for, what, over a decade?

In this case there is no need to simplify anything

Then just ask the question itself.

charcircuit
0 replies
1h8m

AI isn't a coworker and isn't a human, so it's not as awkward to talk about one's disability.

reaperman
2 replies
3h5m

I'm not part of Stability AI but I can take a stab at this:

explain like I have ~~Asperger (ELIA?)~~ limited understanding of how the world really works.

The AI is being limited so that it cannot produce any "offensive" content which could end up on the news or go viral and bring negative publicity to Stability AI.

Viral posts containing generated content that brings negative publicity to Stability AI are fine as long as they're not "offensive". For example, wrong number of fingers is fine.

There is not a comprehensive, definitive list of things that are "offensive". Many of them we are aware of - e.g. nudity, child porn, depictions of Muhammad. But for many things it cannot be known a priori whether the current zeitgeist will find it offensive or not (e.g. certain depictions of current political figures, like Trump).

Perhaps they will use AI to help decide what might be offensive if it does not explicitly appear on the blocklist. They will definitely keep updating the "AI Safety" to cover additional offensive edge cases.

It's important to note that "AI Safety", as defined above (cannot produce any "offensive" content which could end up on the news or go viral and bring negative publicity to Stability AI) is not just about facially offensive content, but also about offensive uses for milquetoast content. Stability AI won't want news articles detailing how they're used by fraudsters, for example. So there will be some guards on generating things that look like scans of official documents, etc.

beefield
1 replies
2h32m

So it's just fancy words for safety (legal/reputational) for Stability AI, not users?

reaperman
0 replies
1h50m

Yes*. At least for the purposes of understanding what the implementations of "AI safety" are most likely to entail. I think that's a very good cognitive model which will lead to high fidelity predictions.

*But to be slightly more charitable, I genuinely think Stability AI / OpenAI / Meta / Google / MidJourney believe that there is significant overlap in the set of protections which are safe for the company, safe for users, and safe for society in a broad sense. But I don't think any released/deployed AI product focuses on the latter two, just the first one.

Examples include:

Society + Company: Depictions of Muhammad could result in small but historically significant moments of civil strife/discord.

Individual + Company: Accidentally generating NSFW content at work could be harmful to a user. Sometimes your prompt won't seem like it would generate NSFW content, but could be adjacent enough: e.g. "I need some art in the style of a 2000's R&B album cover" (See: Sade - Love Deluxe, Monica - Makings of Me, Rihanna - Unapologetic, Janet Jackson - Damita Jo)

Society + Company: Preventing the product from being used for fraud. e.g. CAPTCHA solving, fraudulent documentation, etc.

Individual + Company: Preventing generation of child porn. In the USA, this would likely be illegal both for the user and for the company.

Q6T46nT668w6i3m
0 replies
3h9m

The bad actor might be the model itself, e.g., returning unwanted pornography or violence. Do you have a problem with Google’s SafeSearch?

root_axis
4 replies
4h10m

This is the world we live in. CYA is necessary. Politicians, media organizations, activists and the parochial masses will not brook a laissez faire attitude towards the generation of graphic violence and illegal porn.

Sharlin
2 replies
4h5m

Not even legal porn, unfortunately. Or even the display of a single female nipple…

realusername
1 replies
3h59m

Looking at the manual censorship of the big channels on YouTube, you don't even need to display anything; just suggesting it is enough to get a strike.

(of course unless you are into yoga, then everything is permitted)

Sohcahtoa82
0 replies
1h5m

(of course unless you are into yoga, then everything is permitted)

...or children's gymnastics.

hypocrticalCons
0 replies
4h3m

This is the world we live in.

Great talk about slavery and religious-persecution, Jim! Wait, what were we talking about? Fucking American fascists trying to control our thoughts and actions, right right.

matthewmacleod
3 replies
4h19m

I really wish that every discussion about a new model didn’t rapidly become a boring and shallow discussion about AI safety.

jprete
1 replies
4h2m

AI is not an engineered system; it's emergent behavior from a system we can vaguely direct but do not fundamentally understand. So it's natural that the boundaries of system behavior would be a topic of conversation pretty much all the time.

EDIT: Boring and shallow are, unfortunately, the Internet's fault. Don't know what to do about those.

PeterisP
0 replies
3h9m

At least in some latest controversies (e.g. Gemini generation of people) all of the criticized behavior was not emergent from ML training, but explicitly intentionally engineered manually.

cypress66
0 replies
2h32m

This announcement only mentions safety. What else do you expect to talk about?

wongarsu
2 replies
3h58m

What's equally interesting is that while they spend a lot of words on safety, they don't actually say anything. The only hint of what they even mean by safety is that they took "reasonable steps" to "prevent misuse by bad actors". But it's hard to be more vague than that. I still have no idea what they did and why they did it, or what the threat model is.

Maybe that will be part of future papers or the teased technical report. But I find it strange to put so much emphasis on safety and then leave it all up to the reader's imagination.

fortran77
1 replies
3h53m

Remember when AI safety meant the computers weren’t going to kill us?

SV_BubbleTime
0 replies
3h3m

Now people spend a lot of time making them worse to ensure we don’t see boobs.

hypocrticalCons
1 replies
4h9m

BTW Nvidia and AMD are baking safety mechanisms into the fucking video drivers

Nowhere is safe

jprete
0 replies
4h0m

Do you have a reference on this?

hedora
1 replies
4h11m

PSA: There are now calls to embed phone-home / remote kill switch mechanisms into hardware because “AI safety”.

newzisforsukas
0 replies
3h51m

examples? seems like it would be easier to instead communicate with ISPs.

TulliusCicero
1 replies
2h40m

I agree with you, but when companies don't implement these things, they get absolutely trashed in the press & social media, which I'm sure affects their business.

What would you have them do? Commit corporate suicide?

TylerLives
0 replies
57m

This is a good question. I think it would be best for them to give some sort of signal, which would mean "We're doing this because we have to. We are willing to change if you offer us an alternative." If enough companies/people did this, at some point change would become possible.

wiz21c
0 replies
4h32m

They'd rather talk about "reasonable steps" to safety. Sounds like "just the minimum so we don't end up in legal trouble" to me...

tasty_freeze
0 replies
4h23m

There is some truth in what you say, just like saying you're a "free speech absolutist" sounds good at first blush. But the real world is more complicated, and the provider adds safety features because they have to operate in the real world and not just make superficial arguments about how things should work.

Yes, they are protecting themselves from lawsuits, but they are also protecting other people. Preventing people asking for specific celebrities (or children) having sex is for their benefit too.

spir
0 replies
4h32m

Thanks, I hadn't fully realized that 'safety' means 'safe to offer' and not 'safe for users'. I won't forget it.

mempko
0 replies
2h42m

I think this AI safety thing is great. These models will be used by people to make boring art. The exciting art will be left for people to make.

This idea of AI doing the boring stuff is good. Nothing prevents you from making exciting, dangerous, or 'unsafe' art on your own.

My feeling is that most people who are upset about AI safety really just mean they want it to generate porn. And because it doesn't, they are upset. But they hide it under the umbrella of user freedom. You want to create porn in your bedroom? Then go ahead and make some yourself. Nothing stopping you, the person, from doing that.

dmezzetti
0 replies
3h37m

Any large publicly available model has no choice but to do this. Otherwise, they're petrified of a PR nightmare.

Usability has an inverse relationship with the size of a model's user base. That's why it's important to have the option to train your own with open source.

acomjean
0 replies
1h6m

I think this isn’t software as much as a service. When viewed through this lens the guard rails make more sense.

Spivak
0 replies
4h19m

It's also "safety" in the sense that you can deploy it as part of your own application without human review and not have to worry that it's gonna generate anything that will get you in hot water.

13of40
45 replies
2h16m

"we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors"

It's kind of a testament to our times that the person who chooses to look at synthetic porn instead of supporting a real-life human trafficking industry is the bad actor.

sigmoid10
32 replies
2h4m

I don't think the problem is watching synthetic images. The problem is generating them based off actual people and sharing them on the internet in a way that the people watching can't tell the difference anymore. This was already somewhat of a problem with Photoshop and once everyone with zero skills can do it in seconds and with far better quality, it will become a nightmare.

idle_zealot
15 replies
1h51m

once everyone with zero skills can do it in seconds and with far better quality, it will become a nightmare.

Will it be a nightmare? If it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0. That loss of trust presents problems, but not ones that "safe" AI can solve.

sigmoid10
3 replies
1h45m

surely trust in the veracity of damaging images will drop to about 0

Maybe, eventually. But we don't know how long it will take (or if it will happen at all). And the time until then will be a nightmare for every single woman out there who has any sort of profile picture on any website. Just look at how celebrity deepfakes got reddit into trouble even though their generation was vastly more complex and you could still clearly tell that the videos were fake. Now imagine everyone can suddenly post undetectable nude selfies of your girlfriend on nsfw subreddits. Even if people eventually catch on, that first shock will be unavoidable.

swatcoder
0 replies
1h25m

Your anxiety dream relies on there currently being some technical bottleneck limiting the creation or spread of embarrassing fake nudes as a way of cyberbullying.

I don't see any evidence of that. What I see is that people who want to embarrass and bully others are already fully enabled to do so, and do so.

It seems more likely to me and many of us that the bottleneck that stops it from being worse is simply that only so many people think it's reasonable or satisfying to distribute embarrassing fake nudes of someone. Society already shuns it and it's not that effective as a way of bullying and embarrassing people, so only so many people are moved to bother.

Assuming that the hyped up new product is due to swoop in and disrupt the cyberbullying "industry" is just a classic technologist's fantasy.

It ignores all the boring realities of actual human behavior, social norms, and secure equilibriums, etc; skips any evidence building or research effort; and just presumes that some new technology is just sooooo powerful that none of that prior ground truth stuff matters.

I get why people who think that way might be on HN or in some Silicon Valley circles, but it can be one of the eyeroll-inducing vices of these communities as much as it can be one of their motivational virtues.

mdasen
0 replies
1h1m

This: it won't happen immediately, and I'd go even further to say that even if trust in images drops to zero, it's still going to generate a lot of hell.

I've always been able to say all sorts of lies. People have known for millennia that lies exist. Yet lies still hurt people a ton. If I say something like, "idle_zealot embezzled from his last company," people know that could be a lie (and I'm not saying you did, I have no idea who you are). But that kind of stuff can certainly hurt people. We all know that text can be lies and therefore we should have zero trust in any text that we read - yet that isn't how things play out in the real world.

Images are compelling even if we don't trust that they're authentic. Hell, paintings were used for thousands of years to convey "truth", but a painting can be a lie just as much as text or speech.

We created tons of religious art in part because it makes the stories people want others to believe more concrete for them. Everyone knows that "Christ in the Storm on the Sea of Galilee" isn't an authentic representation of anything. It was painted in 1633, more than a century and a half after the event was purported to have happened. But it's still the kind of thing that's powerful.

An AI generated image of you writing racist graffiti is way more believable to be authentic. I have no reason to think you'd do such a thing, but it's within the realm of possibility. There's zero possibility (disregarding supernatural possibilities) that Rembrandt could accurately represent his scene in "Christ in the Storm on the Sea of Galilee". What happens when all the search engine results for your name start calling you a racist - even when you aren't?

The fact is that even when we know things can be faked, we still put a decent amount of trust in them. People spread rumors all the time. Did your high school not have a rumor mill that just kinda destroyed some kids?

Heck, we have right-wing talking heads making up outlandish nonsense that's easily verifiable as false that a third of the country believes without questioning. I'm not talking about stuff like taxes or gun control or whatever - they're claiming things like schools having to have litter boxes for students that identify as cats (https://en.wikipedia.org/wiki/Litter_boxes_in_schools_hoax). We know that people lie. There should be zero trust in a statement like "schools are installing litter boxes for students that identify as cats." Yet it spread like crazy, many people still believe it despite it being proven false, and it has been used to harm a lot of LGBT students. That's a way less believable story than an AI image of you with a racist tattoo.

Finally, no one likes their name and image appropriated for things that aren't them. We don't like lies being spread about us even if 99% of people won't believe the lies. Heck, we see Donald Trump go on rants about truthful images of him that portray his body in ways he doesn't like (and they're just things like him golfing, but an unflattering pose). I don't want fake naked images of me even if they're literally labeled as fake. It still feels like an invasion of privacy and in a lot of ways it would end up that way - people would debate things like "nah, her breasts probably aren't that big." Words can hurt. Images can hurt even more - even if it's all lies. There's a reason why we created paintings even when we knew that paintings weren't authentic: images have power and that power is going to hurt people even more than the words we've always been able to use for lies.

tl;dr: 1) It will take a long time before people's trust in images "drops to zero"; 2) Even when people know an image isn't real, it's still compelling - it's why paintings have existed and were important politically for millennia; 3) We've always known speech and text can be lies, but we regularly see lies believed and hugely damage people's lives - and images will always be more compelling than speech/text; 4) Even if no one believes something is true, there's something psychologically damaging about someone spreading lies about you - and it's a lot worse when they can do it with imagery.

jquery
0 replies
1h33m

The tide is rolling in and we have two options... yell at the tide really loud that we were here first and we shouldn't have to move... or get out of the way. I'm a lot more sympathetic to the latter option myself.

BryantD
3 replies
1h23m

Let me give you a specific counterexample: it's easy and common to generate phishing emails. Trust in email has not dropped to the degree that phishing is not a problem.

Al-Khwarizmi
2 replies
1h17m

Phishing emails mostly work because they apparently come from a trusted source, though. The key is that they fake the source, not that people will just trust random written words just because they are written, as they do with videos.

A better analogy would be Nigerian prince emails, but only a tiny minority of people believe those... or at least that's what I want to think!

BryantD
1 replies
1h1m

The trusted source thing is important, but there's some degree of evidence that videos and images generate trust in a source, I think?

amenhotep
0 replies
33m

That's the point. They do, but they no longer should. Our technical capabilities for lying have begun to overwhelm the old heuristics, and the sooner people realise the better.

Sohcahtoa82
2 replies
1h28m

if it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0.

Spend more time on Facebook and you'll lose your faith in humanity.

I've seen obviously AI generated pictures of a 5 year old holding a chainsaw right next to a beautiful wooden sculpture, and the comments are filled with boomers amazed at that child's talent.

There are still people that think the IRS will call them and make them pay their taxes over the phone with Apple gift cards.

SkyBelow
1 replies
1h22m

If we follow the idea of safety, should we restrict the internet so either such users can safely use the internet (and phones, gift cards, technology in general) without being scammed, or otherwise restrict it so that at risk individuals can't use the technology at all?

Otherwise, why is AI specifically being targeted, other than a fear of new things that looks a lot like the moral panics over video games?

themoonisachees
0 replies
28m

In concept this is maybe desirable; boot anyone off the internet that isn't able to use it safely.

In reality this is a disaster. The elderly and homeless people are already being left behind massively by a society that believes internet access is something everybody everywhere has. This is somewhat fine when the thing they want to access is twitter (and even then, even with the current state of twitter, who are you to judge who should and should not be on it?), but it becomes a Major Problem™ when the thing they want to access is their bank. Any technological solutions you just thought about for this problem are not sufficient when we're talking about "Can everybody continue to live their lives considering we've kinda thrust the internet on them without them asking"

IanCal
2 replies
1h28m

If it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0

People believe plenty of just written words - which are extremely easy to "fake", you just type them. Why has that trust not dropped to about 0?

UberFly
0 replies
1h21m

Exactly. They are giving people's deductive reasoning skills too much credit.

Al-Khwarizmi
0 replies
1h20m

It kind of has? People believe written words when they come from a source that they consider, erroneously or not, to be trustworthy (newspaper, printed book, Wikipedia, etc.). They trust the source, not the words themselves just due to being written somewhere.

This has so far not been true of videos (e.g. a video of a celebrity from a random source has typically been trusted by laypeople) and should change.

foobarian
0 replies
1h46m

Arguably that loss of trust would be a net positive.

725686
5 replies
1h53m

We are already there, you can no longer trust any image or video you see, so what is the point? Bad actors will still be able to create fake images and videos as they already do. Limiting it for the average user is stupid.

mplewis
3 replies
1h51m

You guys know you can just draw porn, right?

seanmcdirmid
2 replies
1h47m

Generating porn is easier and cheaper. You don’t have to spend the time learning to draw naked bodies, which can be substantial. (The joke being that serious drawers go through the draw naked model sessions a lot, but it isn’t porn)

tourmalinetaco
1 replies
1h35m

but it isn’t porn

In my experience with 2D artists, studying porn is one of their favorite forms of naked model practice.

seanmcdirmid
0 replies
59m

The models art schools get for naked drawing sessions usually aren’t that attractive, definitely not at a porn ideal. The objective is to learn the body, not become aroused.

There is a lot of (mostly non realistic) porn that comes out of art school students via the skills they gain.

sigmoid10
0 replies
1h48m

We are not actually there yet. First, you still need some technical understanding and a somewhat decent setup to run these models yourself without the guardrails. So the average greasy dude who wants to share HD porn based on your daughter's LinkedIn profile pic on NSFW subreddits still has too many hoops to jump through. Right now you can also still spot AI images pretty easily, if you know what to look for. Especially for previous Stable Diffusion models. But all of this could change very soon.

fennecbutt
1 replies
31m

But just like privacy issues, this'll be possible.

It's only bad because society still hasn't normalised sex; from a gay perspective, y'all are prude af.

It's a shortcut for us to just accept that these social ideals and expectations will have to change, so we may as well do it now.

In 100 years, people will be able to make a personal AI that looks, sounds and behaves like any person they want and does anything they want. We'll have thinking dust, you can already buy cameras like a mm^2, in the future I imagine they'll be even smaller.

At some point it's going to get increasingly unproductive trying to safeguard technology without people's social expectations changing.

Same thing with Google Glass, shunned pretttty much exclusively bc it has a camera on it (even tho phones at the time did too), but now we got Ray Bans camera glasses and 50 years from now all glasses will have cameras, if we even still wear them.

spazx
0 replies
21m

Yes this. This is what I've been trying to explain to my friends.

When Tron came out in 1982, it was disliked because back then using CGI effects was considered "cheating". Then a while later Pixar did movies entirely with CGI and they were hits. Now almost every big studio movie uses CGI. Shunned to embraced in like, 13 years.

I think over time the general consensus's views about AI models will soften. Although it might take longer in some communities. (Username checks out lol, furry here also. I think the furs may take longer to embrace it.)

(Also, people will still continue to use older tools like Photoshop to accomplish similar things.)

cooper_ganglia
1 replies
1h38m

I watched an old Tom Scott video of him predicting what the distant year 2030 would look like. In his talk, he mentioned privacy becoming something quaint that your grandparents used to believe in.

I’ve wondered for a while if we just adapt to the point that we’re unfazed by fake nude photos of people. The recent Bobbi Althoff “leaks” reminded me of this. That’s a little different since she’s a public figure, but I really wonder if we just go into the future assuming all photos like that have been faked, and if someone’s iCloud gets leaked now it’ll actually be less stressful because 1. They can claim it’s AI images, or 2. There’s already lewd AI images of them, so the real ones leaking don’t really make much of a difference.

flir
0 replies
1h31m

There's an argument that privacy (more accurately anonymity) is a temporary phenomenon, a consequence of the scale that comes with industrialization. We didn't really have it in small villages, and we won't really have it in the global village.

(I'm not a fan of the direction, but then I'm a product of stage 2).

Szpadel
1 replies
1h11m

Serious question: is it really that hard to remove personal information from training data so the model does not know what specific public figures look like?

I believe this worked with nudity: the model, when asked, generated "smooth" intimate regions (like some kind of doll).

So you could ask for e.g. a generic president but not any specific one; it would be very hard to generate anyone specific.

amenhotep
0 replies
30m

Proprietary, inaccessible models can somewhat do that. Locally hosted models can simply be trained on what a specific person looks like by the user; you just need a couple dozen photos. Keyword: LoRA.
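
For a sense of how low the barrier is: once such a LoRA exists, applying it to a locally hosted pipeline is roughly this short with Hugging Face's diffusers library (the base model ID is the standard SD 1.5 checkpoint; the LoRA path and prompt are hypothetical placeholders):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a local base model, then apply a LoRA fine-tuned on a couple dozen photos.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("./my_subject_lora")  # hypothetical local LoRA weights

    image = pipe("a photo of sks person hiking", num_inference_steps=30).images[0]
    image.save("out.png")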

monitorlizard
0 replies
1h49m

Perhaps I'm being overly contrarian, but from my point of view, I feel that could be a blessing in disguise. For example, in a world where deepfake pornography is ubiquitous, it becomes much harder to tarnish someone's reputation through revenge porn, real or fake. I'm reminded of Syndrome from The Incredibles: "When everyone is super no one will be."

fimdomeio
0 replies
1h46m

The censoring of porn content exists for PR reasons. They just want to have a way to say "we tried to prevent it". If anyone wants to generate porn, it just takes 30 minutes of research to find the huge number of Stable Diffusion based models with NSFW content.

If you can generate synthetic images and have a channel to broadcast them, then you could generate way bigger problems than fake celebrity porn.

Not saying that it is not a problem, but rather that it is a problem inherent to the whole tool, not to some specific subjects.

boringuser2
0 replies
1h43m

If that ever becomes an actual problem, our entire society will be at a filter point.

This is the problem with these kind of incremental mitigations philosophically -- as soon as the actual problem were to manifest it would instantly become a civilization-level threat that would only be resolved with drastic restructuring of society.

Same logic for an AI that replaces a programmer. As soon as AI is that advanced the problem requires vast changes.

Incremental mitigations don't do anything.

Salgat
0 replies
1h51m

I'll challenge this idea and say that once it becomes ubiquitous, it actually does more good than harm. Things like revenge porn become pointless if there's no way to prove it's even real, and I have yet to ever see deep fakes of porn amount to anything.

user_7832
9 replies
2h4m

Agree, I think it fundamentally stems from the old conservative view that porn = bad. Morally policing such models is questionable.

echelon
3 replies
1h58m

Horseshoe theory [1] is one of the most interesting viewpoints I've been introduced to recently.

Both sides view censorship as a moral prerogative to enforce their world view.

Some conservatives want to ban depictions of sex.

Some conservatives want to ban LGBT depictions.

Some women's rights folks want to ban depictions of sex. (Some view it as empowerment, some view it as exploitation.)

Some liberals want to ban non-diverse, dangerous representation.

Some liberals want to ban conservative views against their thoughts.

Some liberals want to ban religion.

...

It's team sports with different flavors on each side.

The best policy, IMO, is to avoid centralized censorship and allow for individuals to control their own algorithmic boosting / deboosting.

[1] https://en.wikipedia.org/wiki/Horseshoe_theory

stared
1 replies
1h49m

Yes and no.

I mean, a lot of moderates would like to avoid seeing any extreme content, regardless of whether it is too far left, too far right, or just in a non-political uncanny valley.

While the Horseshoe Theory has some merits (e.g., both left and right extremes may favor coercion they see as justified, have a we-vs-them mentality, etc.), it is grossly oversimplified. Even the very simple (yet two-dimensional) Political Compass model is much better.

echelon
0 replies
1h41m

I think it's just a different projection to highlight similarities in left and right and is by no means the only lens to use.

The fun quirk is that there are similarities, and this model draws comparison front and center.

There are multiple useful models for evaluating politics, though.

crashmat
0 replies
1h38m

I don't think there are any (even far) left wanting to ban non-diverse representation. I think it's impossible to ban 'conservative thoughts' because that's such a poorly defined phrase. However there are people who want to ban religion. One difference is that a much larger proportion of the far right (almost all of them) want to ban LGBTQ depiction and existence compared to the number of far left who want to ban religion or non-diverse representation.

It says on the wikipedia article itself 'The horseshoe theory does not enjoy wide support within academic circles; peer-reviewed research by political scientists on the subject is scarce, and existing studies and comprehensive reviews have often contradicted its central premises, or found only limited support for the theory under certain conditions.'

rockooooo
2 replies
2h1m

no AI company wants to be the one generating pornographic deepfakes of someone and getting in legal / PR hot water

seanw444
0 replies
1h53m

Which is why this should be a much more decentralized effort. Hard to take someone to court when it's not one single person or company doing something.

mrkramer
0 replies
1h39m

But what if you flip things the other way around? Deepfake porn is problematic not because porn is per se problematic but because deepfake porn or deepfake revenge porn is made without consent. What if you give consent to some AI company or porn company to make porn content of you? I see this as an evolution of OnlyFans, where you could make AI-generated deepfake porn of yourself.

Another use case would be that retired porn actors could license their porn persona (face/body) to some AI porn company to make new porn.

I see a big business opportunity in generative AI porn.

Cookingboy
1 replies
1h58m

This is why I think generative AI tech should either be banned or be completely open sourced. Mega tech corporations are plenty of things already, they don't need to be the morality police for our society too.

pksebben
0 replies
1h21m

Even if it is all open sourced, we still have the structural problem of training models large enough to do interesting stuff.

Until we can train incrementally and distribute the workload scalably, it doesn't matter how open the models / methods for training are if you still need a bajilllion A100 hours to train the damn things.

stared
0 replies
1h53m

It is not only about morals but the incentives of the parties. The need for sexually explicit content is bigger than, say, for niche artistic experiments of geometrical living cupboards owned by a cybernetic dragon.

Stability AI, very understandably, does not want to be associated with "the porn-generation tool". And if, even occasionally, it generates criminal content, the backlash would be enormous. Censoring the data requires effort but is (for companies) worth it.

nonrandomstring
0 replies
1h31m

The term "bad actor" is starting to get cringe.

Ronald Reagan was a bad actor.

George Bush wore out "evildoers"?

Where next... fiends, miscreants, baddies, hooligans, deadbeats?

Dastardly digital deviants Batman!

JonathanFly
34 replies
4h30m

From: https://twitter.com/EMostaque/status/1760660709308846135

Some notes:

- This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements (a toy sketch of the flow-matching objective follows these notes).

- This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs..

- Will be released open, the preview is to improve its quality & safety just like og stable diffusion

- It will launch with full ecosystem of tools

- It's a new base taking advantage of latest hardware & comes in all sizes

- Enables video, 3D & more..

- Need moar GPUs..

- More technical details soon
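
(As a rough intuition for the flow matching mentioned in the first note: instead of predicting noise as in classic diffusion, the network is trained to predict a velocity along a straight path between data and noise. A toy Python sketch of that objective, not the actual SD3 training code, with shapes and the linear path assumed:)

    import torch
    import torch.nn.functional as F

    def flow_matching_loss(model, x0):
        # x0: batch of clean latents; x1: pure Gaussian noise.
        x1 = torch.randn_like(x0)
        t = torch.rand(x0.shape[0], 1, 1, 1, device=x0.device)  # one timestep per sample
        xt = (1 - t) * x0 + t * x1   # straight-line (rectified-flow) interpolation
        target = x1 - x0             # constant velocity along that path
        pred = model(xt, t)          # the network predicts the velocity field
        return F.mse_loss(pred, target)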

Can we create videos similar like sora

Given enough GPUs and good data yes.

How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?

It's in sizes from 800M to 8B parameters now; there will be all sizes for all sorts of edge to giant GPU deployment.

(adding some later replies)

awesome. I assume these aren't heavily cherry picked seeds?

No this is all one generation. With DPO, refinement, further improvement should get better.

Do you have any solves coming for driving coherency and consistency across image generations? For example, putting the same dog in another scene?

yeah see @Scenario_gg's great work with IP adapters for example. Our team builds ComfyUI so you can expect some really great stuff around this...

Dall-e often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects.

Imagine the new version will. DALLE and MJ are also pipelines, you can pretty much do anything accurately with pipelines now.

Nice. Is it an open-source / open-parameters / open-data model?

Like prior SD models it will be open source/parameters after the feedback and improvement phase. We are open data for our LMs but not other modalities.

Cool!!! What do you mean by good data? Can it directly output videos?

If we trained it on video yes, it is very much like the arch of sora.

netdur
11 replies
3h53m

- Need moar GPUs..

Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?

memossy
9 replies
3h37m

We have highly efficient models for inference and a quantization team.

Need moar GPUs to do a video version of this model similar to Sora, now that they have proved that Diffusion Transformers can scale with latent patches (see stablevideo.com and our work on that model, currently the best open video model).

We have 1/100th of the resources of OpenAI and 1/1000th of Google etc.

So we focus on great algorithms and community.

But now we need those GPUs.
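
(For readers wondering what quantization buys: it mostly cuts inference memory and latency, not training compute, which is why it doesn't substitute for more GPUs. A generic PyTorch illustration of post-training dynamic quantization, unrelated to Stability's actual stack:)

    import torch

    # Linear-layer weights stored as int8 and dequantized on the fly,
    # roughly quartering their memory footprint versus fp32.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
    )
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    out = quantized(torch.randn(1, 4096))  # dynamic quantization runs on CPU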

sylware
4 replies
3h28m

Don't fall for it: OpenAI is Microsoft. They have as much as Google, if not more.

px43
0 replies
3h22m

To be clear here, you think that Microsoft has more AI compute than Google?

pavon
0 replies
1h3m

Yes, they have deep pockets and could increase investment if needed. But the actual resources devoted today are public, and in line with what the parent said.

SV_BubbleTime
0 replies
3h9m

This isn’t OpenAI that make GPTx.

It’s StabilityAI that makes Stable Diffusion X.

Jensson
0 replies
3h21m

Google has cheap TPU chips, which means they circumvent the extremely expensive Nvidia corporate licenses. I can easily see them having 10x the resources of OpenAI for this.

Solvency
3 replies
3h8m

Can someone explain why Nvidia doesn't just build their own AI and literally devote 50% of their production to their own compute center? In an age where even ancient companies like Cisco are getting into the AI race, why wouldn't the people with the keys to the kingdom get involved?

swamp40
0 replies
1m

Jensen was just talking about a new kind of data center: AI-generation factories.

downWidOutaFite
0 replies
2h58m

1. the real keys to the kingdom are held by TSMC whose fab capacity rules the advanced chips we all get, from NVIDIA to Apple to AMD to even Intel these days.

2. the old advice is to sell shovels during a gold rush

chompychop
0 replies
2h55m

"The people that made the most money in the gold rush were selling shovels, not digging gold".

supermatt
0 replies
3h37m

I believe he means for training

sandworm101
7 replies
3h50m

> all sorts of edge to giant GPU deployment.

Soon the GPU and its associated memory will be on different cards, as once happened with CPUs. The day of the GPU with ram slots is fast approaching. We will soon plug terabytes of ram into our 4090s, then plug a half-dozen 4090s into a raspberry PI to create a Cronenberg rendering monster. Can it generate movies faster than Pixar can write them? Sure. Can it play Factorio? Heck no.

jsheard
3 replies
3h39m

Any separation of a GPU from its VRAM is going to come at the expense of (a lot of) bandwidth. VRAM is only as fast as it is because the memory chips are as close as possible to the GPU, either on separate packages immediately next to the GPU package or integrated onto the same package as the GPU itself in the fanciest stuff.

If you don't care about bandwidth you can already have a GPU access terabytes of memory across the PCIe bus, but it's too slow to be useful for basically anything. Best case you're getting 64GB/sec over PCIe 5.0 x16, when VRAM is reaching 3.3TB/sec on the highest end hardware and even mid-range consumer cards are doing >500GB/sec.

Things are headed the other way if anything, Apple and Intel are integrating RAM onto the CPU package for better performance than is possible with socketed RAM.
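
As a rough sanity check on those numbers: PCIe 5.0 x16 tops out at roughly 63 GB/s before protocol overhead, which is where the ~64 GB/s figure comes from, and that is around fifty times slower than the HBM-class VRAM quoted above. A quick back-of-envelope calculation (line rate and encoding only):

    # Rough PCIe 5.0 x16 bandwidth estimate (line rate and 128b/130b encoding only;
    # real-world throughput is lower still due to protocol overhead).
    line_rate_gt_per_s = 32          # PCIe 5.0: 32 GT/s per lane
    lanes = 16
    encoding_efficiency = 128 / 130  # 128b/130b line encoding

    gb_per_s = line_rate_gt_per_s * encoding_efficiency / 8 * lanes
    print(f"PCIe 5.0 x16: ~{gb_per_s:.0f} GB/s")          # ~63 GB/s

    # Compare against the on-package memory figure quoted above.
    print(f"HBM-class VRAM: 3300 GB/s, {3300 / gb_per_s:.0f}x faster")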

mysterydip
1 replies
3h28m

Is there a way to partition the data so that a given GPU has access to all the data it needs but the job itself is parallelized over multiple GPUs?

Thinking on the classic neural network for example, each column of nodes would only need to talk to the next column. You could group several columns per GPU and then each would process its own set of nodes. While an individual job would be slower, you could run multiple tasks in parallel, processing new inputs after each set of nodes is finished.

zettabomb
0 replies
2h23m

Of course, this is common with LLMs which are too large to fit in any single GPU. I believe Deepspeed implements what you're referring to.
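
What the question above describes is essentially pipeline (model) parallelism. A minimal hand-rolled sketch in PyTorch, assuming two CUDA devices are available; frameworks like DeepSpeed add micro-batching on top so both stages stay busy:

    import torch
    import torch.nn as nn

    # Split a simple feed-forward net across two GPUs: the first group of layers
    # lives on cuda:0, the second on cuda:1. Only the activations cross devices,
    # much like "columns of nodes" only talking to the next column.
    class TwoStageNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
            self.stage1 = nn.Sequential(nn.Linear(4096, 10)).to("cuda:1")

        def forward(self, x):
            x = self.stage0(x.to("cuda:0"))
            x = self.stage1(x.to("cuda:1"))  # activations hop to the next device
            return x

    model = TwoStageNet()
    out = model(torch.randn(8, 1024))
    print(out.shape)  # torch.Size([8, 10])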

sandworm101
0 replies
2h43m

That depends on whether performance or capacity is the goal. Smaller amounts of ram closer to the processing unit makes for faster computation, but AI also presents a capacity issue. If the workload needs the space, having a boatload of less-fast ram is still preferable to offloading data to something more stable like flash. That is where bulk memory modules connected through slots may one day appear on GPUs.

ltbarcly3
1 replies
3h26m

I don’t think you really understand the current trends in computer architecture. Even cpus are being moved to have on package ram for higher bandwidth. Everything is the opposite of what you said.

sandworm101
0 replies
2h33m

Higher bandwidth but lower capacity. The real trend is different physical architectures for different compute loads. There is a place in AI for bulk albeit slower memory, such as extremely large data sets that want to run internally on a discrete card without involving PCI lanes.

zettabomb
0 replies
2h25m

I doubt it. The latest GPUs utilize HBM which is necessarily part of the same package as the main die. If you had a RAM slot for a GPU you might as well just go out to system RAM, way too much latency to be useful.

VikingCoder
6 replies
3h6m

I'm curious - where are the GPUs with decent processing power but enormous memory? Seems like there'd be a big market for them.

p1esk
1 replies
2h44m

H200 has 141GB, B100 (out next month) will probably have even more. How much memory do you need?

holoduke
0 replies
1h23m

We need 128gb with a 4070 chip for about 2000 dollars. That's what we want.

wongarsu
0 replies
2h51m

Nvidia is making way too much money keeping cards with lots of memory exclusive to server GPUs they sell with insanely high margins.

AMD still suffers from limited resources and doesn't seem willing to spend too much chasing a market that might just be a temporary hype, Google's TPUs are a pain to use and seem to have stalled out, and Intel lacks commitment, and even their products that went roughly in that direction aren't a great match for neural networks because of their philosophy of having fewer more complex cores.

ls612
0 replies
2h32m

MacBooks with M2 or M3 Max. I’m serious. They perform like a 2070 or 2080 but have up to 128GB of unified memory, most of which can be used as VRAM.

iosjunkie
0 replies
1h53m

I dream of AMD or Intel creating cards to do just that

SV_BubbleTime
0 replies
2h43m

I’ll bet you the Nvidia 50xx series will have cards that are asymmetric for this reason. But nothing that will cannibalize their gaming market.

You’ll be able to get higher resolution but slowly. Or pay the $2800 for a 5090 and get high res with good speed.

cheald
3 replies
4h7m

SD 1.5 is 983m parameters, SDXL is 3.5b, for reference.

Very interesting. I've been stretching my 12GB 3060 as far as I can; it's exciting that smaller hardware is still usable even with modern improvements.

memossy
1 replies
3h46m

800m is good for mobile, 8b for graphics cards.

Bigger than that is also possible, not saturated yet but need more GPUs.

vorticalbox
0 replies
3h32m

You can also use quantisation, which lowers memory requirements at a small loss of performance.
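
For a sense of why precision matters so much here, a back-of-envelope look at the weight memory of an 8b-parameter model at different precisions (weights only; activations, text encoders and the VAE add more on top):

    # Weight memory for an 8B-parameter model at various precisions.
    params = 8e9
    for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
        print(f"{name:>9}: {params * bytes_per_param / 1e9:.0f} GB of weights")

    # fp32: 32 GB, fp16: 16 GB, int8: 8 GB, 4-bit: 4 GB. Quantization is what
    # lets the larger sizes fit on consumer GPUs, at a small quality cost.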

liuliu
0 replies
33m

I am going to look at quantization for 8b. But also, these are transformers, so a variety of merging / Frankenstein-tuning is possible. For example, you can use the 8b model to populate the KV cache (which is computed once, so it can be loaded from slower devices such as RAM / SSD) and use the 800M model for diffusion by replicating weights to match layers of the 8b model.

albertzeyer
2 replies
38m

I understand that Sora is very popular, so it makes sense to refer to it, but when saying it is similar to Sora, I guess it actually makes more sense to say that it uses a Diffusion Transformer (DiT) (https://arxiv.org/abs/2212.09748) like Sora. We don't really know more details on Sora, while the original DiT has all the details.
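
For readers unfamiliar with the DiT idea: instead of a U-Net, the denoiser is a plain transformer over patches of the latent image, conditioned on the diffusion timestep (the real DiT injects it via adaptive layer norm; SD3 reportedly pairs this with a flow-matching objective). A very stripped-down toy sketch of that shape, not Stability's actual architecture and omitting text conditioning:

    import torch
    import torch.nn as nn

    class TinyDiT(nn.Module):
        """Toy diffusion-transformer denoiser: patchify latents, run transformer
        blocks conditioned on the timestep, and project back per patch."""
        def __init__(self, latent_ch=4, patch=2, dim=256, depth=4, heads=4, size=32):
            super().__init__()
            self.patch = patch
            n_tokens = (size // patch) ** 2
            self.embed = nn.Linear(latent_ch * patch * patch, dim)
            self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
            self.t_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
            layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, depth)
            self.out = nn.Linear(dim, latent_ch * patch * patch)

        def forward(self, z, t):
            B, C, H, W = z.shape
            p = self.patch
            # Patchify: (B, C, H, W) -> (B, tokens, C*p*p)
            tokens = z.unfold(2, p, p).unfold(3, p, p).reshape(B, C, -1, p * p)
            tokens = tokens.permute(0, 2, 1, 3).reshape(B, -1, C * p * p)
            # Timestep embedding is simply added here; the real DiT uses adaptive layer norm.
            h = self.embed(tokens) + self.pos + self.t_embed(t[:, None])[:, None, :]
            h = self.blocks(h)
            return self.out(h)  # predicted noise/velocity, still in patch form

    z = torch.randn(2, 4, 32, 32)  # latents from a VAE encoder
    t = torch.rand(2)              # diffusion timestep in [0, 1]
    print(TinyDiT()(z, t).shape)   # torch.Size([2, 256, 16])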

tithe
1 replies
24m

Is anyone else struck by the similarities in textures between the images in the appendix of the above "Scalable Diffusion Models with Transformers" paper?

If you size the browser window right, paging with the arrow keys (so the document doesn't scroll) you'll see (eg, pages 20-21) the textures of the parrot's feathers are almost identical to the textures of bark on the tree behind the panda bear, or the forest behind the red panda is very similar to the undersea environment.

Even if I'm misunderstanding something fundamental here about this technique, I still find this interesting!

jachee
0 replies
1m

Could be that they’re all generated from the same seed. And we humans are really good at spotting patterns like that.

4bpp
34 replies
3h22m

I guess we should count our blessings and be grateful that literacy, the printing press, computers and the internet became normalised before this notion of "harm" and harm prevention was. Going forward, it's hard to imagine how any new technology that is unconditionally intellectually empowering to the individual will be tolerated; after all, just think of the harms someone thus empowered could be enabled to perpetrate.

Perhaps eventually, once every forum has been assigned a trust-and-safety team and every word processor has been aligned and most normal people have no need for communication outside the Metaverse (TM) in their daily lives, we will also come around to reviewing the necessity of teaching kids to write, considering the epidemic of hateful graffiti and children being caught with handwritten sexualised depictions of their classmates.

xanderlewis
14 replies
2h56m

unconditionally intellectually empowering

What makes you think those who’ve worked hard over a lifetime to provide (with no compensation) the vast amounts of data required for these — inferior by every metric other than quantity — stochastic approximations of human thought should feel empowered?

I think the genAI / printing press analogy is wearing rather thin now.

graphe
7 replies
2h39m

WHO exactly worked hard over a lifetime with no compensation?

xanderlewis
4 replies
2h32m

By compensation I mean from the companies creating the models, like OpenAI.

graphe
3 replies
2h16m

Computers and drafters had their work taken by machines. IBM did not pay off the computers and drafters. In this case you could make a steady decent wage. My grandfather was trained in a classic drawing style (yes it was his main job).

He did not get into the profession to make money. He did it out of passion and died poor. Artists are not being tricked by the promise of wealth. You will get a cloned style if you can't afford the real artist making it, and if the commission goes to a computer, how is that not the same as plagiarism by a human? Artists were not being paid well before. The anime industry has proven the endpoint of what happens to artists as a profession despite their skills. Chess still exists despite better play by machines. Art as a commercial medium has always been tainted by outside influences such as government, religion and pedophilia.

In the end, drawing wasn't going to survive in the age of vector art and computers. They are mainly forgettable jpgs you scroll past in a vast array like DeviantArt.

xanderlewis
2 replies
2h4m

Sorry, but every one of your talking points — ‘computers were replaced’ , ‘chess is still being played’, etc. — and good counterarguments to them have been covered ad nauseam (and practically verbatim) by now.

Anyway, my point isn’t that ‘AI is evil and must be stopped’; it’s that it doesn’t feel ‘intellectually empowering’. I (in my personal work) can’t get anything done with ChatGPT that I can’t on my own, and with less frustration. We’ve created machines that can superficially mimic real work, and the world is going bonkers over it. The only magic power these systems have is sheer speed: they can output reams and reams of twaddle in the time it takes me to make a cup of tea. And no doubt those in bullshit jobs are soon going to find out.

My argument might not be what you expect from someone who is sad to see the way artists’ lives are going: if your work is truly capable of being replaced by a large language model or a diffusion model, maybe it wasn’t very original to begin with.

The sad thing is, artists who create genuinely superior work will still lose out because those financially enabling them will think (wrongly) that they can be replaced. And we’ll all be worse off.

graphe
1 replies
39m

I definitely feel more empowered, and making imperfect art and generating code that doesn't work and proofreading it is definitely changing people's lives. Which specific artist are you talking about who will suffer? Many of the ones I talk to are excited about using it.

You keep going back to value and finances. The less money is in it the better. Art isn't good because it's valuable, unless you were only interested in it commercially.

xanderlewis
0 replies
34m

Art isn't good because it's valuable, unless you were only interested in it commercially.

Of course not; I’m certainly not suggesting so. But I do think money is important because it is what has enabled artists to do what they do. Without any prospect of monetising one’s art, most of us (and I’m not an artist) would be out working in the potato fields, with very little time to develop skills.

samstave
1 replies
2h5m

Slaves.

xanderlewis
0 replies
2h3m

Yes, but that’s clearly not what I’m getting at.

ben_w
1 replies
1h23m

inferior by every metric other than quantity

And the metric of "beating most of our existing metrics so we had to rewrite the metrics to keep feeling special, but don't worry we can justify this rewriting by pointing at Goodhart's law".

The only reason the question of compensating people for their input into these models even matters is specifically because the models are, in actual fact, good. The bad models don't replace anyone.

xanderlewis
0 replies
43m

beating most of our existing metrics so we had to rewrite the metrics to keep feeling special

This is needlessly provocative, and also wrong. My metrics have been the same from the very beginning (i.e. ‘can it even come close to doing my work for me?’). This question may yet come to evaluate to ‘yes’, but I think you seriously underestimate the real power of these models.

The only reason the question of compensating people for their input into these models even matters is specifically because the models are, in actual fact, good.

No. They don’t need to be good, they simply need to fool people into thinking they’re good.

And before you reflexively rebut with ‘what’s the difference?’, let me ask you this: is the quality of a piece of work or the importance of a job and all of its indirect effects always immediately apparent? Is it possible for managers to short term cost-cut at the expense of the long term? Is it conceivable that we could at some point slip into a world in which there is no funding for genuinely interesting media anymore because 90% of the population can’t distinguish it? The real danger of genAI is that it convinces non-experts that the experts are replaceable when the reality is utterly different. In some cases this will lead to serious blowups and the real experts will be called back in, but in more ambiguous cases we’ll just quietly lose something of real value.

Vetch
1 replies
1h21m

thought should feel empowered?

This is a strange question since augmentation can be objectively measured even as its utility is contextual. With MidJourney I do not feel augmented because while it makes pretty images, it does not make precisely the pretty images I want. I find this useless, but for the odd person who is satisfied only with looking at pretty pictures, it might be enough. Their ability to produce pretty pictures to satisfaction is thus augmented.

With GPT4 and Copilot, I am augmented in a speed instead of capabilities sense. The set of problems I can solve is not meaningfully enhanced, but my ability to close knowledge gaps is. While LLMs are limited in their global ability to help design, architect or structure the approach to a novel problem or its breakdown, they can tell local tricks and implementation approaches I do not know but can verify as correct. And even when wrong, I can often work out how to fix their approach (this is still a speed up since I likely would not have arrived at this solution concept on my own). This is a significant augmentation even if not to the level I'd like.

The reason capabilities are not much enhanced is to get the most out of LLMs, you need to be able to verify solutions due to their unreliability. If a solution contains concepts you do not know, the effort to gain the knowledge required to verify the approach (which the LLM itself can help with) needs to be manageable in reasonable time.

xanderlewis
0 replies
40m

With GPT4 and Copilot…

I am not a programmer, so none of this applies to me. I can only speak for myself, and I’m not claiming that no one can feel empowered by these tools - in fact it seems obvious that they can.

I think programmers tend to assume that all other technical jobs can be attacked in the same way, which is not necessarily true. Writing code seems to be an ideal use case for LLMs, especially given the volume of data available on the open web.

4bpp
1 replies
1h15m

Empowering to their users. A lot of things that empower their users necessarily disempower others, especially if we define power in a way that is zero-sum - the printing press disempowered monasteries and monks that spent a lifetime perfecting their book-copying craft (and copied books that no doubt were used in the training of would-be printing press operators in the process, too).

It seems to me that the standard use of "empowering" implies in particular that you get more power for less effort - which in many cases tends to be democratizing, as hard-earned power tends to be accrued by a handful of people who dedicate most of their lives to pursuit of power in one form or another. With public schooling and printing, a lot of average people were empowered at the expense of nobles and clerics, who put in a lifetime of effort for the power literacy conveys in a world without widespread literacy. With AI, likewise, average people will be empowered at the expense of those who dedicated their life to learn to (draw, write good copy, program) - this looks bad because we hold those people in high esteem in a world where their talents are rare, but consider that following that appearance is analogously fallacious to loathing democratization of writing because of how noble the nobles and monks looked relative to the illiterate masses.

xanderlewis
0 replies
37m

I get why you might describe these tools as ‘democratising’, but it also seems rather strange when you consider that the future of creativity is now going to be dependent on huge datasets and amounts of computation only billion-dollar companies can afford. Isn’t that anything but democratic? Sure, you can ignore the zeitgeist and carry on with traditional dumb tools if you like, but you’ll be utterly left behind.

laminatedsmore
9 replies
2h35m

"grateful that literacy, the printing press, computers and the internet became normalised before this notion of "harm" and harm prevention was"

Printing Press -> Reformation -> Thirty Years' War -> Millions Dead

I'm sure that there were lots of different opinions at the time about what kind of harm was introduced by the printing press and what to do about it, and attempts to control information by the Catholic church etc.

The current fad for 'safe' 'AI' is corporate and naive. But there's no simple way to navigate a revolutionary change in the way information is accessed / communicated.

light_hue_1
3 replies
2h22m

Way to blame the printing press for the actions of religious extremists.

The lesson isn't "printing press bad", it's that extremist irrational belief in any entity is bad (whether it's religion, Trump, etc.).

samstave
0 replies
2h6m

The printing press is the leading cause of tpyos!

herculity275
0 replies
1h53m

It's not about assigning blame. A revolutionary technology enables revolutionary change and all sorts of bad actors will take advantage of it.

freedomben
0 replies
2h8m

Way to blame the printing press for the actions of religious extremists.

I don't see GP blaming the printing press for that, they're merely pointing out that one enabled the other, which is absolutely true. I'm damn near a free speech absolutist, and I think the heavy "safety" push by AI is well-meaning but will have unintended consequences that cause more harm than they are meant to prevent, but it seems obvious to me that they can be used much the same as printing presses were by the extremists.

The lesson isn't "printing press bad", it's that extremist irrational belief in any entity is bad (whether it's religion, Trump, etc.).

Could not agree more

biomcgary
2 replies
1h56m

Safetyism has been the standard civic religion since 9/11 and I doubt it will go quietly into the night. Much like the bishops and the king had a symbiotic relationship to maintain control and limit change (e.g., King James of KJV Bible fame), the government and corporations have a similarly tense, but aligned relationship. Boogeymen from the left or the right can always be conjured to provide the fear necessary to control.

Would millions have died if the old religion gave way to the new one without a fight? The problem for the Vatican was that their rhetoric wasn't at top form after mentally stagnating for a few centuries since arguing with Roman pagans, so war was the only possibility to win.

(Don't forget Luther's post hoc justification of killing 100k+ peasants, but he won because he had better rhetorical skills AND the backing of aristocrats and armies. https://en.wikipedia.org/wiki/Against_the_Murderous,_Thievin... and https://en.wikipedia.org/wiki/German_Peasants%27_War)

kurthr
0 replies
58m

"Think of the Children" has been the norm since long before it was re-popularized in the 80s for song lyrics, in the 90s encryption, and now everything else.

I almost think it's the eras between that are more notable.

EchoReflection
0 replies
50m

"The Coddling of the American Mind" by Jonathan Haidt and Greg Lukianoff is a very good (and troubling) book that talks a lot about "safetyism". I can't recommend it enough.

https://jonathanhaidt.com/

https://www.betterworldbooks.com/product/detail/the-coddling...

https://www.audible.com/pd/The-Coddling-of-the-American-Mind...

fngjdflmdflg
0 replies
49m

I agree. There should have been guardrails in place to prevent people who espouse extremist viewpoints like Martin Luther from spreading their dangerous and hateful rhetoric. I rest easy knowing that only people with the correct intentions will be able to use AI.

dotancohen
0 replies
56m

The current focus on "safety" (I would prefer a less gracious term) is based as much on fear as on morality: fear of government intervention and woke morality. The progress in technology is astounding; the focus on sabotaging the publicly available versions of the technology to promote (and deny) narratives is despicable.

gjulianm
3 replies
2h37m

I feel like this analogy is not very appropriate. The main problem with AI generated images and videos is that, with every improvement, it becomes more and more difficult to distinguish what's real and what's not. That's not something that happened with literacy or printing press or computers.

Think about it: the saturation of content on the Internet has become so bad that people are having a hard time knowing what's true or not, to the point that we're having again outbreaks of preventable diseases such as measles because people can't identify what's real scientific information and what's not. Imagine what will happen when anyone can create an image of whatever they want that looks just like any other picture, or worse, video. We are not at all equipped to deal with that. We are risking a lot just for the ability to spend massive amounts of compute power on generating images. It's not curing cancer, not solving world hunger, not making space travel free, no: it's generating images.

gpderetta
2 replies
2h8m

I don't understand. Are you saying that before AI there was a reliable way to distinguish fiction from factual?

gjulianm
0 replies
1h33m

It definitely is easier without AI. Before, if you saw a photo you could be fairly confident that most of it was real (yes, photo manipulation exists but you can't really create a photo out of nothing). Videos, far more trustworthy (and yes, I know that there's some amazing 3D renders out there but they're not really accessible). With these technologies and the rate at which they're improving, I feel like that's going out of the window. Not to mention that the more content that is generated, the easier it is that something slips by despite being fake.

UberFly
0 replies
53m

"it becomes more and more difficult to distinguish what's real and what's not" - Is literally what they said.

miohtama
1 replies
2h28m

The British banned the printing press in 1662 in the name of preventing harm

https://en.m.wikipedia.org/wiki/Licensing_of_the_Press_Act_1...

freedomben
0 replies
2h13m

Yes, and fortunately that banning was the end of hateful printed content. Since that ban, the only way to print objectionable material has been to do it by hand with pen and ink.

(For clarity, I'm joking, and I know you're also not implying any such thing. I appreciate your comment/link)

someuser2345
0 replies
2h27m

Harm prevention is definitely not new; books have been subject to censorship for centuries. Just look at the U.S., where we had the Hays code and the Comic Code Authority. The only difference is that now, Harm is defined by California tech companies rather than the Church or the Monarchy.

jsight
0 replies
55m

The core problem is centralization of control. If everyone uses their own desktop computer, then everyone is responsible for their own behavior.

If everyone uses Hosting Service F, then at some point people will blur the lines and expect "Hosting Service F" to remove vulgar or offensive content. The lines themselves will be a zeitgeist of sorts with inevitable decisions that are acceptable to some but not all.

Can you even blame them? There are lots of ways for this to go wrong and no one wants to be on the wrong side of a PR blast.

So heavy guardrails are effectively inevitable.

ben_w
0 replies
1h27m

I don't think your golden age ever truly existed — the Overton Window for acceptable discourse has always been narrow, we've just changed who the in-group and out-groups are.

The out group used to be atheists, or gays, or witches, or republicans (in the British sense of the word), or people who want to drink. And each of Catholics and Protestants made the other unwelcome across Europe for a century or two. When I was a kid, it was anyone who wanted to smoke weed, or (because UK) any normalised depiction of gay male relationships as being at all equivalent to heterosexual ones[0]. I met someone who was embarrassed to admit they named their son "Hussein"[1], and absolutely any attempt to suggest that ecstasy was anything other than evil. I know at least one trans person who started out of the closet, but was very eager to go into the closet.

[0] "promote the teaching in any maintained school of the acceptability of homosexuality as a pretended family relationship" - https://en.wikipedia.org/wiki/Section_28

[1] https://en.wikipedia.org/wiki/Hussein

londons_explore
26 replies
4h33m

I really wonder what harm would come to the company if they didn't talk about safety?

Would investors stop giving them money? Would users sue that they now had PTSD after looking at all the 'unsafe' outputs? Would regulators step in and make laws banning this 'unsafe' AI?

What is it specifically that company management is worried about?

dorkwood
13 replies
4h16m

They're attempting to guard themselves against incoming regulation. The big players, such as Microsoft, want to squash Stable Diffusion while protecting themselves, and they're going to do it by wielding the "safety is important and only we have the resources to implement it" hammer.

HeatrayEnjoyer
11 replies
3h53m

Safety is a very real concern, always has been in ML research. I'm tired of this trite "they want a moat" narrative.

I'm glad tech orgs are for once thinking about what they're building before putting out society-warping, democracy-corroding technology, instead of "move fast and break things".

atahanacar
7 replies
2h41m

Safety from what? Human anatomy?

bergen
6 replies
2h32m

See the recent Taylor Swift scandal. Safety from never ending amounts of deepfake porn and gore for example.

atahanacar
3 replies
2h15m

This isn't a valid concern in my opinion. Photo manipulation has been around for decades. People have been drawing other people for centuries.

Also, where do we draw the line? Should Photoshop stop you from manipulating the human body because it could be used for porn? Why stop there; should text editors stop you from writing about sex or describing the human body because it could be used for "abuse"? Should your comment be removed because it made me imagine Taylor Swift without clothes for a brief moment?

spencerflem
0 replies
2h7m

Doing it effortlessly and instantly makes a difference.

(This applies to all AI discussions)

kristopolous
0 replies
35m

That's fine. But the question was what are they referring to and that's the answer.

bergen
0 replies
1h41m

No, but AI requires zero learning curve and can be automated. I can't spit out 10 images of Tay per second in Photoshop. If I want to, and the API delivers, I can easily do that with AI. (Granted, coding this would require a learning curve, but in principle, with the right interface, and they exist, I can churn out hundreds of images without actively putting work in.)

chasd00
1 replies
1h34m

See the recent Taylor Swift scandal

but that's not dangerous. It's definitely worthy of unlocking the cages of the attack lawyers but it's not dangerous. The word "safety" is being used by big tech to trigger and gaslight society.

shrimp_emoji
0 replies
1h32m

I.e., controlling through fear

rwmj
0 replies
3h40m

That would make sense if it was in the slightest about avoiding "society-warping democracy-corroding technology". Rather than making sure no one ever sees a naked person which would cause governments to come down on them like a ton of bricks.

jquery
0 replies
1h11m

To the extent these models don't blindly regurgitate hate speech, I appreciate that. But what I do not appreciate is when they won't render a human nipple or other human anatomy. That's not safety, and calling it such is gaslighting.

dorkwood
0 replies
3h43m

It doesn't strike you as hypocritical that they all talk about safety while continuing to push out tech that's upending multiple industries as we speak? It's tough for me to see it as anything other than lip service.

I'd be on your side if any of them actually chose to keep their technology in the lab instead of tossing it out into the world and gobbling up investment dollars as fast as they could.

ballenf
0 replies
3h42m

AI/ML/GPT/etc are looking increasingly like other media formats -- a source of mass market content.

The safety discussion is proceeding very much like it did for movies, music, and video games.

memossy
4 replies
3h43m

As the leader in open image models it is incumbent upon us, as the models get to this level of quality, to take seriously how we can release open and safe models from legal, societal and other considerations.

Not engaging in this will indeed lead to bad laws, sanctions and more as well as not fulfilling our societal obligations of ensuring this amazing technology is used for as positive outcomes as possible.

Stability AI was set up to build benchmark open models of all types in a proper way, this is why for example we are one of the only companies to offer opt out of datasets (stable cascade and SD3 are opted out), have given millions of supercompute hours in grants to safety related research and more.

Smaller players with less uptake and scrutiny don't need to worry so much about some of these complex issues, it is quite a lot to keep on top of, doing our best.

zmgsabst
1 replies
3h16m

“We need to enforce our morality on you, for our beliefs are the true ones — and you’re unsafe for questioning them!”

You sound like many authoritarian regimes.

memossy
0 replies
2h54m

I mean open models yo

GenerWork
1 replies
3h29m

it is incumbent upon us as the models get to this level of quality to take seriously how we can release open and safe models from a legal, societal and other considerations.

Can you define what you mean by "societal and other considerations"? If not, why not?

memossy
0 replies
2h53m

I could but I won't as legal stuff :)

brainwipe
1 replies
4h19m

All of the above! Additionally... I think AI companies are trying to steer the conversation about safety so that when regulations do come in (and they will) that the legal culpability is with the user of the model, not the trainer of it. The business model doesn't work if you're liable for harm caused by your training process - especially if the harm is already covered by existing laws.

One example of that would be if your model was being used to spot criminals in video footage and it turns out that the bias of the model picks one socioeconomic group over another. Most western nations have laws protecting the public against that kind of abuse (albeit they're not applied fairly) and the fines are pretty steep.

graphe
0 replies
2h34m

They have already used "AI" with success to give people loans and they were biased. Nothing happened legally to that company.

summerlight
0 replies
35m

Likely public condemnation followed by unreasonable regulations when populists see their campaign opportunities. We've historically seen this when new types of media (e.g. TV, computer games) debut and there are real, early signals of such actions.

I don't think those companies being cautious is necessarily a bad thing even for AI enthusiasts. Open source models will quickly catch up without any censorship while most of those public attacks are concentrated on those high-profile companies, which have established some defenses. That would be a much cheaper price than living with some unreasonable degree of regulation over decades, driven by populist politicians.

shapefrog
0 replies
2h55m

What is it specifically that company management is worried about?

As with all hype techs, even the most talented management are barely literate in the product. When talking about their new trillion $ product they must take their talking points from the established literature and "fake it till they make it".

If the other big players say "billions of parameters" you chuck in as many as you can. If the buzz words are "tokens" you say we have lots of tokens. If the buzz words are "safety" you say we are super safe. You say them all and hope against hope that nobody asks a simple question you are not equipped to answer that will show you don't actually know what you are talking about.

renewiltord
0 replies
42m

It's a bit rich when HN itself is chock-full of camp followers who pick the most mainstream opinion. Previously it was AI danger, then it became hallucinations, now it's that safety is too much.

The rest of the world is also like that. You can make a thing that hurts your existing business. Spinning off the brand is probably Google's best bet.

chasd00
0 replies
1h38m

they risk reputational harm and since there's so many alternatives outright "brand cancellation". For example, vocal groups can lobby payment processors to deny service to any AI provider deemed unworthy. Ironic that tech enabled all of that behavior to begin with and now they're worried about it turning on them.

bitcurious
0 replies
4h8m

The latter; there is already an executive order around AI safety. If you don't address it out loud you'll draw attention to yourself.

https://www.whitehouse.gov/briefing-room/presidential-action...

hizanberg
24 replies
3h1m

IMO the "safety" in Stable Diffusion is becoming more overzealous: most of my images are coming back blurred, to the point where I no longer want to waste my time writing a prompt only for it to return mostly blurred images. Prompts that worked in previous versions, like portraits, are coming back mostly blurred in SDXL.

If this next version is just as bad, I'm going to stop using Stability APIs. Are there any other text-to-image services that offer similar value and quality to Stable Diffusion without the overzealous blurring?

Edit:

Example prompts like "Matte portrait of Yennefer" return 8/9 blurred images [1]

[1] https://imgur.com/a/nIx8GBR

nickthegreek
10 replies
2h58m

Run it locally.

lolinder
6 replies
2h55m

I haven't tried SD3, but my local SD2 regularly has this pattern where while the image is developing it looks like it's coming along fine and then suddenly in the last few rounds it introduces weird artifacts to mask faces. Running locally doesn't get around censorship that's baked into the model.

I tend to lean towards SD1.5 for this reason—I'd rather put in the effort to get a good result out of the lesser model than fight with a black box censorship algorithm.

EDIT: See the replies below. I might just have been holding it wrong.

yreg
2 replies
2h39m

Do you use the proper refiner model?

lolinder
1 replies
2h35m

Probably not, since I have no idea what you're talking about. I've just been using the models that InvokeAI (2.3, I only just now saw there's a 3.0) downloads for me [0]. The SD1.5 one is as good as ever, but the SD2 model introduces artifacts on (many, but not all) faces and copyrighted characters.

EDIT: based on the other reply, I think I understand what you're suggesting, and I'll definitely take a look next time I run it.

[0] https://github.com/invoke-ai/InvokeAI

yreg
0 replies
1h23m

SDXL should be used together with a refiner. You can usually see the refiner kicking in if you have a UI that shows you the preview of intermediate steps. And it can sometimes look like the situation you describe (straying further away from your desired result).

Same goes for upscalers, of course.

fnordpiglet
2 replies
2h37m

Be sure to turn off the refiner. This sounds like you’re making models that aren’t aligned with their base models and the refiner runs in the last steps. If it’s a prompt out of alignment with the default base model it’ll heavily distort. Personally with SDXL I never use the refiner I just use more steps.
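
For reference, the documented SDXL base-plus-refiner handoff in diffusers looks roughly like this (a sketch assuming a CUDA GPU with enough VRAM; whether to use the refiner at all is a matter of taste, as the comments above show):

    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2, vae=base.vae, torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "matte portrait of a sorceress, dramatic lighting"
    # The base model handles the first ~80% of denoising and hands its latents
    # to the refiner, which finishes the remaining steps.
    latents = base(prompt, denoising_end=0.8, output_type="latent").images
    image = refiner(prompt, denoising_start=0.8, image=latents).images[0]
    image.save("portrait.png")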

zettabomb
0 replies
2h33m

SD2 isn't SDXL. SD2 was a continuation of the original models that didn't see much success. It didn't have a refiner.

lolinder
0 replies
2h34m

That makes sense. I'll try that next time!

hizanberg
2 replies
2h51m

I don't expect my current desktop will be able to handle it, which is why I'm happy to pay for API access, but my next desktop should be capable.

Is the OSS'd version of SDXL less restrictive than their API hosted version?

yreg
0 replies
2h31m

You can set up the same thing you would have locally on some spot cloud instance.

nickthegreek
0 replies
2h44m

If you run into issues, switch to a fine-tuned model from civitai.

Tenoke
4 replies
2h50m

The nice thing about Stable Diffusion is that you can very easily set it up on a machine you control without any 'safety' and with a user-finetuned checkpoint.
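
As a concrete illustration of how low that barrier is: with the diffusers library, the post-generation NSFW filter is just a pipeline component you can leave out when loading a checkpoint (a minimal sketch assuming a local CUDA GPU; the checkpoint name is only an example, and community checkpoints load the same way):

    import torch
    from diffusers import StableDiffusionPipeline

    # The blur/black-out behaviour comes from a separate post-generation
    # classifier; local pipelines can simply skip loading it.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
        safety_checker=None,
        requires_safety_checker=False,
    ).to("cuda")

    image = pipe("matte portrait of a sorceress, dramatic lighting").images[0]
    image.save("portrait.png")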

cyanydeez
3 replies
2h36m

they're nerfing the models, not just the prompt engineering.

After SD1.5 they started directly modifying the dataset.

it's only other users who "restore" the porno.

and that's what we're discussing. there's a real concern about it as a public offering.

Tenoke
1 replies
2h34m

Sure, but again, if you run it yourself you can use the user-finetuned checkpoints that have it.

cyanydeez
0 replies
2h17m

yes, but the GP is discussing the API, and specifically the company that offers the base model.

they both don't want to offer anything that's legally dubious and it's not hard to understand why.

jncfhnb
0 replies
1h41m

No it’s not. It’s perfectly reasonable not to want to generate porn for customers.

The models being open sourced makes them very easy to turn into the most depraved porno machines ever conceived. And they are.

It is in no way a meaningful barrier to what people can do. That’s the benefit of open source software.

gangstead
2 replies
2h12m

I've never seen blurring in my images. Is that something that they add when you do API access? I'm running SD 1.5 and SDXL 1.0 models locally. Maybe I'm just not prompting for things they deem naughty. Can you share an example prompt where the result gets blurred?

stavros
0 replies
1h20m

It's a filter they apply after generation.

jncfhnb
0 replies
1h39m

If you run locally with the basic stack it’s literally a bool flag to hide nsfw content. It’s trivial to turn off, and it's off by default in most open source setups.

lancesells
1 replies
2h21m

I don't use it at all but do you mind sharing what prompts don't work?

hizanberg
0 replies
2h12m

Last prompt I tried was "Matte portrait of Yennefer" returned 8/9 blurred images [1]

[1] https://imgur.com/a/nIx8GBR

NoMoreNicksLeft
1 replies
2h19m

Wait, blurring (black) means that it objected to the content? I tried it a few times on one of the online/free sites (Huggingspace, I think) and I just assumed I'd gotten a parameter wrong.

pksebben
0 replies
1h16m

Not necessarily, but it can. Black squares can come from a variety of problems.

araes
0 replies
1h27m

Taking the actual example you provided, I can understand the issue, since it amounts to blurring images of a virtual character that are not actually "naughty." Equivalent images in bulk quantity are available on every search engine with "yennefer witcher 3 game" [1][2][3][4][5][6], which returns almost the exact generated images, just blurry.

[1] Google: https://www.google.com/search?sca_esv=a930a3196aed2650&q=yen...

[2] Bing via Ecosia: https://www.ecosia.org/images?q=yennefer%20witcher%203%20gam...

[3] Bing: https://www.bing.com/images/search?q=yennefer+witcher+3+game...

[4] DDG: https://duckduckgo.com/?va=e&t=hj&q=yennefer+witcher+3+game&...

[5] Yippy: https://www.alltheinternet.com/?q=yennefer+witcher+3+game&ar...

[6] Dogpile: https://www.dogpile.com/serp?qc=images&q=yennefer+witcher+3+...

AuryGlenz
23 replies
3h23m

It's really unfortunate that Silicon Valley ended up in an area that's so far left - and to be clear, it'd be just as bad if it was in a far right area too. Purple would have been nice, to keep people in check. 'Safety' seems to be actively making AI advances worse.

spencerflem
9 replies
3h4m

Silicon Valley is not "far left" by any stretch, which implies socialism, redistribution of wealth, etc. This is obvious by inspection.

I assume by far left, you mean progressive on social issues, which is not really a leftist thing but the groups are related enough that I'll give you a pass.

Silicon valley techies are also not socially progressive. Read this thread or anything published by Paul Graham or any of the AI leaders for proof of that.

However most normal city people are. A large enough percent of the country that big companies that want to make money feel the need to appeal to them.

Funnily enough, what is a uniquely Silicon Valley political opinion is valuing the progress of AI over everything else

chasd00
3 replies
1h25m

when i think of "far left" i think of an authoritarian regime disguised as serving the common good and ready to punish and excommunicate any thought or action deemed contrary to the common good. However, the regime defines "common good" themselves and remains in power indefinitely. In that regard, SV is very "far left". At the extremes, far-left and far-right are very similar from the perspective of a regular person on the street.

foolofat00k
1 replies
1h8m

That's just not what that term means.

acheron
0 replies
8m

It’s not right wing unless they sit on the right side of the National Assembly and support Louis XVI.

spencerflem
0 replies
1h19m

Well, you're wrong.

TulliusCicero
3 replies
2h38m

Techies are socially progressive as a whole. Yes there are some outliers, and tech leaders probably aren't as far left socially as the ground level workers.

KittenInABox
1 replies
2h14m

I disagree that techies are socially progressive as a whole; there is very minimal, almost no push for labor rights or labor protection, even though our field is disproportionately affected by the abuse of employees under the visa program.

TulliusCicero
0 replies
10m

Labor protections are generally seen as a fiscal issue, rather than a social one. E.g. libertarians would usually be fine with gay rights but against greater labor regulation.

spencerflem
0 replies
2h34m

I wish :/, I really do

I find them in general to not be Republican and all the baggage that entails but the typical techie I meet is less concerned with social issues than the typical city Democrat.

If I can speculate wildly, I think it is because tech has this veneer of being an alternative solution to the worlds problems, so a lot of techies believe that advancing of tech is both the most important goal and also politically neutral. And also, now that tech is a uniquely profitable career, the types of people that would be in business majors are now CS majors. Ie. those that are mainly interested in getting as much money as possible for themselves.

skinpop
0 replies
1h2m

indeed they are not really left but neoliberals with a leftist aesthetic, just like most republicans are neoliberals with a conservative aesthetic.

bergen
6 replies
2h20m

Put in any historical or political context, SV is in no way left. They're hardcore libertarian. Just look at their poster boys: Elon Musk, Peter Thiel, and a plethora of others are very oriented towards totalitarianism from the right. Just because they blow their brains out on lsd and ketamine and go on 2 week spiritual retreats doesn't make them leftists. They're billionaires that only care about wealth and power, living in communities segregated from the common folk of the area - nothing lefty about that.

freedomben
3 replies
1h54m

Elon Musk and Peter Thiel are two of the most hated people in tech, so this doesn't seem like a compelling example. Also I don't think Elon Musk and Peter Thiel qualify as "hardcore libertarian." Thiel was a Trump supporter (hardly libertarian at all, let alone hardcore) and Elon has supported Democrats and much government his entire life until the last few years. He's mainly only waded into "culture war" type stuff that I can think of. What sort of policies has Elon argued for that you think are "hardcore libertarian?"

bergen
2 replies
1h44m

He wanted to replace public transport with a system where you don't have to ride the public transport with the plebs, he wants to colonize Mars with the best minds (equals most money for him), he built a tank for urban areas. He promotes free speech even if it incites hate, he likes Ayn Rand, he implies government programs calling for united solutions are either communism, Orwell or basically Hitler. He actively promotes the opinion of those that pay above others on X.

freedomben
1 replies
1h31m

Thank you, truly, I appreciate the effort you put in to list those. It helps me understand more where you're coming from.

He wanted to replace public transport with a system where you don't have to ride the public transport with the plebs

I don't think this is any more libertarian than kings and aristocrats of days past were. I know a bunch of people who ride public transit in New York and San Francisco who would readily agree with this, and they are definitely not libertarian. If anything it seems a lot more democratic since he wants it to be available to everyone

he want's to colonize mars with the best minds (equal most money for him)

This doesn't seem particularly "libertarian" either, excepting maybe the aspect of it that is highly capitalistic. That point I would grant. But you could easily be socialist and still support the idea of colonizing something with the best minds.

he built a tank for urban areas.

I admit I don't know anything about this one

He promotes free speech even if it incites hate

This is a social libertarian position, although it's completely disconnected from economic libertarianism. I have a good friend who is a socialist (as in wants to outgrow capitalism such as marx advocated) who supports using the state to suppress capitalist activity/"exploitation", and he also is a free speech absolutist.

he likes ayn rand

That's a reasonable point, although I think it's worth noting that there are plenty of hardcore libertarians who hate ayn rand.

he implies government programs calling for united solutions is either communism, orwell or basically hitler.

Eh, lots of republicans including Trump do the same thing, and they're not libertarian. Certainly not "hardcore libertarian"

He actively promotes the opinion of those that pay above others on X.

This could be a good one, although Google, Meta, Reddit, Youtube, and any other company that runs ads or has "sponsored content" is doing the same thing, so we would have to define all the big tech companies as "hardcore libertarian" to stay consistent.

Overall I definitely think this is a hard debate to have because "hardcore libertarian" can mean different things to different people, and there's a perpetual risk of "no true scotsman" fallacy. I've responded above with how I think most people would imagine libertarianism, but depending on when in history you use it, many anarcho-socialists used the label for themselves yet today "libertarian" is a party that supports free market economics and social liberty. But regardless the challenges inherent, I appreciate the exchange

bergen
0 replies
1h10m

I don't think this is any more libertarian than kings and aristocrats of days past were.

So very libertarian.

If anything it seems a lot more democratic since he wants it to be available to everyone

No, he wants a solution that minimizes contact with other people and lets you live in your bubble. This minimizes exposure to others from the same city and is a commercial system, not a publicly created one. Democratization would be cheap public transport where you don't get mugged, proven to work in every European and most Asian cities.

I admit I don't know anything about this one

The Cybertruck. Again, a vehicle to isolate you from everyday life, being supposedly bulletproof and all.

lots of republicans including Trump do the same thing, and they're not libertarian

They are all "little government, individual choice" - of course they feed their masters, but the Kochs and co want exactly this.

Appreciate the exchange too, thanks for the fact-based formulation of opinions.

njarboe
1 replies
1h42m

Musk main residence is a $50k house he rents in Boca Chica. Grimes wanted a bigger, nicer residence for her and their kids and that was one of the reasons she left him.

bergen
0 replies
1h37m
rightbyte
2 replies
2h54m

SV area far left? I wouldn't even regard the area as left leaning, at all.

I looked at Wikipedia and there seem to be no socialist representation.

Like, from an European perspective hearing that is ludicrous.

kristofferR
1 replies
1h57m

They are the worst kind of left, the "prudish and constantly offended left", not the "free healthcare and good government" left.

I'm glad I live in Norway, where state TV shows boobs and does offensive jokes without anyone really caring.

jquery
0 replies
58m

Prudish? San Francisco? The same city that has outdoor nude carnivals without any kind of age restrictions?

If by prudish you mean intolerant of hate speech, sure. But generally few will freak out over some nudity here.

College here is free. We also have free healthcare here, as limited as it is: https://en.wikipedia.org/wiki/Healthy_San_Francisco

Not sure what you mean by "offensive jokes", that could mean a lot of things...

dang
1 replies
28m

We detached this subthread from https://news.ycombinator.com/item?id=39467056.

spencerflem
0 replies
13m

thank you, the thread looks so much nicer now with interesting technical details at the top

asadotzler
0 replies
12m

So far left that the techies don't even have a labor union. You're a joke.

keiferski
15 replies
4h38m

The obsession with safety in this announcement feels like a missed marketing opportunity, considering the recent Gemini debacle. Isn’t SD’s primary use case the fact that you can install it on your own computer and make what you want to make?

jsheard
8 replies
4h33m

At some point they have to actually make money, and I don't see how continuously releasing the fruits of their expensive training for people to run locally on their own computer (or a competing cloud service) for free is going to get them there. They're not running a charity, the walls will have to go up eventually.

Likewise with Mistral, you don't get half a billion in funding and a two billion valuation on the assumption that you'll keep giving the product away for free forever.

keiferski
4 replies
4h32m

But there are plenty of other business models available for open source projects.

I use Midjourney a lot and (based on the images in the article) it’s leaps and bounds beyond SD. Not sure why I would switch if they are both locked down.

bee_rider
2 replies
2h57m

Is it possible to fine-tune Midjourney or produce a LORA?

nickthegreek
0 replies
14m

No. You can provide photos to merge though.

keiferski
0 replies
2h37m

Sorry I don’t know what that means, but a quick google shows some results about it.

AuryGlenz
0 replies
3h20m

SD would probably be a lot better if they didn't have to make sure it worked on consumer GPUs. Maybe this announcement is a step towards that where the best model will only be able to be accessed by most using a paid service.

archerx
2 replies
4h18m

Ironically their oversensitive NSFW image detector in their API caused me to stop using it and run it locally instead. I was using it to render animations of hundreds of frames, but when every 20th to 30th image comes out blurry it ruins the whole animation, and it would double the cost or more to re-render it with a different seed hoping not to trigger the overzealous blurring.

I don’t mind that they don’t want to let you generate nsfw images but their detector is hopelessly broken, it once censored a cube, yes a cube...

Sharlin
1 replies
3h56m

Unfortunately their financial and reputational incentives are firmly aligned with preventing false negatives at the cost of a lot of false positives.

archerx
0 replies
1h45m

Unfortunately I don't want to pay for hundreds if not thousands of images I have to throw away because it decided some random innocent element is offensive and blurs the entire image.

Here is the red cube it censored because my innocent eyes wouldn't be able to handle it; https://archerx.com/censoredcube.png

What they are achieving with the overzealous safety filtering is driving developers to on-demand GPU hosts that will let them host their own models, which also opens up a lot more freedom. I wanted to use the Stability AI API as my main source for Stable Diffusion, but they make it really, really hard, especially if you want to use it as part of your business.

causal
5 replies
4h8m

Open source models can be fine-tuned by the community if needed.

I would much rather have this than a company releasing models this size into the wild without any safety checks whatsoever.

srid
4 replies
3h44m

Could you list the concrete "safety checks" that you think prevents real-world harm? What particular image that you think a random human will ask the AI to generate, which then leads to concrete harm in the real world?

causal
2 replies
2h58m

If 1 in 1,000 generations will randomly produce memorized CSAM that slipped into the training set then yeah, it's pretty damn unsafe to use. Producing memorized images has precedent[0].

Is it unlikely? Sure, but worth validating.

[0] https://arxiv.org/abs/2301.13188
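
One crude way to probe for this kind of memorization, loosely in the spirit of the extraction paper above: sample the same prompt many times and flag generations that collapse onto near-identical images (a toy sketch; the paper uses much stronger similarity measures and also compares against the training set itself):

    import itertools
    import numpy as np

    def near_duplicates(images, threshold=0.05):
        """Flag pairs of generations that are suspiciously similar.

        images: list of HxWx3 uint8 arrays from repeated sampling of one prompt.
        A cluster of near-identical outputs is one signal of a memorized image.
        """
        flagged = []
        for (i, a), (j, b) in itertools.combinations(enumerate(images), 2):
            # Normalized mean absolute pixel difference as a crude distance.
            dist = np.abs(a.astype(np.float32) - b.astype(np.float32)).mean() / 255.0
            if dist < threshold:
                flagged.append((i, j, dist))
        return flagged

    # usage: images = [np.asarray(pipe(prompt).images[0]) for _ in range(64)]
    #        print(near_duplicates(images))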

yreg
0 replies
2h16m

Why not run the safety check on the training data?

srid
0 replies
2h36m

Okay, by "safety checks" you meant the already unlawful things like CSAM, but not politically-overloaded beliefs like "diversity"? The latter is what the comment[1] you were replying to was referring to (viz. "considering the recent Gemini debacle"[2]).

[1] https://news.ycombinator.com/item?id=39466991

[2] https://news.ycombinator.com/item?id=39456577

politician
0 replies
3h37m

Not even the large companies will explain with precision their implementation of safety.

Until then, we must view this “safety” as both a scapegoat and a vector for social engineering.

subzel0
13 replies
3h59m

“Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat”

https://pbs.twimg.com/media/GG8mm5va4AA_5PJ?format=jpg&name=...

jetrink
3 replies
3h33m

One thing that jumps out to me is that the white fur on the animals has a strong green tint due to the reflected light from the green surfaces. I wonder if the model learned this effect from behind the scenes photos of green screen film sets.

zero_iq
0 replies
2h3m

The models do a pretty good job at rendering plausible global illumination, radiosity, reflections, caustics, etc. in a whole bunch of scenarios. It's not necessarily physically accurate (usually not in fact), but usually good enough to trick the human brain unless you start paying very close attention to details, angles, etc.

This fascinated me when SD was first released, so I tested a whole bunch of scenarios. While it's quite easy to find situations that don't provide accurate results and produce all manner of glitches (some of which you can use to detect some SD-produced images), the results are nearly always convincing at a quick glance.

diggan
0 replies
2h57m

It's just diffuse irradiance, visible in most real (and CGI) pictures, although not as obvious as in that example. It seems like a typical demo scene for a 3D renderer, so I bet that's why it's so prominent.

awongh
0 replies
1h31m

I think you have to consider how diffusion models work: once the green triangle has been put into the image in the early steps, the later steps are influenced by its presence and fill in fine details like reflections as they go along.

The reason it knows this is that this is how light works in any real photograph, not just CGI.

Or if your prompt were "A green triangle looking at itself in the mirror", then the early generation steps would produce two green-triangle-like shapes. It doesn't need to know about the concept of light reflection. It does know about the composition of an image based on the word mirror, though.
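
For anyone who wants to see this in practice, here is a minimal sketch (assuming the Hugging Face diffusers library and an SD 1.5 checkpoint; the prompt and step counts are just examples) that decodes and saves the partially denoised image at several steps, so you can watch the coarse composition appear early and details like reflections only show up near the end:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    def save_intermediate(step, timestep, latents):
        # Decode the current (still noisy) latents to pixel space and save a snapshot.
        with torch.no_grad():
            image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
        image = (image / 2 + 0.5).clamp(0, 1)  # map from [-1, 1] to [0, 1]
        pil = pipe.numpy_to_pil(image.cpu().permute(0, 2, 3, 1).float().numpy())[0]
        pil.save(f"step_{step:03d}.png")

    pipe(
        "Photo of a red sphere on top of a blue cube, green triangle behind them",
        num_inference_steps=30,
        callback=save_intermediate,  # older diffusers callback API
        callback_steps=5,            # snapshot every 5 denoising steps
    )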

iamgopal
1 replies
2h43m

Interesting that left and right are taken from the viewer's perspective instead of the red sphere's perspective.

ebertucc
0 replies
41m

How do you know which way the red sphere is facing? A fun experiment would be to write two prompts for "a person in the middle, a dog to their left, and a cat to their right", and have the person either facing towards or away from the viewer.

Workaccount2
1 replies
3h38m

Not bad. I'm curious about the output if you ask for a mirrored sphere instead.

svenmakes
0 replies
12m

This is actually the approach of one paper to estimate lighting conditions. Their strategy is to paint a mirrored sphere onto an existing image: https://diffusionlight.github.io/

Hugsun
1 replies
3h10m

That's very impressive!

yreg
0 replies
2h25m

It is! This isn't something previous models could do.

Filligree
1 replies
25m

That's _amazing_.

I imagine this doesn't look impressive to anyone unfamiliar with the scene, but this was absolutely impossible with any of the older models. Though, I still want to know if it reliably does this: so many other things are left to chance that if I also need to hit a one-in-ten chance of the composition being right, it still might not be very useful.

Feuilles_Mortes
0 replies
0m

What was difficult about it?

leumon
0 replies
1h4m

"When in doubt, scale it up." - openai.com/careers

btbuildem
8 replies
4h20m

That's nice, but could we please have an unsafe alternative? I would like to footgun both my legs off, thank you.

viraptor
3 replies
3h35m

Just wait some time. People release SD loras all the time. Once SD3 is open, you'll be able to get a patched model in days/weeks.
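
For context, "patched" usually just means loading a community LoRA on top of the released weights. A minimal sketch with the diffusers library (the LoRA repo and file names here are placeholders, not real releases):

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Apply a community-trained LoRA on top of the base weights.
    pipe.load_lora_weights("some-user/some-style-lora", weight_name="style.safetensors")

    image = pipe("a portrait photo, detailed lighting", num_inference_steps=30).images[0]
    image.save("patched.png")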

SV_BubbleTime
2 replies
2h40m

A blogger I follow had an article explaining that the NSFW models for SDXL are just now SORT OF coming up to the quality of the SD 1.5 "pre-safety" models.

It's been 6 months and it still isn't there. SD3 is going to take quite a while if they're baking "safety" in even harder.

viraptor
0 replies
2h31m

1.5 is still more popular than xl and 2 for reasons unrelated to safety. The size and generation speed matter a lot. This is just a matter of practical usability, not some idea of the model being locked down. Feed it enough porn and you'll get porn out of it. If people have incentive to do that (better results than 1.5), it really will happen within days.

Der_Einzige
0 replies
2h30m

Thanks to the pony community, the SDXL NSFW models are far superior to SD 1.5. The only issue is that ControlNets don't work with that pony SDXL fine-tune.

dougmwne
2 replies
3h59m

Since these are open models, people can fine tune them to do anything.

politician
1 replies
3h42m

It’s not obvious that fine-tuning can remove all latent compulsions from these models. Consider that the creators know that fine-tuning exists and have vastly more resources to explore the feasibility of removing deep bias using this method.

dougmwne
0 replies
3h34m

Go check out the Unstable Diffusion Discord.

wokwokwok
0 replies
3h39m

How would that be meaningfully different to SDXL?

I mean, SDXL is great. Until you've had a chance to actually use this model, calling it out for some imagined offence that may or may not exist seems like drinking Kool-Aid rather than responding to something based in concrete, actual reality.

You get access to it… and it does the google thing and puts people of colour in every frame? Sure, complain away.

You get access to it, you can’t even generate pictures of girls? Sure. Burn the house down.

…you haven’t even seen it and you’re already bitching about it?

Come on… give them a chance. Judge what it is when you see it not what you imagine it is before you’ve even had a chance to try it out…

Lots of models, free, multiple sizes, hot damn. This is cool stuff. Be a bit grateful for the work they’re doing.

…and even if it sucks, it's open. If it's not what you want, you can retune it.

glimshe
7 replies
4h42m

This reinforces my impression that Google is at least one year behind. Stunning images, 3D, and video here, while Gemini had to be partially halted this morning.

bamboozled
5 replies
4h33m

For "political" reasons, not for technical reasons. Don't get it twisted.

coeneedell
1 replies
4h23m

I would describe those issues as technical. It’s genuinely getting things wrong because the “safety” element was implemented poorly.

anononaut
0 replies
2h4m

Those are safety elements which exist for political reasons, not technical ones.

verticalscaler
0 replies
4h15m

You think that technology is first. You think that mathematicians and computer engineers or mechanical engineers or doctors are first. They’re very important, but they’re not first. They’re second. Now I’ll prove it to you.

There was a country that had the best mathematicians, the best physicists, the best metallurgists in the world. But that country was very poor. It’s called the Soviet Union. But when you took one of these mathematicians or physicists, who was smuggled out or escaped, put him on a plane and brought him to Palo Alto. Within two weeks, they were producing added value that could produce great wealth.

What comes first is markets. If you have great technology without markets, without a market-friendly economy, you’ll get nowhere. But if you have a market-friendly economy, sooner or later the market forces will give you the technology you want.

And that my friend, simply won't come from an office paralyzed by internal politics of fear and conformity. Don't get it twisted.

ethbr1
0 replies
4h23m

Of all the criticisms that could be leveled at Google, judging them on 'shipping a product and supporting it', since that's the only thing that matters, seems fair.

And shipping takes all the behind-the-scenes steps, not just the technical ones.

TulliusCicero
0 replies
2h35m

I mean, it's kind of both? Making Nazis look diverse isn't just a political error, it's also a technical one. By default, showing Nazis should show them as they actually were.

chickenpotpie
0 replies
1h44m

I don't think that's a fair comparison because they're fulfilling substantially different niches. Gemini is a conversational model that can generate images, but is mainly designed for text. Stable Diffusion is only for images. If you compare a model that can do many things and a model that can only do images by how well they generate images, of course the image generation model looks better.

Stability does have an LLM, but it's not provided in a unified framework like Gemini is.

ametrau
7 replies
2h57m

“Safety” = safe to our reputation. It’s insulting how they imply safety from “harm”.

kingkawn
5 replies
2h42m

So they should dash their company on the rocks of your empty moral positions about freedom?

dingnuts
4 replies
2h39m

should pens be banned because a talented artist could draw a photorealistic image of something nasty happening to someone real?

mrighele
3 replies
2h32m

Photoshop and the likes (modern day's pens) should have an automatic check that you are not drawing porn, censor the image and report you to the authorities if it thinks it involves minors.

edit: yes it is sarcasm, though I fear somebody will think it is in fact the right way to go.

mtlmtlmtlmtl
0 replies
2h23m

That's ridiculous. What about real pens and paintbrushes? Should they be mandated to have a camera that analyses everything you draw/write just to be "safe"?

Maybe we should make it illegal to draw or write anything without submitting it to the state for "safety" analysis.

gambiting
0 replies
2h21m

I hope that's sarcasm.

IMTDb
0 replies
1h10m

Text editors and the likes (modern day's typewriters) should have an automatic check that you are not criticizing the government, censor the text, and report you to the authorities if it thinks you support an alternative political party.

Hopefully you are absolutely shocked by the prospect of the above sentence. But as you can see, surveillance is a slippery slope. "Safety" is a very dangerous word because everybody wants to be "safe" but no one is really ready to define what "safe" actually means. The moment we start baking cultural / political / environmental preferences and biases into the tools we use to produce content, we allow other groups of people with different views to use those "safeguards" to harm us or influence us in ways we might not necessarily like.

The safest notebook I can find is indeed a simple pen and paper, because it does not know or care what is being written; it just does its best regardless of how amazing or horrible the content is.

jameshart
0 replies
1h10m

Safety is also safe for people trying to make use of the technology at scale for most benign usecases.

Want to install a plugin into Wordpress to autogenerate fun illustrations to go at the top of the help articles in your intranet? You probably don’t want the model to have a 1 in 100 chance of outputting porn or extreme violence.
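
For what it's worth, this is roughly what that looks like in practice with open checkpoints: the stock diffusers pipeline ships with a safety checker that flags images it thinks are NSFW, so an automated workflow can drop and regenerate them. A minimal sketch; the model id and prompt are just examples:

    import torch
    from diffusers import StableDiffusionPipeline

    # The safety_checker is loaded and enabled by default for this checkpoint.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    result = pipe("a friendly illustration for a help article about printers")
    for image, flagged in zip(result.images, result.nsfw_content_detected):
        if flagged:
            continue  # skip (and regenerate with a new seed) instead of publishing
        image.save("article_header.png")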

gat1
6 replies
4h49m

I guess we do not know anything about the training dataset?

_1
3 replies
4h47m

It's ethical

wtcactus
0 replies
2h41m

Who decides what's ethical in this scenario? Is it some independent entity?

kranke155
0 replies
4h44m

"Ethical"

amirhirsch
0 replies
4h30m

The dataset is so ethical that it is actually just a press release and not generally available.

thelazyone
1 replies
3h31m

This is a good question, not only for the actual ethics of the training, but for the future of AI use in art. It's going to damage the livelihood of many artists (me included, probably) but also make art accessible to many more people. As long as the training dataset is ethical, I think fighting it is hard and pointless.

yreg
0 replies
2h3m

What data would you consider making the dataset unethical vs. ethical?

inference-lord
5 replies
4h31m

Cool but it's hard to keep getting "blown away" at this stage. The "incredible" is routine now.

dougmwne
3 replies
3h56m

At this point, the next thing that will blow me away is AGI at human expert level or a Gaussian Splat diffusion model that can build any arbitrary 3D scene from text or a single image. High bar, but the technology world is already full of dark magic.

inference-lord
0 replies
3h43m

Will ask it for immortality, endless wealth, and still get bored.

consumer451
0 replies
1h8m

I would be a big fan of solid infographics or presentation slides. That would be very useful.

attilakun
0 replies
3h44m

Is there a Gaussian splat model that works without the "Structure from Motion" step to extract the point cloud? That feels a bit unsatisfying to me.

danparsonson
0 replies
4h24m

So... they should just stop?

alexb_
5 replies
4h45m

We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.

What exactly does this mean? Will we be able to see all of the "safeguards" and access all of the technology's power without someone else's restrictions on them?

Tiberium
2 replies
4h43m

For SDXL this meant that there were almost no NSFW (porn and similar) images included in the dataset, so the community had to fine-tune the model themselves to make it generate those.

hhjinks
1 replies
4h38m

The community would've had to do that anyway. The SD1.5-based NSFW models of today are miles ahead of those from just a year ago.

Der_Einzige
0 replies
2h27m

And the pony SDXL nsfw model is miles ahead of SD1.5 NSFW models. Thank you bronies!

sschueller
1 replies
4h42m

No worries, the safeguards are only for the general public. Criminals will have no issues going around them. /s

SXX
0 replies
4h2m

Criminals? We don't care about those.

Think of the children! We must stop people from generating porn!

willsmith72
4 replies
4h44m

at this point perfect text would be a gamechanger if it can be solved

midjourney 6 can be completely photorealistic and include valid text, but also sometimes adds bad text. it's not much, but having to use an image editor for that is still annoying. for creating marketing material, getting perfect text every time and never getting bad text would be amazing

falcor84
3 replies
4h39m

I wonder if we could get it to generate a layered output, to make it easy to change just the text layer. It already creates the textual part in a separate pass, right?

spywaregorilla
1 replies
4h22m

Current open-source tools include pretty decent off-the-shelf Segment Anything-based detectors. It leaves a lot to be desired, but you can do layer-like operations by automatically detecting a certain concept and applying changes to it, or, less commonly, exporting the cropped areas. But not the content "beneath" the layers, as it doesn't exist.
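
As a rough illustration of that workflow with Meta's segment-anything package (the checkpoint filename and click coordinates are assumptions; the text-prompted detectors mentioned above usually put a detector like GroundingDINO in front of this step):

    import numpy as np
    from PIL import Image
    from segment_anything import SamPredictor, sam_model_registry

    # Load a SAM checkpoint and build a predictor.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)

    image = np.array(Image.open("generation.png").convert("RGB"))
    predictor.set_image(image)

    # One foreground click roughly on the object to isolate.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[256, 256]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    best = masks[scores.argmax()]  # boolean H x W mask for the detected region

    # The mask can now be recolored, composited, or fed to an inpainting model;
    # there is no content "beneath" it to reveal.
    Image.fromarray((best * 255).astype(np.uint8)).save("mask.png")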

snovv_crash
0 replies
3h9m

Which tools would you recommend for this kind of thing?

deprecative
0 replies
4h33m

I would bet that Adobe is definitely salivating at that. It might not be for a long time, but it seems like a no-brainer once the technology can handle it. The last few years have been fast; I interacted with the JS landscape for a few years, and it moves faster than Sonic, and this tech iterates just as quickly.

londons_explore
4 replies
4h35m

All the demo images are 'artwork'.

will the model also be able to produce good photographs, technical drawings, and other graphical media?

spywaregorilla
2 replies
4h27m

Photorealism is well within current capabilities. Technical drawings absolutely not. Not sure what other graphical media includes.

sweezyjeezy
0 replies
3h51m

Yeah, but try getting e.g. DALL-E 3 to do photorealism; I think they've RLHF'd the crap out of it in the name of safety.

Jensson
0 replies
3h48m

Not sure what other graphical media includes.

I'd want a model that can draw website designs and other UIs well. So I give it a list of things in the UI, and I get back a bunch of UI design examples with those elements.

Sharlin
0 replies
3h50m

Photographs, digital illustrations, comic or cartoon style images, and whatever graphical style you can imagine are all easy to achieve with current models (though no single model is a master of all trades). Things that look like technical drawings are as well, but don't expect them to make any sense engineering-wise unless maybe you train a finetune specifically for that purpose.

amelius
4 replies
4h38m

Does anyone know of a good tutorial on how diffusion models work?

jasonjmcghee
1 replies
3h25m

https://jalammar.github.io/illustrated-stable-diffusion/

His whole blog is fantastic. If you want more background (e.g. how transformers work) he's got all the posts you need

amelius
0 replies
2h3m

This looks nice, thank you, but I'm looking for a more hands-on tutorial, with e.g. Python code, like Andrej Karpathy makes them.

spaceheater
0 replies
4h25m
Ologn
0 replies
4h34m

I liked this 18 minute video ( https://www.youtube.com/watch?v=1CIpzeNxIhU ). Computerphile has other good videos with people like Brian Kernighan.

FloatArtifact
4 replies
4h43m

I'm curious to know if their safeguards are eliminated when users fine-tune the model?

pmx
3 replies
4h22m

There are some VERY nsfw model fine tunes available for other versions of SD

witcH
2 replies
4h9m

such as?

mdrzn
1 replies
4h4m

Check out civitai.com for finetuned models for a wide range of uses

AuryGlenz
0 replies
3h6m

I believe you need to be signed in to see the NSFW stuff, for what it's worth.

lreeves
3 replies
4h37m

People in this discussion seem to be hand-wringing about Stability's "safety" comments, but every model they've released has been fine-tuned for porn in like 24 hours.

mopierotti
2 replies
4h30m

That's not entirely true. This wasn't the case for SD 2.0/2.1, and I don't think SD 3.0 will be available publicly for fine tuning.

viraptor
0 replies
3h28m

2 is not popular because people have better quality results with 1.5 and xl. That's it. If 3 is released and works better, it will be fine tuned too.

lreeves
0 replies
4h28m

SD 2 definitely seems like an anomaly that they've learned from; it was hard for everyone to use for various reasons. SDXL and even Cascade (the new side-project model) seem to be embraced by horny people.

123yawaworht456
3 replies
4h42m

This preview phase, as with previous models, is crucial for gathering insights to improve its performance and safety ahead of an open release.

oh, for fuck's sake.

memossy
2 replies
4h39m

We did this for every Stable Diffusion release; you get the feedback data to improve it continuously ahead of the open release.

123yawaworht456
1 replies
4h21m

I was referring to 'safety'. How the hell can an image generation model be dangerous? We've had software for editing text, images, videos, and audio for half a century now.

Jensson
0 replies
3h43m

Advertisers will cancel you if you do anything they don't like, 'safety' is to prevent that.

the_duke
2 replies
3h51m

So, they just announced StableCascade.

Wouldn't this v3 supersede the StableCascade work?

Did they announce it because a team had been working on it and they wanted to push it out rather than just lose it as an internal project, or are there architectural differences that make both worthwhile?

whywhywhywhy
0 replies
3h0m

There are architectural differences, although I found Stable Cascade a bit underwhelming. While it can actually manage text, the text it does manage often just looks like someone wrote text over the image; it doesn't feel integrated a lot of the time.

SD3 seems to be closer to SOTA. Not sure why Cascade took so long to get out; it seemed to be up and running months ago.

Kubuxu
0 replies
3h38m

I think of the SD3 as a further evolution of SD1.5/2/XL and StableCascade as a branching path. It is unclear which will be better in the long term, so why not cover both directions if they have the resources to do so?

sjm
2 replies
2h54m

The example images look so bad. Absolutely zero artistic value.

wongarsu
0 replies
2h37m

From a technical perspective they are impressive. The depth of field in the classroom photo and the macro shot. The detail in the chameleon. The perfect writing in very different styles and fonts. The dust kicked up by the donut.

The artistic value is something you have to add with a good prompt with artistic vision. These images are probably the AI equivalent of "programmer art". It fulfills its function, but lacks aesthetic considerations. I wouldn't attribute that to the model just yet.

the_duke
0 replies
2h27m

I'm willing to bet that they are avoiding artistic images on purpose to not get any heat from artists feeling ripped off, which did happen previously.

pama
2 replies
4h4m

I wish they put out the report already. Has anyone else published a preprint combining ideas similar to diffusion transformers and flow matching?

lairv
1 replies
3h34m

Pretty exciting indeed to see they used flow matching, which has been unpopular for the last few years.

memossy
0 replies
3h26m

It'll be out soon, doing benchmark tests etc

deepsdev
2 replies
4h27m

Can we use it to create Sora-like videos?

nickthegreek
0 replies
2h46m

No.

memossy
0 replies
3h25m

If we trained it with videos, yes, but we need more GPUs for that.

bsaul
2 replies
3h45m

Does anyone know which AI could be used to generate UI design elements (such as "generate a real estate app widget list"), as well as the kind of prompts one would use to obtain good results?

I'm only now investigating using AI to increase velocity in my projects, and the field is moving so fast that I'm a bit outdated.

kevinbluer
0 replies
2h35m

v0 by Vercel could be worth a look: https://v0.dev

From the FAQ: "v0 is a generative user interface system by Vercel powered by AI. It generates copy-and-paste friendly React code based on shadcn/ui and Tailwind CSS that people can use in their projects"

gwern
0 replies
2h14m

If by design elements you include vector images, you could try https://www.recraft.ai/ or Adobe Firefly 2 - there's not a lot of vector work right now, so your choices are either the handful of vector generators, or just bite the bullet and use eg DALL-E 3 to generate raster images you convert to SVG/recreate by hand.

(The second is what we did for https://gwern.net/dropcap because the PNG->SVG filesizes & quality were just barely acceptable for our web pages.)
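
If it helps, the raster-to-SVG step can be semi-automated for line art like dropcaps. A hedged sketch assuming Pillow and the potrace command-line tool are installed (potrace only traces 1-bit bitmaps, so this works for line art, not full-colour images):

    import subprocess
    from PIL import Image

    # Threshold the raster dropcap to a 1-bit bitmap, then trace it to SVG.
    Image.open("dropcap.png").convert("1").save("dropcap.bmp")
    subprocess.run(["potrace", "dropcap.bmp", "--svg", "-o", "dropcap.svg"], check=True)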

101008
2 replies
3h50m

What's the best way to use SD (3 or 2) online? I can't run it on my PC, and I want to do some experiments to generate assets for a POC videogame I'm working on. I pay for Midjourney, and I wouldn't mind paying something like 5 or 10 dollars per month to experiment with SD, but I can't find anything.

Liquix
0 replies
3h0m

poke around stablediffusion.fr and trending public huggingface spaces

Gracana
0 replies
3h43m

I used Rundiffusion for a while before I bought a 4090, and I thought their service was pretty nice. You pay for time on a system of whatever size you choose, with whatever tool/interface you select. I think it's worth tossing a few bucks into it to try it out.

satisfice
1 replies
4h48m

Can it make a picture of a woman chasing a bear?

The old one can't.

cheald
0 replies
4h1m

SD 1.5 (using RealisticVision 5.1, 20 steps, Euler A) spit out something technically correct (but hilarious) in just a few generations.

"a woman chasing a bear, pursuit"

https://i.imgur.com/RqCXVYC.png
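
For reference, a minimal sketch of that setup with the diffusers library (the repo id is an assumption; any SD 1.5-based checkpoint loads the same way):

    import torch
    from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionPipeline.from_pretrained(
        "SG161222/Realistic_Vision_V5.1_noVAE", torch_dtype=torch.float16
    ).to("cuda")
    # Euler A sampler, as described above.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        "a woman chasing a bear, pursuit",
        num_inference_steps=20,
        generator=torch.Generator("cuda").manual_seed(0),  # vary the seed and cherry-pick
    ).images[0]
    image.save("woman_chasing_bear.png")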

robertwt7
1 replies
2h53m

It'll be interesting to see what "safety" means in this case, given the censorship in diffusion models nowadays. Look at what's happening with Gemini; it's quite scary really how different companies have different censorship values.

I've had my fair share of frustration with DALL-E as well when trying to generate weapon images for game assets. I had to tweak my prompt a lot.

yreg
0 replies
1m

it’s quite scary really how different companies have different censorship values

The fact that they have censorship values is scary. But the fact that those are different is better than the alternative.

pqdbr
1 replies
4h53m

The sample images are absolutely stunning.

Also, I was blown away by the "Stable Diffusion" written on the side of the bus.

kzrdude
0 replies
4h13m

Is it just me, or is the Stable Diffusion bus image broken in the background? The bus back there does not look logical with respect to placement and size relative to the sidewalk.

miohtama
1 replies
2h30m

No model. Half of the announcement text is "we are really, really responsible and safe, believe us."

Kind of a dud for an announcement.

nextworddev
0 replies
42m

The company itself is about to run out of money, hence the Hail Mary of trying to get acquired.

cuckatoo
1 replies
4h11m

NSFW fine tune when? Or will "safety" win this time?

SXX
0 replies
4h4m

They need to release the model first. Then it will be fine-tuned.

GenericPoster
1 replies
18m

The talk of "safety" and harm in every image or language model release is getting quite boring and repetitive. The reason it's there is obvious, and there are known workarounds, yet the majority of conversations seem to be dominated by it. There's very little discussion of the actual technology, and I'm aware of the irony of mentioning this. I really wish I could filter out these sorts of posts.

Hopefully it dies down soon, but I doubt it. At least we don't have to hear garbage about "WHy doEs opEn ai hAve oPEn iN thE namE iF ThEY aReN'T oPEN SoURCe"

learningerrday
0 replies
8m

I hope the safety conversation doesn't die. The societal effects of these technologies are quite large, and we should be okay with creating the space to acknowledge and talk about the good and the bad, and what we're doing to mitigate the negative effects. In any case, even though it's repetitive, there exists someone out there on the Interwebs who will discover that information for the first time today (or whenever the release is), and such disclosures are valuable. My favorite relevant XKCD comic: https://xkcd.com/1053/

wtcactus
0 replies
4h15m

I notice they are avoiding images of people in the announcement.

I wonder if they are afraid of the same debacle as Google's AI, and whether what they mean by "safety" is actually a heavy bias against white people and their culture, like what happened with Gemini.

treesciencebot
0 replies
4h40m

Quite nice to see diffusion transformers [0] becoming the next dominant architecture on the generative media.

[0]: https://twitter.com/EMostaque/status/1760660709308846135

spywaregorilla
0 replies
4h28m

Impressive text in the images.

redder23
0 replies
4h9m

Horrible website; it hijacks scrolling. I have my scrolling speed turned up with Chromium Wheel Smooth Scroller, but this website's scrolling is extremely slow, so the extension doesn't work because they are "doing it wrong"(TM) and somehow hijack native scrolling and do something with it.

poulpy123
0 replies
4h39m

Didn't they release another model a few days ago?

kbumsik
0 replies
4h47m

So there is no license information yet?

k__
0 replies
5m

So, they block all bad actors but themselves?

iterateAutomate
0 replies
16m

What is with these names, haha: Stable Diffusion XL 1.0 and now Stable Diffusion 3??

haolez
0 replies
2h19m

Rewriting the "safety" part, but replacing the AI tool with an imaginary knife called Big Knife:

"We believe in safe, responsible knife practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Big Knife by bad actors."

declan_roberts
0 replies
2h33m

Can it generate an image of people without injecting insufferable diversity quotas into each image? If so then it’s the most advanced model on the internet right now!

coldcode
0 replies
4h29m

No details in the announcement, is it still pixel size in = pixel size out?

animex
0 replies
2h18m

Ugh, another startup(?) requiring Discord to use their product. :(

SubiculumCode
0 replies
2h55m

It is interesting to me that these diffusion image models are so much smaller than the LLMs.

PcChip
0 replies
4h49m

The text/spelling part is a huge step forward