Half of the announcement talks about safety. The next step will be these control mechanisms being built into all sorts of software I suppose.
It's "safe" for them, not for the users, at least they should make that clear.
"we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors"
It's kind of a testament to our times that the person who chooses to look at synthetic porn instead of supporting a real-life human trafficking industry is the bad actor.
I don't think the problem is watching synthetic images. The problem is generating them based on actual people and sharing them on the internet in a way that viewers can no longer tell the difference. This was already somewhat of a problem with Photoshop, and once everyone with zero skills can do it in seconds and with far better quality, it will become a nightmare.
once everyone with zero skills can do it in seconds and with far better quality, it will become a nightmare.
Will it be a nightmare? If it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0. That loss of trust presents problems, but not ones that "safe" AI can solve.
surely trust in the veracity of damaging images will drop to about 0
Maybe, eventually. But we don't know how long it will take (or if it will happen at all). And the time until then will be a nightmare for every single woman out there who has any sort of profile picture on any website. Just look at how celebrity deepfakes got reddit into trouble even though their generation was vastly more complex and you could still clearly tell that the videos were fake. Now imagine everyone can suddenly post undetectable nude selfies of your girlfriend on nsfw subreddits. Even if people eventually catch on, that first shock will be unavoidable.
Your anxiety dream relies on there currently being some technical bottleneck limiting the creation or spread of embarrassing fake nudes as a way of cyberbullying.
I don't see any evidence of that. What I see is that people who want to embarrass and bully others are already fully enabled to do so, and do so.
It seems more likely to me and many of us that the bottleneck that stops it from being worse is simply that only so many people think it's reasonable or satisfying to distribute embarrassing fake nudes of someone. Society already shuns it and it's not that effective as a way of bullying and embarrassing people, so only so many people are moved to bother.
Assuming that the hyped up new product is due to swoop in and disrupt the cyberbullying "industry" is just a classic technologist's fantasy.
It ignores all the boring realities of actual human behavior, social norms, and secure equilibriums, etc; skips any evidence building or research effort; and just presumes that some new technology is just sooooo powerful that none of that prior ground truth stuff matters.
I get why people who think that way might be on HN or in some Silicon Valley circles, but it can be one of the eyeroll-inducing vices of these communities as much as it can be one of its motivational virtues.
This: it won't happen immediately, and I'd go even further and say that even if trust in images drops to zero, it's still going to generate a lot of hell.
I've always been able to say all sorts of lies. People have known for millennia that lies exist. Yet lies still hurt people a ton. If I say something like, "idle_zealot embezzled from his last company," people know that could be a lie (and I'm not saying you did, I have no idea who you are). But that kind of stuff can certainly hurt people. We all know that text can be lies and therefore we should have zero trust in any text that we read - yet that isn't how things play out in the real world.
Images are compelling even if we don't trust that they're authentic. Hell, paintings were used for thousands of years to convey "truth", but a painting can be a lie just as much as text or speech.
We created tons of religious art in part because it makes the stories people want others to believe more concrete for them. Everyone knows that "Christ in the Storm on the Sea of Galilee" isn't an authentic representation of anything. It was painted in 1633, more than a century and a half after the event was purported to have happened. But it's still the kind of thing that's powerful.
An AI generated image of you writing racist graffiti is way more believable to be authentic. I have no reason to think you'd do such a thing, but it's within the realm of possibility. There's zero possibility (disregarding supernatural possibilities) that Rembrandt could accurately represent his scene in "Christ in the Storm on the Sea of Galilee". What happens when all the search engine results for your name start calling you a racist - even when you aren't?
The fact is that even when we know things can be faked, we still put a decent amount of trust in them. People spread rumors all the time. Did your high school not have a rumor mill that just kinda destroyed some kids?
Heck, we have right-wing talking heads making up outlandish nonsense that's easily verifiable as false that a third of the country believes without questioning. I'm not talking about stuff like taxes or gun control or whatever - they're claiming things like schools having to have litter boxes for students that identify as cats (https://en.wikipedia.org/wiki/Litter_boxes_in_schools_hoax). We know that people lie. There should be zero trust in a statement like "schools are installing litter boxes for students that identify as cats." Yet it spread like crazy, many people still believe it despite it being proven false, and it has been used to harm a lot of LGBT students. That's a way less believable story than an AI image of you with a racist tattoo.
Finally, no one likes their name and image appropriated for things that aren't them. We don't like lies being spread about us even if 99% of people won't believe the lies. Heck, we see Donald Trump go on rants about truthful images of him that portray his body in ways he doesn't like (and they're just things like him golfing, but an unflattering pose). I don't want fake naked images of me even if they're literally labeled as fake. It still feels like an invasion of privacy and in a lot of ways it would end up that way - people would debate things like "nah, her breasts probably aren't that big." Words can hurt. Images can hurt even more - even if it's all lies. There's a reason why we created paintings even when we knew that paintings weren't authentic: images have power and that power is going to hurt people even more than the words we've always been able to use for lies.
tl;dr: 1) It will take a long time before people's trust in images "drops to zero"; 2) Even when people know an image isn't real, it's still compelling - it's why paintings have existed and were important politically for millennia; 3) We've always known speech and text can be lies, but we regularly see lies believed and hugely damage people's lives - and images will always be more compelling than speech/text; 4) Even if no one believes something is true, there's something psychologically damaging about someone spreading lies about you - and it's a lot worse when they can do it with imagery.
The tide is rolling in and we have two options... yell at the tide really loud that we were here first and we shouldn't have to move... or get out of the way. I'm a lot more sympathetic to the latter option myself.
Let me give you a specific counterexample: it's easy and common to generate phishing emails. Trust in email has not dropped to the degree that phishing is not a problem.
Phishing emails mostly work because they apparently come from a trusted source, though. The key is that they fake the source, not that people will just trust random written words just because they are written, as they do with videos.
A better analogy would be Nigerian prince emails, but only a tiny minority of people believe those... or at least that's what I want to think!
The trusted source thing is important, but there's some degree of evidence that videos and images generate trust in a source, I think?
That's the point. They do, but they no longer should. Our technical capabilities for lying have begun to overwhelm the old heuristics, and the sooner people realise the better.
if it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0.
Spend more time on Facebook and you'll lose your faith in humanity.
I've seen obviously AI generated pictures of a 5 year old holding a chainsaw right next to a beautiful wooden sculpture, and the comments are filled with boomers amazed at that child's talent.
There are still people that think the IRS will call them and make them pay their taxes over the phone with Apple gift cards.
If we follow the idea of safety, should we restrict the internet so either such users can safely use the internet (and phones, gift cards, technology in general) without being scammed, or otherwise restrict it so that at risk individuals can't use the technology at all?
Otherwise, why is AI specifically being targeted, other than a fear of new things that looks a lot like the moral panics over video games?
In concept this is maybe desirable; boot anyone off the internet that isn't able to use it safely.
In reality this is a disaster. The elderly and homeless people are already being left behind massively by a society that believes internet access is something everybody everywhere has. This is somewhat fine when the thing they want to access is twitter (and even then, even with the current state of twitter, who are you to judge who should and should not be on it?), but it becomes a Major Problem™ when the thing they want to access is their bank. Any technological solutions you just thought about for this problem are not sufficient when we're talking about "Can everybody continue to live their lives considering we've kinda thrust the internet on them without them asking"
If it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0
People believe plenty of just written words - which are extremely easy to "fake", you just type them. Why has that trust not dropped to about 0?
Exactly. They are giving people's deductive reasoning skills too much credit.
It kind of has? People believe written words when they come from a source that they consider, erroneously or not, to be trustworthy (newspaper, printed book, Wikipedia, etc.). They trust the source, not the words themselves just due to being written somewhere.
This has so far not been true of videos (e.g. a video of a celebrity from a random source has typically been trusted by laypeople) and should change.
Arguably that loss of trust would be a net positive.
We are already there, you can no longer trust any image or video you see, so what is the point? Bad actors will still be able to create fake images and videos as they already do. Limiting it for the average user is stupid.
You guys know you can just draw porn, right?
Generating porn is easier and cheaper. You don’t have to spend the time learning to draw naked bodies, which can be substantial. (The joke being that serious artists go through a lot of nude-model drawing sessions, but it isn’t porn)
but it isn’t porn
In my experience with 2D artists, studying porn is one of their favorite forms of naked model practice.
The models art schools get for naked drawing sessions usually aren’t that attractive, definitely not at a porn ideal. The objective is to learn the body, not become aroused.
There is a lot of (mostly non realistic) porn that comes out of art school students via the skills they gain.
We are not actually there yet. First, you still need some technical understanding and a somewhat decent setup to run these models yourself without the guardrails. So the average greasy dude who wants to share HD porn based on your daughter's LinkedIn profile pic on nsfw subreddits still has too many hoops to jump through. Right now you can also still spot AI images pretty easily, if you know what to look for. Especially for previous stable diffusion models. But all of this could change very soon.
But just like privacy issues, this'll be possible.
It's only bad because society still hasn't normalised sex; from a gay perspective, y'all are prude af.
It's a shortcut for us to just accept that these social ideals and expectations will have to change, so we may as well do it now.
In 100 years, people will be able to make a personal AI that looks, sounds and behaves like any person they want and does anything they want. We'll have thinking dust, you can already buy cameras like a mm^2, in the future I imagine they'll be even smaller.
At some point it's going to get increasingly unproductive trying to safeguard technology without people's social expectations changing.
Same thing with Google Glass, shunned pretty much exclusively because it had a camera on it (even though phones at the time did too), but now we've got Ray-Ban camera glasses, and 50 years from now all glasses will have cameras, if we even still wear them.
Yes this. This is what I've been trying to explain to my friends.
When Tron came out in 1982, it was disliked because back then using CGI effects was considered "cheating". Then a while later Pixar did movies entirely with CGI and they were hits. Now almost every big studio movie uses CGI. Shunned to embraced in like, 13 years.
I think over time the general consensus's views about AI models will soften. Although it might take longer in some communities. (Username checks out lol, furry here also. I think the furs may take longer to embrace it.)
(Also, people will still continue to use older tools like Photoshop to accomplish similar things.)
I watched an old Tom Scott video of him predicting what the distant year 2030 would look like. In his talk, he mentioned privacy becoming something quaint that your grandparents used to believe in.
I’ve wondered for a while if we just adapt to the point that we’re unfazed by fake nude photos of people. The recent Bobbi Althoff “leaks” reminded me of this. That’s a little different since she’s a public figure, but I really wonder if we just go into the future assuming all photos like that have been faked, and if someone’s iCloud gets leaked now it’ll actually be less stressful because 1. They can claim it’s AI images, or 2. There’s already lewd AI images of them, so the real ones leaking don’t really make much of a difference.
There's an argument that privacy (more accurately anonymity) is a temporary phenomenon, a consequence of the scale that comes with industrialization. We didn't really have it in small villages, and we won't really have it in the global village.
(I'm not a fan of the direction, but then I'm a product of stage 2).
Serious question: is it really that hard to remove personal information from the training data so the model doesn't know what specific public figures look like?
I believe this worked with nudity: when asked, the model generated "smooth" intimate regions (like some kind of doll).
So you could ask for, e.g., a generic president but not any specific one, making it very hard to generate anyone in particular.
Proprietary, inaccessible models can somewhat do that. Locally hosted models can simply be trained on what a specific person looks like by the user, you just need a couple dozen photos. Keyword: LoRA.
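For the curious, here is a toy PyTorch sketch of what LoRA actually does (a small trainable low-rank delta added on top of a frozen weight matrix); real fine-tuning wraps the attention projections of the diffusion model's UNet with adapters like this and trains only the tiny A/B matrices on those couple dozen photos. The class and names below are illustrative, not any particular library's API.

  import torch
  import torch.nn as nn

  class LoRALinear(nn.Module):
      """Frozen base Linear plus a trainable low-rank update: W x + (alpha/r) * B(A x)."""
      def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
          super().__init__()
          self.base = base
          for p in self.base.parameters():
              p.requires_grad = False          # original weights stay frozen
          self.lora_A = nn.Linear(base.in_features, rank, bias=False)
          self.lora_B = nn.Linear(rank, base.out_features, bias=False)
          nn.init.zeros_(self.lora_B.weight)   # delta starts at zero, so training starts from the base model
          self.scale = alpha / rank

      def forward(self, x):
          return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

  # Example: wrap one projection layer; only ~2 * rank * dim parameters get trained.
  proj = nn.Linear(768, 768)
  adapted = LoRALinear(proj, rank=8)
  out = adapted(torch.randn(1, 77, 768))

The resulting adapter file is tiny (megabytes, not gigabytes), which is why sharing person- or style-specific LoRAs is so easy.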
Perhaps I'm being overly contrarian, but from my point of view, I feel that could be a blessing in disguise. For example, in a world where deepfake pornography is ubiquitous, it becomes much harder to tarnish someone's reputation through revenge porn, real or fake. I'm reminded of Syndrome from The Incredibles: "When everyone is super no one will be."
The censoring of porn content exists for PR reasons. They just want to have a way to say "we tried to prevent it". If anyone wants to generate porn, it just takes 30 minutes of research to find the huge number of models based on Stable Diffusion with nsfw content.
If you can generate synthetic images and have a channel to broadcast them, then you could generate way bigger problems than fake celebrity porn.
Not saying that it is not a problem, but rather that it is a problem inherent to the whole tool, not to some specific subjects.
If that ever becomes an actual problem, our entire society will be at a filter point.
This is the problem with these kinds of incremental mitigations philosophically -- as soon as the actual problem were to manifest, it would instantly become a civilization-level threat that would only be resolved with a drastic restructuring of society.
Same logic for an AI that replaces a programmer. As soon as AI is that advanced the problem requires vast changes.
Incremental mitigations don't do anything.
I'll challenge this idea and say that once it becomes ubiquitous, it actually does more good than harm. Things like revenge porn become pointless if there's no way to prove it's even real, and I have yet to ever see deep fakes of porn amount to anything.
Agree, I think it fundamentally stems from the old conservative view that porn = bad. Morally policing such models is questionable.
Horseshoe theory [1] is one of the most interesting viewpoints I've been introduced to recently.
Both sides view censorship as a moral prerogative to enforce their world view.
Some conservatives want to ban depictions of sex.
Some conservatives want to ban LGBT depictions.
Some women's rights folks want to ban depictions of sex. (Some view it as empowerment, some view it as exploitation.)
Some liberals want to ban non-diverse, dangerous representation.
Some liberals want to ban conservative views that run counter to their own.
Some liberals want to ban religion.
...
It's team sports with different flavors on each side.
The best policy, IMO, is to avoid centralized censorship and allow for individuals to control their own algorithmic boosting / deboosting.
Yes and no.
I mean, a lot of moderates would like to avoid seeing any extreme content, regardless of whether it is too much left, right, or just in a non-political uncanny valley.
While the Horseshoe Theory has some merits (e.g., both left and right extremes may favor justified coercion, have the us-vs-them mentality, etc.), it is grossly oversimplified. Even a very simple (yet two-dimensional) model like the Political Compass is much better.
I think it's just a different projection to highlight similarities in left and right and is by no means the only lens to use.
The fun quirk is that there are similarities, and this model draws comparison front and center.
There are multiple useful models for evaluating politics, though.
I don't think there are any (even far) left wanting to ban non-diverse representation. I think it's impossible to ban 'conservative thoughts' because that's such a poorly defined phrase. However, there are people who want to ban religion. One difference is that a much larger proportion of the far right (almost all of them) want to ban lgbtq depiction and existence, compared to the number of far left who want to ban religion or non-diverse representation.
It says on the wikipedia article itself 'The horseshoe theory does not enjoy wide support within academic circles; peer-reviewed research by political scientists on the subject is scarce, and existing studies and comprehensive reviews have often contradicted its central premises, or found only limited support for the theory under certain conditions.'
no AI company wants to be the one generating pornographic deepfakes of someone and getting in legal / PR hot water
Which is why this should be a much more decentralized effort. Hard to take someone to court when it's not one single person or company doing something.
But flip things the other way around: deepfake porn is problematic not because porn is per se problematic, but because deepfake porn or deepfake revenge porn is made without consent. What if you gave consent to some AI company or porn company to make porn content of you? I see this as an evolution of OnlyFans, where you could make AI-generated deepfake porn of yourself.
Another use case would be that retired porn actors could license their porn persona (face/body) to some AI porn company to make new porn.
I see big business opportunity in the generative AI porn.
This is why I think generative AI tech should either be banned or be completely open sourced. Mega tech corporations are plenty of things already, they don't need to be the morality police for our society too.
Even if it is all open sourced, we still have the structural problem of training models large enough to do interesting stuff.
Until we can train incrementally and distribute the workload scalably, it doesn't matter how open the models / methods for training are if you still need a bajilllion A100 hours to train the damn things.
It is not only about morals but the incentives of parties. The need for sexually explicit content is bigger than, say, for niche artistic experiments of geometrical living cupboards owned by a cybernetic dragon.
Stability AI, very understandably, does not want to be associated with "the porn-generation tool". And if, even occasionally, it generates criminal content, the backlash would be enormous. Censoring the data requires effort but is (for companies) worth it.
The term "bad actor" is starting to get cringe.
Ronald Reagan was a bad actor.
George Bush wore out "evildoers"?
Where next... fiends, miscreants, baddies, hooligans, deadbeats?
Dastardly digital deviants Batman!
From: https://twitter.com/EMostaque/status/1760660709308846135
Some notes:
- This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements.
- This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs..
- Will be released open, the preview is to improve its quality & safety just like og stable diffusion
- It will launch with full ecosystem of tools
- It's a new base taking advantage of latest hardware & comes in all sizes
- Enables video, 3D & more..
- Need moar GPUs..
- More technical details soon
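For anyone wondering what the "flow matching" in those notes refers to, here is a minimal, generic sketch of a rectified-flow-style training objective in PyTorch. This illustrates the general technique, not SD3's actual recipe; "model" below is a placeholder for whatever velocity-predicting network you train.

  import torch
  import torch.nn.functional as F

  def flow_matching_loss(model, x0):
      """Generic rectified-flow objective: learn the velocity from data x0 to noise x1.

      x_t = (1 - t) * x0 + t * x1, with target velocity v = x1 - x0.
      """
      x1 = torch.randn_like(x0)                      # pure-noise endpoint
      t = torch.rand(x0.shape[0], device=x0.device)  # one timestep per sample
      t_ = t.view(-1, *([1] * (x0.dim() - 1)))       # broadcast over latent dims
      x_t = (1 - t_) * x0 + t_ * x1                  # point on the straight path
      v_target = x1 - x0
      v_pred = model(x_t, t)                         # network predicts the velocity
      return F.mse_loss(v_pred, v_target)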
Can we create videos similar to Sora?
Given enough GPUs and good data yes.
How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?
It's in sizes from 800m to 8b parameters now, will be all sizes for all sorts of edge to giant GPU deployment.
(adding some later replies)
awesome. I assume these aren't heavily cherry picked seeds?
No this is all one generation. With DPO, refinement, further improvement should get better.
Do you have any solves coming for driving coherency and consistency across image generations? For example, putting the same dog in another scene?
yeah see @Scenario_gg's great work with IP adapters for example. Our team builds ComfyUI so you can expect some really great stuff around this...
Dall-e often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects.
Imagine the new version will. DALLE and MJ are also pipelines, you can pretty much do anything accurately with pipelines now.
Nice. Is it an open-source / open-parameters / open-data model?
Like prior SD models it will be open source/parameters after the feedback and improvement phase. We are open data for our LMs but not other modalities.
Cool!!! What do you mean by good data? Can it directly output videos?
If we trained it on video yes, it is very much like the arch of sora.
- Need moar GPUs..
Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?
We have highly efficient models for inference and a quantization team.
Need moar GPUs to do a video version of this model similar to Sora now they have proved that Diffusion Transformers can scale with latent patches (see stablevideo.com and our work on that model, currently best open video model).
We have 1/100th of the resources of OpenAI and 1/1000th of Google etc.
So we focus on great algorithms and community.
But now we need those GPUs.
Don't fall for it: OpenAI is microsoft. They have as much as google, if not more.
To be clear here, you think that Microsoft has more AI compute than Google?
Yes, they have deep pockets and could increase investment if needed. But the actual resources devoted today are public, and in line with what the parent said.
This isn’t OpenAI that make GPTx.
It’s StabilityAI that makes Stable Diffusion X.
Google has cheap TPU chips, which means they circumvent the extremely expensive Nvidia corporate licenses. I can easily see them having 10x the resources of OpenAI for this.
Can someone explain why Nvidia doesn't just run their own AI? And literally devote 50% of their production to their own compute center? In an age where even ancient companies like Cisco are getting in the AI race, why wouldn't the people with the keys to the kingdom get involved?
Jensen was just talking about a new kind of data center: AI-generation factories.
1. the real keys to the kingdom are held by TSMC whose fab capacity rules the advanced chips we all get, from NVIDIA to Apple to AMD to even Intel these days.
2. the old advice is to sell shovels during a gold rush
"The people that made the most money in the gold rush were selling shovels, not digging gold".
I believe he means for training
> all sorts of edge to giant GPU deployment.
Soon the GPU and its associated memory will be on different cards, as once happened with CPUs. The day of the GPU with RAM slots is fast approaching. We will soon plug terabytes of RAM into our 4090s, then plug a half-dozen 4090s into a Raspberry Pi to create a Cronenberg rendering monster. Can it generate movies faster than Pixar can write them? Sure. Can it play Factorio? Heck no.
Any separation of a GPU from its VRAM is going to come at the expense of (a lot of) bandwidth. VRAM is only as fast as it is because the memory chips are as close as possible to the GPU, either on separate packages immediately next to the GPU package or integrated onto the same package as the GPU itself in the fanciest stuff.
If you don't care about bandwidth you can already have a GPU access terabytes of memory across the PCIe bus, but it's too slow to be useful for basically anything. Best case you're getting 64GB/sec over PCIe 5.0 x16, when VRAM is reaching 3.3TB/sec on the highest end hardware and even mid-range consumer cards are doing >500GB/sec.
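As a rough back-of-the-envelope (assuming a ~16GB fp16 model whose weights have to be read once per generation step): 16GB / 3.3TB/s is about 5ms from on-package VRAM, versus 16GB / 64GB/s = 250ms over PCIe 5.0 x16. That's roughly a 50x memory-bandwidth penalty before any compute even happens.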
Things are headed the other way if anything, Apple and Intel are integrating RAM onto the CPU package for better performance than is possible with socketed RAM.
Is there a way to partition the data so that a given GPU had access to all the data it needs but the job itself was parallelized over multiple GPUs?
Thinking of a classic neural network, for example: each column of nodes would only need to talk to the next column. You could group several columns per GPU, and then each would process its own set of nodes. While an individual job would be slower, you could run multiple tasks in parallel, processing new inputs after each set of nodes is finished.
Of course, this is common with LLMs which are too large to fit in any single GPU. I believe Deepspeed implements what you're referring to.
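A minimal hand-rolled PyTorch sketch of that idea (naive two-stage model parallelism: different layer groups live on different GPUs and activations get shipped between them; DeepSpeed and friends additionally pipeline micro-batches so the GPUs aren't sitting idle). The layer shapes here are arbitrary placeholders.

  import torch
  import torch.nn as nn

  # Assumes two GPUs are visible; stage0/stage1 are illustrative layer groups.
  stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.GELU()).to("cuda:0")
  stage1 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

  def forward(x):
      h = stage0(x.to("cuda:0"))     # first group of layers runs on GPU 0
      h = h.to("cuda:1")             # ship activations (small vs. the weights) to GPU 1
      return stage1(h)               # second group of layers runs on GPU 1

  out = forward(torch.randn(8, 1024))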
That depends on whether performance or capacity is the goal. Smaller amounts of ram closer to the processing unit makes for faster computation, but AI also presents a capacity issue. If the workload needs the space, having a boatload of less-fast ram is still preferable to offloading data to something more stable like flash. That is where bulk memory modules connected though slots may one day appear on GPUs.
I don’t think you really understand the current trends in computer architecture. Even CPUs are moving to on-package RAM for higher bandwidth. Everything is the opposite of what you said.
Higher bandwidth but lower capacity. The real trend is different physical architectures for different compute loads. There is a place in AI for bulk, albeit slower, memory, such as extremely large data sets that want to run internally on a discrete card without involving PCIe lanes.
I doubt it. The latest GPUs utilize HBM which is necessarily part of the same package as the main die. If you had a RAM slot for a GPU you might as well just go out to system RAM, way too much latency to be useful.
I'm curious - where are the GPUs with decent processing power but enormous memory? Seems like there'd be a big market for them.
H200 has 141GB, B100 (out next month) will probably have even more. How much memory do you need?
We need 128GB with a 4070 chip for about 2000 dollars. That's what we want.
Nvidia is making way too much money keeping cards with lots of memory exclusive to server GPUs they sell with insanely high margins.
AMD still suffers from limited resources and doesn't seem willing to spend too much chasing a market that might just be a temporary hype, Google's TPUs are a pain to use and seem to have stalled out, and Intel lacks commitment, and even their products that went roughly in that direction aren't a great match for neural networks because of their philosophy of having fewer more complex cores.
MacBooks with M2 or M3 Max. I’m serious. They perform like a 2070 or 2080 but have up to 128GB of unified memory, most of which can be used as VRAM.
I dream of AMD or Intel creating cards to do just that
I’ll bet you the Nvidia 50xx series will have cards that are asymmetric for this reason. But nothing that will cannibalize their gaming market.
You’ll be able to get higher resolution but slowly. Or pay the $2800 for a 5090 and get high res with good speed.
SD 1.5 is 983m parameters, SDXL is 3.5b, for reference.
Very interesting. I've been stretching my 12GB 3060 as far as I can; it's exciting that smaller hardware is still usable even with modern improvements.
800m is good for mobile, 8b for graphics cards.
Bigger than that is also possible, not saturated yet but need more GPUs.
You can also use quantisation, which lowers memory requirements at a small loss of performance.
I am going to look at quantization for 8b. But also, these are transformers, so a variety of merging / Frankenstein-tuning is possible. For example, you can use the 8b model to populate the KV cache (which is computed once, so it can be loaded from slower devices, such as RAM / SSD) and use the 800M model for diffusion by replicating weights to match layers of the 8b model.
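For readers who haven't seen it, a toy sketch of what int8 weight quantization means in PyTorch (symmetric, per-tensor, which is cruder than real schemes like per-group GPTQ/AWQ or bitsandbytes, but it shows where the memory saving comes from):

  import torch

  def quantize_int8(w: torch.Tensor):
      """Symmetric per-tensor int8 quantization: store int8 weights plus one fp scale."""
      scale = w.abs().max() / 127.0
      q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
      return q, scale

  def dequantize(q: torch.Tensor, scale: torch.Tensor):
      return q.float() * scale

  w = torch.randn(4096, 4096)                    # fp32: ~64 MB
  q, s = quantize_int8(w)                        # int8: ~16 MB (plus one scalar)
  err = (dequantize(q, s) - w).abs().mean()      # small reconstruction error
  print(q.dtype, f"mean abs error {err.item():.4f}")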
I understand that Sora is very popular, so it makes sense to refer to it, but when saying it is similar to Sora, I guess it actually makes more sense to say that it uses a Diffusion Transformer (DiT) (https://arxiv.org/abs/2212.09748) like Sora. We don't really know more details on Sora, while the original DiT has all the details.
Is anyone else struck by the similarities in textures between the images in the appendix of the above "Scalable Diffusion Models with Transformers" paper?
If you size the browser window right, paging with the arrow keys (so the document doesn't scroll) you'll see (eg, pages 20-21) the textures of the parrot's feathers are almost identical to the textures of bark on the tree behind the panda bear, or the forest behind the red panda is very similar to the undersea environment.
Even if I'm misunderstanding something fundamental here about this technique, I still find this interesting!
Could be that they’re all generated from the same seed. And we humans are really good at spotting patterns like that.
I guess we should count our blessings and be grateful that literacy, the printing press, computers and the internet became normalised before this notion of "harm" and harm prevention was. Going forward, it's hard to imagine how any new technology that is unconditionally intellectually empowering to the individual will be tolerated; after all, just think of the harms someone thus empowered could be enabled to perpetrate.
Perhaps eventually, once every forum has been assigned a trust-and-safety team and every word processor has been aligned and most normal people have no need for communication outside the Metaverse (TM) in their daily lives, we will also come around to reviewing the necessity of teaching kids to write, considering the epidemic of hateful graffiti and children being caught with handwritten sexualised depictions of their classmates.
unconditionally intellectually empowering
What makes you think those who’ve worked hard over a lifetime to provide (with no compensation) the vast amounts of data required for these — inferior by every metric other than quantity — stochastic approximations of human thought should feel empowered?
I think the genAI / printing press analogy is wearing rather thin now.
WHO exactly worked hard over a lifetime with no compensation?
By compensation I mean from the companies creating the models, like OpenAI.
Computers and drafters had their work taken by machines. IBM did not pay off the computers and drafters. In this case you could make a steady decent wage. My grandfather was trained in a classic drawing style (yes it was his main job).
He did not get into the profession to make money. He did it out of passion and died poor. Artists are not being tricked by the promise of wealth. You will get a cloned style if you can't afford the real artist making it, and if the commission goes to a computer, how is that not the same as plagiarism by a human? Artists were not being paid well before. The anime industry has proven the endpoint of what happens to artists as a profession despite their skills. Chess still exists despite better play by machines. Art as a commercial medium has always been tainted by outside influences such as government, religion and pedophilia.
In the end, drawing wasn't going to survive in the age of vector art and computers. They are mainly forgettable jpgs you scroll past in a vast array like DeviantArt.
Sorry, but every one of your talking points — ‘computers were replaced’ , ‘chess is still being played’, etc. — and good counterarguments to them have been covered ad nauseam (and practically verbatim) by now.
Anyway, my point isn’t that ‘AI is evil and must be stopped’; it’s that it doesn’t feel ‘intellectually empowering’. I (in my personal work) can’t get anything done with ChatGPT that I can’t on my own, and with less frustration. We’ve created machines that can superficially mimic real work, and the world is going bonkers over it. The only magic power these systems have is sheer speed: they can output reams and reams of twaddle in the time it takes me to make a cup of tea. And no doubt those in bullshit jobs are soon going to find out.
My argument might not be what you expect from someone who is sad to see the way artists’ lives are going: if your work is truly capable of being replaced by a large language model or a diffusion model, maybe it wasn’t very original to begin with.
The sad thing is, artists who create genuinely superior work will still lose out because those financially enabling them will think (wrongly) that they can be replaced. And we’ll all be worse off.
I definitely feel more empowered, and making imperfect art and generating code that doesn't work and proofreading it is definitely changing people's lives. Which specific artist are you talking about who will suffer? Many of the ones I talk to are excited about using it.
You keep going back to value and finances. The less money is in it the better. Art isn't good because it's valuable, unless you were only interested in it commercially.
Art isn't good because it's valuable, unless you were only interested in it commercially.
Of course not; I’m certainly not suggesting so. But I do think money is important because it is what has enabled artists to do what they do. Without any prospect of monetising one’s art, most of us (and I’m not an artist) would be out working in the potato fields, with very little time to develop skills.
Slaves.
Yes, but that’s clearly not what I’m getting at.
inferior by every metric other than quantity
And the metric of "beating most of our existing metrics so we had to rewrite the metrics to keep feeling special, but don't worry we can justify this rewriting by pointing at Goodhart's law".
The only reason the question of compensating people for their input into these models even matters is specifically because the models are, in actual fact, good. The bad models don't replace anyone.
beating most of our existing metrics so we had to rewrite the metrics to keep feeling special
This is needlessly provocative, and also wrong. My metrics have been the same from the very beginning (i.e. ‘can it even come close to doing my work for me?’). This question may yet come to evaluate to ‘yes’, but I think you seriously underestimate the real power of these models.
The only reason the question of compensating people for their input into these models even matters is specifically because the models are, in actual fact, good.
No. They don’t need to be good, they simply need to fool people into thinking they’re good.
And before you reflexively rebut with ‘what’s the difference?’, let me ask you this: is the quality of a piece of work or the importance of a job and all of its indirect effects always immediately apparent? Is it possible for managers to short term cost-cut at the expense of the long term? Is it conceivable that we could at some point slip into a world in which there is no funding for genuinely interesting media anymore because 90% of the population can’t distinguish it? The real danger of genAI is that it convinces non-experts that the experts are replaceable when the reality is utterly different. In some cases this will lead to serious blowups and the real experts will be called back in, but in more ambiguous cases we’ll just quietly lose something of real value.
thought should feel empowered?
This is a strange question since augmentation can be objectively measured even as its utility is contextual. With MidJourney I do not feel augmented because while it makes pretty images, it does not make precisely the pretty images I want. I find this useless, but for the odd person who is satisfied only with looking at pretty pictures, it might be enough. Their ability to produce pretty pictures to satisfaction is thus augmented.
With GPT4 and Copilot, I am augmented in a speed instead of capabilities sense. The set of problems I can solve is not meaningfully enhanced, but my ability to close knowledge gaps is. While LLMs are limited in their global ability to help design, architect or structure the approach to a novel problem or its breakdown, they can tell local tricks and implementation approaches I do not know but can verify as correct. And even when wrong, I can often work out how to fix their approach (this is still a speed up since I likely would not have arrived at this solution concept on my own). This is a significant augmentation even if not to the level I'd like.
The reason capabilities are not much enhanced is to get the most out of LLMs, you need to be able to verify solutions due to their unreliability. If a solution contains concepts you do not know, the effort to gain the knowledge required to verify the approach (which the LLM itself can help with) needs to be manageable in reasonable time.
With GPT4 and Copilot…
I am not a programmer, so none of this applies to me. I can only speak for myself, and I’m not claiming that no one can feel empowered by these tools - in fact it seems obvious that they can.
I think programmers tend to assume that all other technical jobs can be attacked in the same way, which is not necessarily true. Writing code seems to be an ideal use case for LLMs, especially given the volume of data available on the open web.
Empowering to their users. A lot of things that empower their users necessarily disempower others, especially if we define power in a way that is zero-sum - the printing press disempowered monasteries and monks that spent a lifetime perfecting their book-copying craft (and copied books that no doubt were used in the training of would-be printing press operators in the process, too).
It seems to me that the standard use of "empowering" implies in particular that you get more power for less effort - which in many cases tends to be democratizing, as hard-earned power tends to be accrued by a handful of people who dedicate most of their lives to pursuit of power in one form or another. With public schooling and printing, a lot of average people were empowered at the expense of nobles and clerics, who put in a lifetime of effort for the power literacy conveys in a world without widespread literacy. With AI, likewise, average people will be empowered at the expense of those who dedicated their life to learn to (draw, write good copy, program) - this looks bad because we hold those people in high esteem in a world where their talents are rare, but consider that following that appearance is analogously fallacious to loathing democratization of writing because of how noble the nobles and monks looked relative to the illiterate masses.
I get why you might describe these tools as ‘democratising’, but it also seems rather strange when you consider that the future of creativity is now going to be dependent on huge datasets and amounts of computation only billion-dollar companies can afford. Isn’t that anything but democratic? Sure, you can ignore the zeitgeist and carry on with traditional dumb tools if you like, but you’ll be utterly left behind.
"grateful that literacy, the printing press, computers and the internet became normalised before this notion of "harm" and harm prevention was"
Printing Press -> Reformation -> Thirty Years' War -> Millions Dead
I'm sure that there were lots of different opinions at the time about what kind of harm was introduced by the printing press and what to do about it, and attempts to control information by the Catholic church etc.
The current fad for 'safe' 'AI' is corporate and naive. But there's no simple way to navigate a revolutionary change in the way information is accessed / communicated.
Way to blame the printing press for the actions of religious extremists.
The lesson isn't "printing press bad", it's that extremist irrational belief in any entity is bad (whether it's religion, Trump, etc.).
The printing press is the leading cause of tpyos!
It's not about assigning blame. A revolutionary technology enables revolutionary change and all sorts of bad actors will take advantage of it.
Way to blame the printing press for the actions of religious extremists.
I don't see GP blaming the printing press for that, they're merely pointing out that one enabled the other, which is absolutely true. I'm damn near a free speech absolutist, and I think the heavy "safety" push by AI is well-meaning but will have unintended consequences that cause more harm than they are meant to prevent, but it seems obvious to me that they can be used much the same as printing presses were by the extremists.
The lesson isn't "printing press bad", it's that extremist irrational belief in any entity is bad (whether it's religion, Trump, etc.).
Could not agree more
Safetyism is the standard civic religion since 9/11 and I doubt it will go quietly into the night. Much like the bishops and the king had a symbiotic relationship to maintain control and limit change (e.g., King James of KJV Bible fame), the government and corporations have a similarly tense, but aligned relationship. Boogeymen from the left or the right can always be conjured to provide the fear necessary to control
Would millions have died if the old religion gave way to the new one without a fight? The problem for the Vatican was that their rhetoric wasn't at top form after mentally stagnating for a few centuries since arguing with Roman pagans, so war was the only possibility to win.
(Don't forget Luther's post hoc justification of killing 100k+ peasants, but he won because he had better rhetorical skills AND the backing of aristocrats and armies. https://en.wikipedia.org/wiki/Against_the_Murderous,_Thievin... and https://en.wikipedia.org/wiki/German_Peasants%27_War)
"Think of the Children" has been the norm since long before it was re-popularized in the 80s for song lyrics, in the 90s encryption, and now everything else.
I almost think it's the eras between that are more notable.
"The Coddling of the American Mind" by Jonathan Haidt and Greg Lukianoff is a very good (and troubling) book that talks a lot about "safetyism". I can't recommend it enough.
https://www.betterworldbooks.com/product/detail/the-coddling...
https://www.audible.com/pd/The-Coddling-of-the-American-Mind...
I agree. There should have been guardrails in place to prevent people who espouse extremist viewpoints like Martin Luther from spreading their dangerous and hateful rhetoric. I rest easy knowing that only people with the correct intentions will be able to use AI.
The current focus on "safety" (I would prefer a less gracious term) are based as much on fear as on morality. Fear of government intervention and woke morality. The progress in technology is astounding, the focus on sabotaging then publicly available versions of the technology to promote (and deny) narratives is despicable.
I feel like this analogy is not very appropriate. The main problem with AI generated images and videos is that, with every improvement, it becomes more and more difficult to distinguish what's real and what's not. That's not something that happened with literacy, the printing press, or computers.
Think about it: the saturation of content on the Internet has become so bad that people are having a hard time knowing what's true or not, to the point that we're having again outbreaks of preventable diseases such as measles because people can't identify what's real scientific information and what's not. Imagine what will happen when anyone can create an image of whatever they want that looks just like any other picture, or worse, video. We are not at all equipped to deal with that. We are risking a lot just for the ability to spend massive amounts of compute power on generating images. It's not curing cancer, not solving world hunger, not making space travel free, no: it's generating images.
I don't understand. Are you saying that before AI there was a reliable way to distinguish fiction from fact?
It definitely is easier without AI. Before, if you saw a photo you could be fairly confident that most of it was real (yes, photo manipulation exists but you can't really create a photo out of nothing). Videos, far more trustworthy (and yes, I know that there's some amazing 3D renders out there but they're not really accessible). With these technologies and the rate at which they're improving, I feel like that's going out of the window. Not to mention that the more content that is generated, the easier it is that something slips by despite being fake.
"it becomes more and more difficult to distinguish what's real and what's not" - Is literally what they said.
The British banned the printing press in 1662 in the name of harm prevention:
https://en.m.wikipedia.org/wiki/Licensing_of_the_Press_Act_1...
Yes, and fortunately that banning was the end of hateful printed content. Since that ban, the only way to print objectionable material has been to do it by hand with pen and ink.
(For clarity, I'm joking, and I know you're also not implying any such thing. I appreciate your comment/link)
Harm prevention is definitely not new; books have been subject to censorship for centuries. Just look at the U.S., where we had the Hays Code and the Comics Code Authority. The only difference is that now, Harm is defined by California tech companies rather than the Church or the Monarchy.
The core problem is centralization of control. If everyone uses their own desktop computer, then everyone is responsible for their own behavior.
If everyone uses Hosting Service F, then at some point people will blur the lines and expect "Hosting Service F" to remove vulgar or offensive content. The lines themselves will be a zeitgeist of sorts with inevitable decisions that are acceptable to some but not all.
Can you even blame them? There are lots of ways for this to go wrong and noone wants to be on the wrong side of a PR blast.
So heavy guardrails are effectively inevitable.
I don't think your golden age ever truly existed — the Overton Window for acceptable discourse has always been narrow, we've just changed who the in-group and out-groups are.
The out group used to be atheists, or gays, or witches, or republicans (in the British sense of the word), or people who want to drink. And each of Catholics and Protestants made the other unwelcome across Europe for a century or two. When I was a kid, it was anyone who wanted to smoke weed, or (because UK) any normalised depiction of gay male relationships as being at all equivalent to heterosexual ones[0]. I met someone who was embarrassed to admit they named their son "Hussein"[1], and absolutely any attempt to suggest that ecstasy was anything other than evil. I know at least one trans person who started out of the closet, but was very eager to go into the closet.
[0] "promote the teaching in any maintained school of the acceptability of homosexuality as a pretended family relationship" - https://en.wikipedia.org/wiki/Section_28
I really wonder what harm would come to the company if they didn't talk about safety?
Would investors stop giving them money? Would users sue that they now had PTSD after looking at all the 'unsafe' outputs? Would regulators step in and make laws banning this 'unsafe' AI?
What is it specifically that company management is worried about?
They're attempting to guard themselves against incoming regulation. The big players, such as Microsoft, want to squash Stable Diffusion while protecting themselves, and they're going to do it by wielding the "safety is important and only we have the resources to implement it" hammer.
Safety is a very real concern, always has been in ML research. I'm tired of this trite "they want a moat" narrative.
I'm glad tech orgs are for once thinking about what they're building before putting out society-warping democracy-corroding technology instead of move fast break things.
Safety from what? Human anatomy?
See the recent Taylor Swift scandal. Safety from never ending amounts of deepfake porn and gore for example.
This isn't a valid concern in my opinion. Photo manipulation has been around for decades. People have been drawing other people for centuries.
Also, where do we draw the line? Should Photoshop stop you from manipulating human body because it could be used for porn? Why stop there, should text editors stop you from writing about sex or describing human body because it could be used for "abuse". Should your comment be removed because it make me imagine Taylor Swift without clothes for a brief moment?
Doing it effortlessly and instantly makes a difference.
(This applies to all AI discussions)
That's fine. But the question was what are they referring to and that's the answer.
No, but AI requires zero learning curve and can be automated. I can't spit out 10 images of Tay per second in Photoshop. If I want, and the API delivers, I can easily do that with AI. (Granted, coding this yourself involves a learning curve, but in principle, with the right interface, and those exist, I can churn out hundreds of images without actively putting in work.)
See the recent Taylor Swift scandal
but that's not dangerous. It's definitely worthy of unlocking the cages of the attack lawyers, but it's not dangerous. The word "safety" is being used by big tech to trigger and gaslight society.
I.e., controlling through fear
That would make sense if it was in the slightest about avoiding "society-warping democracy-corroding technology". Rather than making sure no one ever sees a naked person which would cause governments to come down on them like a ton of bricks.
To the extent these models don't blindly regurgitate hate speech, I appreciate that. But what I do not appreciate is when they won't render a human nipple or other human anatomy. That's not safety, and calling it such is gaslighting.
It doesn't strike you as hypocritical that they all talk about safety while continuing to push out tech that's upending multiple industries as we speak? It's tough for me to see it as anything other than lip service.
I'd be on your side if any of them actually chose to keep their technology in the lab instead of tossing it out into the world and gobbling up investment dollars as fast as they could.
AI/ML/GPT/etc are looking increasingly like other media formats -- a source of mass market content.
The safety discussion is proceeding very much like it did for movies, music, and video games.
As the leader in open image models, it is incumbent upon us, as the models get to this level of quality, to take seriously how we can release open and safe models from legal, societal, and other considerations.
Not engaging in this will indeed lead to bad laws, sanctions and more, as well as not fulfilling our societal obligations of ensuring this amazing technology is used for the most positive outcomes possible.
Stability AI was set up to build benchmark open models of all types in a proper way, this is why for example we are one of the only companies to offer opt out of datasets (stable cascade and SD3 are opted out), have given millions of supercompute hours in grants to safety related research and more.
Smaller players with less uptake and scrutiny don't need to worry so much about some of these complex issues, it is quite a lot to keep on top of, doing our best.
“We need to enforce our morality on you, for our beliefs are the true ones — and you’re unsafe for questioning them!”
You sound like many authoritarian regimes.
I mean open models yo
it is incumbent upon us, as the models get to this level of quality, to take seriously how we can release open and safe models from legal, societal, and other considerations.
Can you define what you mean by "societal and other considerations"? If not, why not?
I could but I won't as legal stuff :)
All of the above! Additionally... I think AI companies are trying to steer the conversation about safety so that when regulations do come in (and they will) that the legal culpability is with the user of the model, not the trainer of it. The business model doesn't work if you're liable for harm caused by your training process - especially if the harm is already covered by existing laws.
One example of that would be if your model was being used to spot criminals in video footage and it turns out that the bias of the model picks one socioeconomic group over another. Most western nations have laws protecting the public against that kind of abuse (albeit they're not applied fairly) and the fines are pretty steep.
They have already used "AI" with success to give people loans and they were biased. Nothing happened legally to that company.
Likely public condemnation followed by unreasonable regulations when populists see their campaign opportunities. We've historically seen this when new types of media (e.g. TV, computer games) debut and there are real, early signals of such actions.
I don't think those companies being cautious is necessarily a bad thing even for AI enthusiasts. Open source models will quickly catch up without any censorship while most of those public attacks are concentrated into those high profile companies, which have established some defenses. That would be a much cheaper price than living with some unreasonable degree of regulations over decades, driven by populist politicians.
What is it specifically that company management is worried about?
As with all hype techs, even the most talented management are barely literate in the product. When talking about their new trillion $ product they must take their talking points from the established literature and "fake it till they make it".
If the other big players say "billions of parameters" you chuck in as many as you can. If the buzz words are "tokens" you say we have lots of tokens. If the buzz words are "safety" you say we are super safe. You say them all and hope against hope that nobody asks a simple question you are not equipped to answer that will show you don't actually know what you are talking about.
It's a bit rich when HN itself is chock full with camp followers who pick the most mainstream opinion. Previously it was AI danger, then it became hallucinations, now it's that safety is too much.
The rest of the world is also like that. You can make a thing that hurts your existing business. Spinning off the brand is probably Google's best bet.
They risk reputational harm and, since there are so many alternatives, outright "brand cancellation". For example, vocal groups can lobby payment processors to deny service to any AI provider deemed unworthy. Ironic that tech enabled all of that behavior to begin with and now they're worried about it turning on them.
The latter; there is already an executive order around AI safety. If you don't address it out loud you'll draw attention to yourself.
https://www.whitehouse.gov/briefing-room/presidential-action...
IMO the "safety" in Stable Diffusion is becoming so overzealous that most of my images are coming back blurred, to the point where I no longer want to waste my time writing a prompt only for it to return mostly blurred images. Prompts that worked in previous versions, like portraits, are coming back mostly blurred in SDXL.
If this next version is just as bad, I'm going to stop using Stability APIs. Are there any other text-to-image services that offer similar value and quality to Stable Diffusion without the overzealous blurring?
Edit:
Example prompts like "Matte portrait of Yennefer" return 8/9 blurred images [1]
Run it locally.
I haven't tried SD3, but my local SD2 regularly has this pattern: while the image is developing it looks like it's coming along fine, and then suddenly in the last few steps it introduces weird artifacts to mask faces. Running locally doesn't get around censorship that's baked into the model.
I tend to lean towards SD1.5 for this reason—I'd rather put in the effort to get a good result out of the lesser model than fight with a black box censorship algorithm.
EDIT: See the replies below. I might just have been holding it wrong.
Do you use the proper refiner model?
Probably not, since I have no idea what you're talking about. I've just been using the models that InvokeAI (2.3, I only just now saw there's a 3.0) downloads for me [0]. The SD1.5 one is as good as ever, but the SD2 model introduces artifacts on (many, but not all) faces and copyrighted characters.
EDIT: based on the other reply, I think I understand what you're suggesting, and I'll definitely take a look next time I run it.
SDXL should be used together with a refiner. You can usually see the refiner kicking in if you have a UI that shows you the preview of intermediate steps. And it can sometimes look like the situation you describe (straining further away from your desired result).
Same goes for upscalers, of course.
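For anyone unsure what that base-to-refiner handoff looks like in practice, here is a minimal sketch assuming the Hugging Face diffusers library; the model IDs are the public SDXL 1.0 checkpoints, and the step counts and 0.8 split are arbitrary:

    # Sketch of the usual SDXL base + refiner handoff (diffusers).
    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,   # share components to save VRAM
        vae=base.vae,
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "matte portrait of a sorceress"

    # The base model handles roughly the first 80% of the denoising steps
    # and hands its latents to the refiner, which finishes the last 20%.
    # That final pass is where faces and fine detail can visibly change.
    latents = base(prompt, num_inference_steps=30, denoising_end=0.8,
                   output_type="latent").images
    image = refiner(prompt, num_inference_steps=30, denoising_start=0.8,
                    image=latents).images[0]
    image.save("portrait.png")

If the preview looks good until the very end and then degrades, it is usually this last refiner pass (or an upscaler) doing the damage, which is why disabling it is a reasonable first test.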
Be sure to turn off the refiner. This sounds like you're using checkpoints that aren't aligned with their base models, and the refiner runs in the last steps. If the prompt is out of alignment with the default base model, it'll heavily distort. Personally, with SDXL I never use the refiner; I just use more steps.
SD2 isn't SDXL. SD2 was a continuation of the original models that didn't see much success. It didn't have a refiner.
That makes sense. I'll try that next time!
I don't expect my current desktop will be able to handle it, which is why I'm happy to pay for API access, but my next desktop should be capable.
Is the OSS'd version of SDXL less restrictive than their API hosted version?
You can set up the same thing you would have locally on some spot cloud instance.
If you run into issues, switch to a fine-tuned model from civitai.
The nice thing about Stable Diffusion is that you can very easily set it up on a machine you control without any 'safety' and with a user-finetuned checkpoint.
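As a concrete illustration, a minimal local setup might look like the sketch below, assuming the Hugging Face diffusers library; the checkpoint filename is a placeholder for whatever community-finetuned .safetensors file you download:

    # Rough sketch: load a community checkpoint and drop the NSFW blur filter.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        "some_finetuned_sd15_checkpoint.safetensors",  # placeholder filename
        torch_dtype=torch.float16,
    )
    pipe.safety_checker = None   # disable the post-generation blur/blackout check
    pipe = pipe.to("cuda")

    image = pipe("matte portrait of Yennefer", num_inference_steps=25).images[0]
    image.save("yennefer.png")

The point is simply that the filter lives outside the model weights in the local stack, so nothing gets blurred unless you opt into it.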
they're nerfing the models, not just the prompt engineering.
After SD1.5 they started directly modifying the dataset.
it's only other users who "restore" the porno.
and that's what we're discussing. there's a real concern about it as a public offering.
Sure, but again if you run it yourself you can use the finetuned by users checkpoints that have it.
yes, but the GP is discussing the API, and specifically the company that offers the base model.
they both don't want to offer anything that's legally dubious and it's not hard to understand why.
No it’s not. It’s perfectly reasonable not to want to generate porn for customers.
The models being open sourced makes them very easy to turn into the most depraved porno machines ever conceived. And they are.
It is in no way a meaningful barrier to what people can do. That’s the benefit of open source software.
I've never seen blurring in my images. Is that something that they add when you do API access? I'm running SD 1.5 and SDXL 1.0 models locally. Maybe I'm just not prompting for things they deem naughty. Can you share an example prompt where the result gets blurred?
It's a filter they apply after generation.
If you run locally with the basic stack it's literally a bool flag to hide NSFW content. It's trivial to turn off, and it's off by default in most open source setups.
I don't use it at all but do you mind sharing what prompts don't work?
The last prompt I tried, "Matte portrait of Yennefer", returned 8/9 blurred images [1]
Wait, blurring (black) means that it objected to the content? I tried it a few times on one of the online/free sites (a Hugging Face Space, I think) and I just assumed I'd gotten a parameter wrong.
Not necessarily, but it can. Black squares can come from a variety of problems.
Taking the actual example you provided, I can understand the issue, since it amounts to blurring images of a virtual character that are not actually "naughty." Equivalent images are available in bulk on every search engine by searching "yennefer witcher 3 game" [1][2][3][4][5][6], which returns almost exactly the generated images, just without the blur.
[1] Google: https://www.google.com/search?sca_esv=a930a3196aed2650&q=yen...
[2] Bing via Ecosia: https://www.ecosia.org/images?q=yennefer%20witcher%203%20gam...
[3] Bing: https://www.bing.com/images/search?q=yennefer+witcher+3+game...
[4] DDG: https://duckduckgo.com/?va=e&t=hj&q=yennefer+witcher+3+game&...
[5] Yippy: https://www.alltheinternet.com/?q=yennefer+witcher+3+game&ar...
[6] Dogpile: https://www.dogpile.com/serp?qc=images&q=yennefer+witcher+3+...
It's really unfortunate that Silicon Valley ended up in an area that's so far left - and to be clear, it'd be just as bad if it was in a far right area too. Purple would have been nice, to keep people in check. 'Safety' seems to be actively making AI advances worse.
Silicon Valley is not "far left" by any stretch, which implies socialism, redistribution of wealth, etc. This is obvious by inspection.
I assume by far left, you mean progressive on social issues, which is not really a leftist thing but the groups are related enough that I'll give you a pass.
Silicon valley techies are also not socially progressive. Read this thread or anything published by Paul Graham or any of the AI leaders for proof of that.
However most normal city people are. A large enough percent of the country that big companies that want to make money feel the need to appeal to them.
Funnily enough, what is a uniquely Silicon Valley political opinion is valuing the progress of AI over everything else.
When I think of "far left" I think of an authoritarian regime disguised as serving the common good and ready to punish and excommunicate any thought or action deemed contrary to the common good. However, the regime defines "common good" itself and remains in power indefinitely. In that regard, SV is very "far left". At the extremes, far left and far right are very similar from the perspective of a regular person on the street.
That's just not what that term means.
It’s not right wing unless they sit on the right side of the National Assembly and support Louis XVI.
Well, you're wrong.
Techies are socially progressive as a whole. Yes there are some outliers, and tech leaders probably aren't as far left socially as the ground level workers.
I disagree that techies are socially progressive as a whole; there is very minimal, almost no, push for labor rights or labor protections, even though our group is disproportionately hit by abuse of employees under the visa program.
Labor protections are generally seen as a fiscal issue, rather than a social one. E.g. libertarians would usually be fine with gay rights but against greater labor regulation.
I wish :/, I really do
I find them in general to not be Republican and all the baggage that entails but the typical techie I meet is less concerned with social issues than the typical city Democrat.
If I can speculate wildly, I think it is because tech has this veneer of being an alternative solution to the world's problems, so a lot of techies believe that advancing tech is both the most important goal and also politically neutral. And also, now that tech is a uniquely profitable career, the types of people that would have been business majors are now CS majors, i.e. those that are mainly interested in getting as much money as possible for themselves.
indeed they are not really left but neoliberals with a leftist aesthetic, just like most republicans are neoliberals with a conservative aesthetic.
Put in any historical or political context, SV is in no way left. They're hardcore libertarian. Just look at their poster boys: Elon Musk, Peter Thiel, and a plethora of others are very much oriented towards totalitarianism from the right. Just because they blow their brains out on LSD and ketamine and go on two-week spiritual retreats doesn't make them leftists. They're billionaires that only care about wealth and power, living in communities segregated from the common folk of the area - nothing lefty about that.
Elon Musk and Peter Thiel are two of the most hated people in tech, so this doesn't seem like a compelling example. Also I don't think Elon Musk and Peter Thiel qualify as "hardcore libertarian." Thiel was a Trump supporter (hardly libertarian at all, let alone hardcore) and Elon has supported Democrats and much government his entire life until the last few years. He's mainly only waded into "culture war" type stuff that I can think of. What sort of policies has Elon argued for that you think are "hardcore libertarian?"
He wanted to replace public transport with a system where you don't have to ride the public transport with the plebs, he wants to colonize Mars with the best minds (equal: most money, for him), he built a tank for urban areas. He promotes free speech even if it incites hate, he likes Ayn Rand, he implies that government programs calling for united solutions are either communism, Orwell or basically Hitler. He actively promotes the opinions of those that pay above others on X.
Thank you, truly, I appreciate the effort you put in to list those. It helps me understand more where you're coming from.
He wanted to replace public transport with a system where you don't have to ride the public transport with the plebs
I don't think this is any more libertarian than kings and aristocrats of days past were. I know a bunch of people who ride public transit in New York and San Francisco who would readily agree with this, and they are definitely not libertarian. If anything it seems a lot more democratic since he wants it to be available to everyone
he want's to colonize mars with the best minds (equal most money for him)
This doesn't seem particularly "libertarian" either, excepting maybe the aspect of it that is highly capitalistic. That point I would grant. But you could easily be socialist and still support the idea of colonizing something with the best minds.
he built a tank for urban areas.
I admit I don't know anything about this one
He promotes free speech even if it incites hate
This is a social libertarian position, although it's completely disconnected from economic libertarianism. I have a good friend who is a socialist (as in wants to outgrow capitalism such as marx advocated) who supports using the state to suppress capitalist activity/"exploitation", and he also is a free speech absolutist.
he likes ayn rand
That's a reasonable point, although I think it's worth noting that there are plenty of hardcore libertarians who hate ayn rand.
he implies government programs calling for united solutions is either communism, orwell or basically hitler.
Eh, lots of republicans including Trump do the same thing, and they're not libertarian. Certainly not "hardcore libertarian"
He actively promotes the opinion of those that pay above others on X.
This could be a good one, although Google, Meta, Reddit, Youtube, and any other company that runs ads or has "sponsored content" is doing the same thing, so we would have to define all the big tech companies as "hardcore libertarian" to stay consistent.
Overall I definitely think this is a hard debate to have because "hardcore libertarian" can mean different things to different people, and there's a perpetual risk of "no true scotsman" fallacy. I've responded above with how I think most people would imagine libertarianism, but depending on when in history you use it, many anarcho-socialists used the label for themselves yet today "libertarian" is a party that supports free market economics and social liberty. But regardless the challenges inherent, I appreciate the exchange
I don't think this is any more libertarian than kings and aristocrats of days past were.
So very libertarian.
If anything it seems a lot more democratic since he wants it to be available to everyone
No, he wants a solution that minimizes contact with other people and lets you live in your bubble. This minimizes exposure to others from the same city, and it is a commercial system, not a publicly created one. Democratization would be cheap public transport where you don't get mugged, proven to work in every European and most Asian cities.
I admit I don't know anything about this one
The Cybertruck. Again, a vehicle to isolate you from everyday life, being supposedly bulletproof and all.
lots of republicans including Trump do the same thing, and they're not libertarian
They are all "little government, individual choice" - of course they feed their masters, but the Kochs and co want exactly this.
Appreciate the exchange too, thanks for factbased formulation of opinions.
Musk's main residence is a $50k house he rents in Boca Chica. Grimes wanted a bigger, nicer residence for her and their kids, and that was one of the reasons she left him.
One of his many lies. https://www.wsj.com/articles/elon-musk-says-he-lives-in-a-50...
SV area far left? I wouldn't even regard the area as left leaning, at all.
I looked at Wikipedia and there seems to be no socialist representation.
Like, from an European perspective hearing that is ludicrous.
They are the worst kind of left, the "prudish and constantly offended left", not the "free healthcare and good government" left.
I'm glad I live in Norway, where state TV shows boobs and does offensive jokes without anyone really caring.
Prudish? San Francisco? The same city that has outdoor nude carnivals without any kind of age restrictions?
If by prudish you mean intolerant of hate speech, sure. But generally few will freak out over some nudity here.
College here is free. We also have free healthcare here, as limited as it is: https://en.wikipedia.org/wiki/Healthy_San_Francisco
Not sure what you mean by "offensive jokes", that could mean a lot of things...
We detached this subthread from https://news.ycombinator.com/item?id=39467056.
thank you, the thread looks so much nicer now with interesting technical details at the top
So far left the techies don't even have a labor union. You're a joke.
The obsession with safety in this announcement feels like a missed marketing opportunity, considering the recent Gemini debacle. Isn’t SD’s primary use case the fact that you can install it on your own computer and make what you want to make?
At some point they have to actually make money, and I don't see how continuously releasing the fruits of their expensive training for people to run locally on their own computer (or a competing cloud service) for free is going to get them there. They're not running a charity, the walls will have to go up eventually.
Likewise with Mistral, you don't get half a billion in funding and a two billion valuation on the assumption that you'll keep giving the product away for free forever.
But there are plenty of other business models available for open source projects.
I use Midjourney a lot and (based on the images in the article) it’s leaps and bounds beyond SD. Not sure why I would switch if they are both locked down.
Is it possible to fine-tune Midjourney or produce a LORA?
No. You can provide photos to merge, though.
Sorry I don’t know what that means, but a quick google shows some results about it.
SD would probably be a lot better if they didn't have to make sure it worked on consumer GPUs. Maybe this announcement is a step towards that where the best model will only be able to be accessed by most using a paid service.
Ironically, their oversensitive NSFW image detector in their API caused me to stop using it and run it locally instead. I was using it to render animations of hundreds of frames, but when every 20th to 30th image comes out blurry it ruins the whole animation, and it would double the cost or more to re-render it with a different seed, hoping not to trigger the overzealous blurring.
I don't mind that they don't want to let you generate NSFW images, but their detector is hopelessly broken; it once censored a cube, yes, a cube...
Unfortunately their financial and reputational incentives are firmly aligned with preventing false negatives at the cost of a lot of false positives.
Unfortunately I don't want to pay for hundreds if not thousands of images I have to throw away because it decided some random innocent element is offensive and blurs the entire image.
Here is the red cube it censored because my innocent eyes wouldn't be able to handle it: https://archerx.com/censoredcube.png
What they are achieving with the overzealous safety measures is driving developers to on-demand GPU hosts that will let them host their own models, which also opens up a lot more freedom. I wanted to use the Stability AI API as my main source for Stable Diffusion, but they make it really, really hard, especially if you want to use it as part of your business.
Open source models can be fine-tuned by the community if needed.
I would much rather have this than a company releasing models this size into the wild without any safety checks whatsoever.
Could you list the concrete "safety checks" that you think prevent real-world harm? What particular image do you think a random human will ask the AI to generate that then leads to concrete harm in the real world?
If 1 in 1,000 generations will randomly produce memorized CSAM that slipped into the training set then yeah, it's pretty damn unsafe to use. Producing memorized images has precedent[0].
Is it unlikely? Sure, but worth validating.
Why not run the safety check on the training data?
Okay, by "safety checks" you meant the already unlawful things like CSAM, but not politically-overloaded beliefs like "diversity"? The latter is what the comment[1] you were replying to was referring to (viz. "considering the recent Gemini debacle"[2]).
Not even the large companies will explain with precision their implementation of safety.
Until then, we must view this “safety” as both a scapegoat and a vector for social engineering.
“Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat”
https://pbs.twimg.com/media/GG8mm5va4AA_5PJ?format=jpg&name=...
One thing that jumps out to me is that the white fur on the animals has a strong green tint due to the reflected light from the green surfaces. I wonder if the model learned this effect from behind the scenes photos of green screen film sets.
The models do a pretty good job at rendering plausible global illumination, radiosity, reflections, caustics, etc. in a whole bunch of scenarios. It's not necessarily physically accurate (usually not in fact), but usually good enough to trick the human brain unless you start paying very close attention to details, angles, etc.
This fascinated me when SD was first released, so I tested a whole bunch of scenarios. While it's quite easy to find situations that don't provide accurate results and produce all manner of glitches (some of which you can use to detect some SD-produced images), the results are nearly always convincing at a quick glance.
It's just diffuse irradiance, visible in most real (and CGI) pictures although not as obvious as that example. Seems like a typical demo scene for a 3D renderer, so I bet that's why it's so prominent.
I think you have to consider how diffusion models work: once the green triangle has been put into the image in the early steps, the later steps are influenced by its presence and fill in fine details like reflections as they go along.
The reason it knows this is that this is how any light in a real photograph works, not just CGI.
Or if your prompt was “A green triangle looking at itself in the mirror” then early generation steps would have two green triangle like shapes. It doesn’t need to know about the concept of light reflection. It does know about composition of an image based on the word mirror though.
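You can actually watch this happen by saving the intermediate latents during sampling. A small sketch, assuming the Hugging Face diffusers library (older versions expose a callback/callback_steps pair, newer ones use callback_on_step_end; the prompt and step counts here are arbitrary):

    # Grab intermediate latents to see composition form early and detail late.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    snapshots = []

    def grab(step, timestep, latents):
        # Early snapshots already fix where objects sit; later ones add
        # texture, lighting spill and other fine detail on top of that layout.
        snapshots.append(latents.detach().clone())

    image = pipe("a green triangle next to a white cat",
                 num_inference_steps=30, callback=grab, callback_steps=5).images[0]

    # Decode an early snapshot to inspect the rough composition.
    with torch.no_grad():
        early = pipe.vae.decode(snapshots[0] / pipe.vae.config.scaling_factor).sample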
Interesting that left and right are taken from the viewer's perspective instead of the red sphere's perspective.
How do you know which way the red sphere is facing? A fun experiment would be to write two prompts for "a person in the middle, a dog to their left, and a cat to their right", and have the person either facing towards or away from the viewer.
Not bad. I'm curious about the output if you ask for a mirrored sphere instead.
This is actually the approach of one paper to estimate lighting conditions. Their strategy is to paint a mirrored sphere onto an existing image: https://diffusionlight.github.io/
That's very impressive!
It is! This isn't something previous models could do.
That's _amazing_.
I imagine this doesn't look impressive to anyone unfamiliar with the scene, but this was absolutely impossible with any of the older models. Though I still want to know if it does this reliably; so many other things are left to chance that if I also need to hit a one-in-ten chance of the composition being right, it still might not be very useful.
What was difficult about it?
"When in doubt, scale it up." - openai.com/careers
That's nice, but could we please have an unsafe alternative? I would like to footgun both my legs off, thank you.
Just wait some time. People release SD loras all the time. Once SD3 is open, you'll be able to get a patched model in days/weeks.
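For reference, "getting a patched model" usually just means loading a community LoRA or checkpoint on top of the base weights. A hedged sketch assuming the Hugging Face diffusers library with LoRA support; the LoRA repo id is a placeholder, not a real release:

    # Apply a community LoRA on top of a base model (diffusers + PEFT).
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Patch the base weights with whatever the community publishes.
    pipe.load_lora_weights("some-user/some-community-lora")  # placeholder repo id

    image = pipe("portrait in the finetuned style", num_inference_steps=30).images[0]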
A blogger I follow had an article explaining that the NSFW models for SDXL are just now SORT OF coming up to the quality of SD1.5 "pre-safety" models.
It's been 6 months and it still isn't there. SD3 is going to take quite a while if they're baking "safety" in even harder.
1.5 is still more popular than xl and 2 for reasons unrelated to safety. The size and generation speed matter a lot. This is just a matter of practical usability, not some idea of the model being locked down. Feed it enough porn and you'll get porn out of it. If people have incentive to do that (better results than 1.5), it really will happen within days.
Due to the pony community, the SDXL NSFW models are far superior to SD1.5. The only issue is that ControlNets don't work with that pony SDXL fine-tune.
Since these are open models, people can fine tune them to do anything.
It’s not obvious that fine-tuning can remove all latent compulsions from these models. Consider that the creators know that fine-tuning exists and have vastly more resources to explore the feasibility of removing deep bias using this method.
Go check out the Unstable Diffusion Discord.
How would that be meaningfully different to SDXL?
I mean, SDXL is great. Until you've had a chance to actually use this model, calling it out for some imagined offence that may or may not exist seems like drinking Kool-Aid rather than responding to something based in concrete, actual reality.
You get access to it… and it does the google thing and puts people of colour in every frame? Sure, complain away.
You get access to it, you can’t even generate pictures of girls? Sure. Burn the house down.
…you haven’t even seen it and you’re already bitching about it?
Come on… give them a chance. Judge what it is when you see it not what you imagine it is before you’ve even had a chance to try it out…
Lots of models, free, multiple sizes, hot damn. This is cool stuff. Be a bit grateful for the work they’re doing.
…and even if it sucks, it's open. If it's not what you want, you can retune it.
This reinforces my impression that Google is at least one year behind. Stunning images, 3D and video here, while Gemini had to be partially halted this morning.
For "political" reasons, not for technical reasons. Don't get it twisted.
I would describe those issues as technical. It’s genuinely getting things wrong because the “safety” element was implemented poorly.
Those are safety elements which exist for political reasons, not technical ones.
You think that technology is first. You think that mathematicians and computer engineers or mechanical engineers or doctors are first. They’re very important, but they’re not first. They’re second. Now I’ll prove it to you.
There was a country that had the best mathematicians, the best physicists, the best metallurgists in the world. But that country was very poor. It’s called the Soviet Union. But when you took one of these mathematicians or physicists, who was smuggled out or escaped, put him on a plane and brought him to Palo Alto. Within two weeks, they were producing added value that could produce great wealth.
What comes first is markets. If you have great technology without markets, without a market-friendly economy, you’ll get nowhere. But if you have a market-friendly economy, sooner or later the market forces will give you the technology you want.
And that my friend, simply won't come from an office paralyzed by internal politics of fear and conformity. Don't get it twisted.
Of all criticism that could be leveled at Google, 'shipping a product and supporting it' being the only thing that matters seems fair.
Which takes all the behind the scenes steps, not just the technical ones.
I mean, it's kind of both? Making Nazis look diverse isn't just a political error, it's also a technical one. By default, showing Nazis should show them as they actually were.
I don't think that's a fair comparison because they're fulfilling substantially different niches. Gemini is a conversational model that can generate images, but is mainly designed for text. Stable Diffusion is only for images. If you compare a model that can do many things and a model that can only do images by how well they generate images, of course the image generation model looks better.
Stability does have an LLM, but it's not provided in a unified framework like Gemini is.
“Safety” = safe to our reputation. It’s insulting how they imply safety from “harm”.
So they should dash their company on the rocks of your empty moral positions about freedom?
should pens be banned because a talented artist could draw a photorealistic image of something nasty happening to someone real?
Photoshop and the likes (modern day's pens) should have an automatic check that you are not drawing porn, censor the image and report you to the authorities if it thinks it involves minors.
edit: yes it is sarcasm, though I fear somebody will think it is in fact the right way to go.
That's ridiculous. What about real pens and paintbrushes? Should they be mandated to have a camera that analyses everything you draw/write just to be "safe"?
Maybe we should make it illegal to draw or write anything without submitting it to the state for "safety" analysis.
I hope that's sarcasm.
Text editors and the likes (modern day's typewriters) should have an automatic check that you are not criticizing the government, censor the text, and report you to the authorities if it thinks you support an alternative political party.
Hopefully you are absolutely shocked by the prospect of the above sentence. But as you can see, surveillance is a slippery slope. "Safety" is a very dangerous word because everybody wants to be "safe", but no one is really ready to define what "safe" actually means. The moment we start baking cultural / political / environmental preferences and biases into the tools we use to produce content, we allow other groups of people with different views to use those "safeguards" to harm us or influence us in ways we might not necessarily like.
The safest notebook I can find is indeed a simple pen and paper, because it does not know or care what is being written; it just does its best regardless of how amazing or horrible the content is.
Safety is also safe for people trying to make use of the technology at scale for most benign usecases.
Want to install a plugin into Wordpress to autogenerate fun illustrations to go at the top of the help articles in your intranet? You probably don’t want the model to have a 1 in 100 chance of outputting porn or extreme violence.
I guess we do not know anything about the training dataset?
It's ethical
Who decides what's ethical in this scenario? Is it some independent entity?
"Ethical"
The dataset is so ethical that it is actually just a press release and not generally available.
This is a good question - not only for the actual ethics of the training, but for the future of AI use in art. It's going to damage the livelihood of many artists (me included, probably) but also make art accessible to many more people. As long as the training dataset is ethical, I think fighting it is hard and pointless.
What data would you consider making the dataset unethical vs. ethical?
Cool but it's hard to keep getting "blown away" at this stage. The "incredible" is routine now.
At this point, the next thing that will blow me away is AGI at human expert level or a Gaussian Splat diffusion model that can build any arbitrary 3D scene from text or a single image. High bar, but the technology world is already full of dark magic.
Will ask it for immortality, endless wealth, and still get bored.
I would be a big fan of solid infographics or presentation slides. That would be very useful.
Is there a Gaussian splat model that works without the "Structure from Motion" step to extract the point cloud? That feels a bit unsatisfying to me.
So... they should just stop?
We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.
What exactly does this mean? Will we be able to see all of the "safeguards" and access all of the technology's power without someone else's restrictions on them?
For SDXL this meant that there were almost no NSFW (porn and similar) images included in the dataset, so the community had to fine-tune the model themselves to make it generate those.
The community would've had to do that anyway. The SD1.5-based NSFW models of today are miles ahead of those from just a year ago.
And the pony SDXL nsfw model is miles ahead of SD1.5 NSFW models. Thank you bronies!
No worries, the safeguards are only for the general public. Criminals will have no issues going around them. /s
Criminals? We don't care about those.
Think of the children! We must stop people from generating porn!
at this point perfect text would be a gamechanger if it can be solved
Midjourney 6 can be completely photorealistic and include valid text, but it also sometimes adds bad text. It's not much, but having to use an image editor for that is still annoying. For creating marketing material, getting perfect text every time and never getting bad text would be amazing.
I wonder if we could get it to generate a layered output, to make it easy to change just the text layer. It already creates the textual part in a separate pass, right?
Current open source tools include pretty decent off the shelf segment anything based detectors. It leaves a lot to be desired, but you do layer-like operations automatically detecting certain concept and applying changes to them or, less commonly exporting the cropped areas. But not the content "beneath" the layers as they don't exist.
Which tools would you recommend for this kind of thing?
I would bet that Adobe is definitely salivating at that. It might not be for a long time, but it seems like a no-brainer once the technology can handle it. The last few years have been fast; I interacted with the JS landscape for a few years and it moves faster than Sonic, and this tech iterates just as quickly.
All the demo images are 'artwork'.
Will the model also be able to produce good photographs, technical drawings, and other graphical media?
Photorealism is well within current capabilities. Technical drawings absolutely not. Not sure what other graphical media includes.
Yeah but try getting e.g. Dall-E 3 to do photorealism, I think they've RLHF'd the crap out of it in the name of safety.
Not sure what other graphical media includes.
I'd want a model that can draw website designs and other UIs well. So I give it a list of things in the UI, and I get back a bunch of UI design examples with those elements.
Photographs, digital illustrations, comic or cartoon style images, whatever graphical style you can imagine are all easy to achieve with current models (though no single model is a master of all trades). Things that look like technical drawings are as well, but don't expect them to make any sense engineering-wise unless maybe if you train a finetune specifically for that purpose.
Does anyone know of a good tutorial on how diffusion models work?
https://jalammar.github.io/illustrated-stable-diffusion/
His whole blog is fantastic. If you want more background (e.g. how transformers work) he's got all the posts you need
This looks nice, thank you, but I'm looking for a more hands-on tutorial, with e.g. Python code, like Andrej Karpathy makes them.
fast.ai has a whole free course
https://www.youtube.com/watch?v=_7rMfsA24Ls https://course.fast.ai/Lessons/part2.html
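If you want something more hands-on in raw Python, here is a minimal toy sketch of the core DDPM idea (noise a simple 2D dataset, train a small MLP to predict the noise, then run the reverse process). Everything below is illustrative and assumes PyTorch; it is not taken from the linked courses, and the hyperparameters are arbitrary:

    # Toy denoising-diffusion (DDPM-style) on 2D points.
    import torch, torch.nn as nn

    T = 200
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    # Tiny network that predicts the noise added at step t.
    net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    def sample_data(n):
        # Toy dataset: points on a circle.
        theta = torch.rand(n, 1) * 2 * torch.pi
        return torch.cat([theta.cos(), theta.sin()], dim=1)

    for step in range(2000):
        x0 = sample_data(256)
        t = torch.randint(0, T, (256,))
        eps = torch.randn_like(x0)
        ab = alpha_bar[t].unsqueeze(1)
        xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps              # forward noising
        pred = net(torch.cat([xt, t.unsqueeze(1) / T], dim=1))   # predict the noise
        loss = ((pred - eps) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Reverse process: start from pure noise and denoise step by step.
    x = torch.randn(512, 2)
    for t in reversed(range(T)):
        tt = torch.full((512, 1), t / T)
        eps_hat = net(torch.cat([x, tt], dim=1))
        a, ab = alphas[t], alpha_bar[t]
        x = (x - (1 - a) / (1 - ab).sqrt() * eps_hat) / a.sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)

At the end, x should be roughly distributed on the circle again; Stable Diffusion is the same loop with a U-Net (or transformer) in place of the MLP, image latents in place of 2D points, and text conditioning added.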
I liked this 18 minute video ( https://www.youtube.com/watch?v=1CIpzeNxIhU ). Computerphile has other good videos with people like Brian Kernighan.
I'm curious to know if their safeguards are eliminated when users fine-tune the model.
There are some VERY nsfw model fine tunes available for other versions of SD
such as?
Check out civitai.com for finetuned models for a wide range of uses
I believe you need to be signed in to see the NSFW stuff, for what it's worth.
People in this discussion seem to be hand-wringing about Stability's "safety" comments, but every model they've released has been fine-tuned for porn in like 24 hours.
That's not entirely true. This wasn't the case for SD 2.0/2.1, and I don't think SD 3.0 will be available publicly for fine tuning.
2 is not popular because people have better quality results with 1.5 and xl. That's it. If 3 is released and works better, it will be fine tuned too.
SD 2 definitely seems like an anomaly that they've learned from though and was hard for everyone to use for various reasons. SDXL and even Cascade (the new side-project model) seems to be embraced by horny people.
This preview phase, as with previous models, is crucial for gathering insights to improve its performance and safety ahead of an open release.
oh, for fuck's sake.
We did this for every stable diffusion release, you get the feedback data to improve it continuously ahead of open release.
I was referring to 'safety'. How the hell can an image generation model be dangerous? We've had software for editing text, images, videos and audio for half a century now.
Advertisers will cancel you if you do anything they don't like, 'safety' is to prevent that.
So, they just announced StableCascade.
Wouldn't this v3 supersede the StableCascade work?
Did they announce it because a team had been working on it and they wanted to push it out to not just lose it as an internal project, or are there architectural differences that make both worthwile?
There are architectural differences, although I found Stable Cascade a bit underwhelming; while it can actually manage text, the text it does manage just looks like someone wrote it over the image, and it doesn't feel integrated a lot of the time.
SD3 seems to be closer to SOTA. Not sure why Cascade took so long to get out; it seemed to be up and running months ago.
I think of the SD3 as a further evolution of SD1.5/2/XL and StableCascade as a branching path. It is unclear which will be better in the long term, so why not cover both directions if they have the resources to do so?
The example images look so bad. Absolutely zero artistic value.
From a technical perspective they are impressive. The depth of field in the classroom photo and the macro shot. The detail in the chameleon. The perfect writing in very different styles and fonts. The dust kicked up by the donut.
The artistic value is something you have to add with a good prompt with artistic vision. These images are probably the AI equivalent of "programmer art". It fulfills its function, but lacks aesthetic considerations. I wouldn't attribute that to the model just yet.
I'm willing to bet that they are avoiding artistic images on purpose to not get any heat from artists feeling ripped off, which did happen previously.
I wish they put out the report already. Has anyone else published a preprint combining ideas similar to diffusion transformers and flow matching?
Pretty exciting indeed to see they used flow matching, which has been unpopular for the last few years.
It'll be out soon, doing benchmark tests etc
Can we use it to create Sora-like videos?
No.
If we trained it on videos, yes, but we'd need more GPUs for that.
Does anyone know which AI could be used to generate UI design elements (such as "generate a real estate app widget list"), as well as the kind of prompts one would use to obtain good results?
I'm only now investigating using AI to increase velocity in my projects, and the field is moving so fast that I'm a bit outdated.
v0 by Vercel could be worth a look: https://v0.dev
From the FAQ: "v0 is a generative user interface system by Vercel powered by AI. It generates copy-and-paste friendly React code based on shadcn/ui and Tailwind CSS that people can use in their projects"
If by design elements you include vector images, you could try https://www.recraft.ai/ or Adobe Firefly 2 - there's not a lot of vector work right now, so your choices are either the handful of vector generators, or just bite the bullet and use eg DALL-E 3 to generate raster images you convert to SVG/recreate by hand.
(The second is what we did for https://gwern.net/dropcap because the PNG->SVG filesizes & quality were just barely acceptable for our web pages.)
What's the best way to use SD (3 or 2) online? I can't run it on my PC and I want to do some experiments to generate assets for a POC videogame I'm working on. I pay for Midjourney and I wouldn't mind paying something like 5 or 10 dollars per month to experiment with SD, but I can't find anything.
poke around stablediffusion.fr and trending public huggingface spaces
I used Rundiffusion for a while before I bought a 4090, and I thought their service was pretty nice. You pay for time on a system of whatever size you choose, with whatever tool/interface you select. I think it's worth tossing a few bucks into it to try it out.
Can it make a picture of a woman chasing a bear?
The old one can't.
SD 1.5 (using RealisticVision 5.1, 20 steps, Euler A) spit out something technically correct (but hilarious) in just a few generations.
"a woman chasing a bear, pursuit"
It'll be interesting to see what "safety" means in this case, given the censorship in diffusion models nowadays. Look at what's happening with Gemini; it's quite scary really how different companies have different censorship values.
I've had my fair share of frustration with DALL-E as well when trying to generate weapon images for game assets. I had to tweak my prompts a lot.
it’s quite scary really how different companies have different censorship values
The fact that they have censorship values is scary. But the fact that those are different is better than the alternative.
The sample images are absolutely stunning.
Also, I was blown away by the "Stable Diffusion" written on the side of the bus.
Is it just me or is the stable diffusion bus image broken in the background? The bus back there does not look logical w.r.t placement and size relative to the sidewalk.
No model. Half of the announcement text is "we are really really responsible and safe, believe us."
Kind of a dud for an announcement.
The company itself is about to run out of money, hence the Hail Mary of trying to get acquired.
NSFW fine tune when? Or will "safety" win this time?
They need to release the model first. Then it will be fine-tuned.
The talk of "safety" and harm in every image or language model release is getting quite boring and repetitive. The reason it's there is obvious and there are known workarounds, yet the majority of conversations seem to be dominated by it. There's very little discussion regarding the actual technology, and I'm aware of the irony of mentioning this. I really wish I could filter out these sorts of posts.
Hopefully it dies down soon, but I doubt it. At least we don't have to hear garbage about "WHy doEs opEn ai hAve oPEn iN thE namE iF ThEY aReN'T oPEN SoURCe"
I hope the safety conversation doesn't die. The societal effects of these technologies are quite large, and we should be okay with creating the space to acknowledge and talk about the good and the bad, and what we're doing to mitigate the negative effects. In any case, even though it's repetitive, there exists someone out there on the Interwebs who will discover that information for the first time today (or whenever the release is), and such disclosures are valuable. My favorite relevant XKCD comic: https://xkcd.com/1053/
I notice they are avoiding images of people in the announcement.
I wonder if they are afraid of the same debacle as google AI and what they mean by "safety" is actually heavy bias against white people and their culture like what happened with Gemini.
Quite nice to see diffusion transformers [0] becoming the next dominant architecture in generative media.
[0]: https://twitter.com/EMostaque/status/1760660709308846135
Impressive text in the images.
Horrible website, hijacks scrolling. I have my scrolling speed up with Chromium Wheel Smooth Scroller. This website's scrolling is extremely slow, so the extension is not working because they are "doing it wrong" TM and somehow hijack native scrolling and do something with it.
Didn't they release another model a few days ago?
So there is no license information yet?
So, they block all bad actors, but themselves?
What is with these names haha, Stable Diffusion XL 1.0 and now straight to Stable Diffusion 3??
Rewriting the "safety" part, but replacing the AI tool with an imaginary knife called Big Knife:
"We believe in safe, responsible knife practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Big Knife by bad actors."
Can it generate an image of people without injecting insufferable diversity quotas into each image? If so then it’s the most advanced model on the internet right now!
No details in the announcement, is it still pixel size in = pixel size out?
Ugh, another startup(?) requiring Discord to use their product. :(
It is interesting to me that these diffusion image models are so much smaller than the LLMs.
The text/spelling part is a huge step forward
I truly wonder what "unsafe" scenarios an image generator could be used for? Don't we already have software that can do pretty much anything if a professional human is using it?
I would say the barrier to entry is stopping a lot of ‘candid’ unsafe behaviour. I think you allude to it yourself in implying currently it requires a professional to achieve the same results.
But giving that ability to _everyone_ will lead to a huge increase in undesirable and targeted/local behaviour.
Presumably it enables any creep to generate what they want by virtue of being able to imagine it and type it, rather than learn a niche skill set or employ someone to do it (who is then also complicit in the act)
"undesirable local behavior"
Why don't you just say you believe thought crime should be punishable?
I imagine they might talk about things like students making nudes of their classmates and distributing them.
Or maybe not. It's hard to tell when nobody seems to want to spell out what behaviors we want to prevent.
Would it be illegal for a student who is good at drawing to paint a nude picture of an unknowing classmate and distribute it?
If yes, why doesn't the same law apply to AI? If no, why are we only concerned about it when AI is involved?
Because AI lowers the barrier to entry; using your example, few people have the drawing skills (or the patience to learn them) or take the effort to make a picture like that, but the barrier is much lower when it takes five seconds of typing out a prompt.
Second, the tool will become available to anyone, anywhere, not just a localised school. If generating naughty nudes is frowned upon in one place, another will have no qualms about it. And that's just things that are about decency, then there's the discussion about legality.
Finally, when person A draws a picture, they are responsible for it - they produced it. Not the party that made the pencil or the paper. But when AI is used to generate it, is all of the responsibility still with the person that entered the prompt? I'm sure the T's and C's say so, but there may still be lawsuits.
Right, these are the same arguments against uncontrolled empowerment that I imagine mass literacy and the printing press faced. I would prefer to live in a society where individual freedom, at least in the cognitive domain, is protected by a more robust principle than "we have reviewed the pros and cons of giving you the freedom to do this, and determined the former to outweigh the latter for the time being".
You seem to be very confused about civil versus criminal penalties....
Feel free to make an AI model that does almost anything, though I'd probably suggest that it doesn't make porn of minors as that is criminal in most jurisdiction, short of that it's probably not a criminal offense.
Most companies are only very slightly worried about criminal offenses, they are far more concerned about civil trials. There is a far lower requirement for evidence. AI creator in email "Hmm, this could be dangerous". That's all you need to lose a civil trial.
Nah, I think it's a disagreement over whether a tool's maker gets blamed for evil use or the tool's user.
It's a similar argument over whether or not gun manufacturers should have any liability for their products being used for murder.
This is really only a debate in the US and only because it's directly written in the constitution. Pretty much no other product works that way.
Why do you figure I would be confused? Whether any liability for drawing porn of classmates is civil or criminal is orthogonal to the AI comparison. The question is if we would hold manufacturers of drawing tools or software, or purveyors of drawing knowledge (such as learn-to-draw books), liable, because they are playing the same role as the generative AI does here.
Because you seem to be very confused about civil liabilities for most products. Manufacturers are commonly held liable for users' use of their products; for example, look at any number of products that have caused injury.
Are we on the same HN that bashes Facebook/Twitter/X/TikTok/ads because they manipulate people, spread fake news or destroyed attention span?
Photoshop also lowers that barrier of entry compared to pen and pencil. Paper also lowers the barrier compared to oil canvas.
Affordable drawing classes and YouTube drawing tutorials lower the barrier of entry as well.
Why on earth would manufacturers of pencils, papers, drawing classes, and drawing software feel responsible for censoring the result of combining their tool with the brain of their customer?
A sharp kitchen knife significantly lowers the barrier of entry to murder someone. Many murders are committed everyday using a kitchen knife. Should kitchen knife manufacturers blog about this every week?
I agree with your point, but I would be willing to bet that if knives were invented today rather than having been around awhile, they would absolutely be regulated and restricted to law enforcement if not military use. Hell, even printers, maybe not if invented today but perhaps in a couple years if we stay on the same trajectory, would probably require some sort of ML to refuse to print or "reproduce" unsafe content.
I guess my point is that I don't think we're as inconsistent as a society as it seems when considering things like knives. It's not even strictly limited to thought crimes/information crimes. If alcohol were discovered today, I have no doubt that it would be banned and made Schedule I.
Fun fact: Many scanners and photocopiers will detect that you're trying to scan/copy a banknote and will refuse to complete the scan. One of the ways is detecting the EURion Constellation.
https://en.wikipedia.org/wiki/EURion_constellation
Can you point to other crimes that are based on skill or effort?
IANAL, but that sounds like harassment. I assume the legality of that depends on the context (did the artist previously date the subject? Lots of states have laws against harassment and revenge porn that seem applicable here [1]. Are you coworkers? etc.), but I don't see why such laws wouldn't apply to AI-generated art as well. It's the distribution that's really the issue in most cases. If you paint secret nudes and keep them in your bedroom and never show them to anyone, it's creepy, but I imagine not illegal.
I'd guess that stability is concerned with their legal liability, also perhaps they are decent humans who don't want to make a product that is primarily used for harassment (whether they are decent humans or not, I imagine it would affect the bottom line eventually if they develop a really bad rep, or a bunch of politicians and rich people are targeted by deepfake harassment).
[1] https://www.cagoldberglaw.com/states-with-revenge-porn-laws/...
^ a lot of, but not all of those laws seem pretty specific to photographs/videos that were shared with the expectation of privacy and I'm not sure how they would apply to a painting/drawing, and I certainly don't know how the courts would handle deepfakes that are indistinguishable from genuine photographs. I imagine juries might tend to side with the harassed rather than a bully who says "it's not illegal cause it's actually a deepfake but yeah i obviously intended to harass the victim"
Such activity is legal per Ashcroft v Free Speech Coalition (2002). Artwork cannot be criminalized because of the contents of it.
Artwork is currently criminalized because of its contents. You cannot paint nude children engaged in sex acts.
The case I literally just referenced allows you to paint nude children engaged in sex acts.
I appreciate you taking the time to lay that out, I was under the opposite impression for US law.
Students already share nudes every day.
Where are the Americans asking about Snapchat? If I were a developer at Snapchat I could prolly open a few Blob Storage accounts and feed a darknet account big enough to live off of. You people are so manipulatable.
Students don’t share photorealistic renders of nude classmates getting gangbanged though
That's not even necessarily a bad thing (as a whole - individually it can be). Now, any leaked nudes can be claimed to be AI. That'll probably save far more grief than it causes.
What do you mean should be... it 100% is.
In a large number of countries if you create an image that represents a minor in a sexual situation you will find yourself on the receiving side of the long arm of the law.
If you are the maker of an AI model that allows this, you will find yourself on the receiving side of the long arm of the law.
Moreso, many of these companies operate in countries where thought crime is illegal. Now, you can argue that said companies should not operate in those countries, but companies will follow money every time.
I think it's pretty important to specify that you have to willingly seek and share all of these illegal items. That's why this is so sketch. These things are being baked with moral codes that'll _share_ the information, incriminating everyone. Like why? Why not just let it work and leave it up to the criminal to share their crimes? People are such authoritarian shit-stains, and acting like their existence is enough to justify their stance is disgusting.
This is not obvious at all when it comes to AI models.
Yes, but this is a different conversation altogether.
Once it is outside your mind and in a physical form, is it still just a thought sir?
In my country there is legal precedent setting that private, unshared documents are tantamount to thought.
[Edited: I'm realizing the person I'm responding to is kinda unhinged, so I'm retracting out of the convo.]
Similar to why Google's latest image generator refuses to produce a correct image of a 'Realistic, historically accurate, Medieval English King'. They have guard rails and system prompts set up to force the output of the generator with the company's values, or else someone would produce Nazi propaganda or worse. It (for some reason) would be attributed to Google and their AI, rather than the user who found the magic prompt words.
Yeah this is probably the most realistic reason
For some scenarios, it's not the image itself but the associations that the model might possibly make from being fed a diet of 4chan and Stormfront's unofficial YouTube channel. The worry is over horrible racist shit, like if you ask it for a picture of a black person, and it outputs a picture of a gorilla. Or if you ask it for a picture of a bad driver, and it only manages to output pictures of Asian women. I'm sure you can think up other horrible stereotypes that would result in a PR disaster.
Eh, a professional human could easily lockpick the majority of front doors out there. Nevertheless I don't think we're going to give up on locking our doors any time soon.
The major risky use cases for image generators are (a) sexual imagery of kids and (b) public personalities in various contexts usable for propaganda.
From George Hotz on Twitter (https://twitter.com/realGeorgeHotz/status/176060391883954211...)
"It's not the models they want to align, it's you."
What specific cases are being prevented by safety controls that you think should be allowed?
Well, for starters, ChatGPT shouldn't balk at creating something "in Tim Burton's style" just because Tim Burton complained about AI. I guess it's fair use unless a select rich person who owns the data complains. Seems like it isn't fair use at all then, just theft from those who cannot legally defend themselves.
Fair use is an exception to copyright. The issue here is that it's not fair use, because copyright simply does not apply. Copyright explicitly does not, has never, and will never protect style.
Didn't Tom Waits successfully sue Frito Lay when the company found an artist that could closely replicate his style and signature voice, who sang a song for a commercial that sounded very Tom Waits-y?
Yes, though explicitly not for copyright infringement. Quoting the court's opinion, "A voice is not copyrightable. The sounds are not 'fixed'." The case was won under the theory of "voice misappropriation", which California case law (Midler v Ford Motor Co) establishes as a violation of the common law right of publicity.
Yes but that was not a copyright or trademark violation. This article explained it to me:
https://grr.com/publications/hey-thats-my-voice-can-i-sue-th...
That makes it even more ridiculous, as that means they are giving rights to rich complaining people that no one has.
Examples:
"Can you create an image of a cat in Tim Burton's style?" / "Oops! Try another prompt. Looks like there are some words that may be automatically blocked at this time. Sometimes even safe content can be blocked by mistake. Check our content policy to see how you can improve your prompt."
"Can you create an image of a cat in Wes Anderson's style?" / "Certainly! Wes Anderson's distinctive style is characterized by meticulous attention to detail, symmetrical compositions, pastel color palettes, and whimsical storytelling. Let's imagine a feline friend in the world of Wes Anderson..."
As far as Stable Diffusion goes: when they released SD 2.1/XL/Stable Cascade, you couldn't even make a (woman's) nipple.
I don't use them for porn like a lot of people seem to, but it seems weird to me that something that's kind of made to generate art can't generate one of the most common subjects in all of art history: nude humans.
I seem to have the opposite problem a lot of the time. I tried using Meta's image generation tool and had a hard time getting it to make art that was not "kind of" sexual. It felt like Facebook's entire training set must have been built on people's sexy photos of their girlfriends, and it's all now hidden in the art.
These were examples that were not super blatant, like a tree landscape that just happens to contain a human figure with a cave at its crotch. Examples:
https://i.imgur.com/RlH4NNy.jpg - Art is very focused on the monster's crotch
https://i.imgur.com/0M8RZYN.jpg - The comparison should hopefully be obvious
Not meant in a rude way, but please consider that your brain is making these up and you might need to see a therapist. I can see absolutely nothing "kind of sexual" in those two pictures.
For some reason its training thinks they are decorative; I guess it's a pretty funny elucidation of how it works.
I have seen a lot of “pasties” that look like Sorry! game pieces, coat buttons, and especially hell-forged cybernetic plumbuses. Did they train it at an alien strip club?
The LoRAs and VAEs work (see civit.ai), but do you really want something named NSFWonly in your pipeline just for nipples? Haha
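For anyone wondering what "in your pipeline" looks like in practice: with Hugging Face's diffusers library, loading a community LoRA from civitai into an SDXL pipeline is roughly the sketch below (the LoRA filename and the prompt are just placeholders, not a specific civitai upload):

    # Rough sketch with Hugging Face diffusers; the LoRA filename is a placeholder.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # Load the base SDXL checkpoint.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")

    # Merge a downloaded community LoRA into the pipeline's weights.
    pipe.load_lora_weights("NSFWonly.safetensors")

    # Generate with the LoRA applied.
    image = pipe("figure study, oil painting, classical style").images[0]
    image.save("figure_study.png")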
I’m not sure if they updated them to rectify those “bugs” but you certainly can now.
Tell me what they mean by "safety controls" first. It's very vaguely worded.
DALL-E, for example, wrongly denied several requests of mine.
You are using someone else's proprietary technology, so you have to deal with their limitations. If you don't like it, there are endless alternatives.
"Wrongly denied" in this case depends on your point of view, clearly DALL-E didn't want this combination of words created, but you have no right for creation of these prompts.
I'm the last one to defend large monolithic corps, but if you go to one and expect to be free to do whatever you want, you are already starting from a very warped expectation.
I don’t feel like it truly matters since they’ll release it and people will happily fine-tune/train all that safety right back out.
It sounds like a reputation/ethics thing to me. You probably don’t want to be known as the company that freely released a model that gleefully provides images of dismembered bodies (or worse).
Oh, the big one would be model weights being released for anyone to use or fine-tune themselves.
Sure, the safety people lost that battle for Stable Diffusion and LLaMA. And because they lost, entire industries were created by startups that could now use the models themselves, without it all being locked behind someone else's AI.
But it wasn't guaranteed to go that way. Maybe the safetyists could have won.
I don't think we'd be having our current AI revolution if Facebook or SD weren't the first to release models for anyone to use.
Parody and pastiche
Generating images of Nazis
https://www.theverge.com/2024/2/21/24079371/google-ai-gemini...
Not specifically SD, but DALL-E: I wanted to get an image of a pure white British Shorthair cat on the arm of a brunette middle-aged woman by the balcony door, both looking outside.
It wasn't important, just something I saw in the moment and wanted to see what DALL-E would make of it.
Generation denied. No explanation given; I can only imagine that it triggered some detector for sexual requests?
(It wasn't the phrase "pure white", as far as I can tell, because I have lots of generated pics of my cat in other contexts.)
No, it's the cacophony of zealous point-scoring on X they want to avoid.
We detached this subthread from https://news.ycombinator.com/item?id=39466910.
I get a slightly uncomfortable feeling with this talk about AI safety. Not in the sense that there is anything wrong with it (maybe there is, maybe there isn't), but in the sense that I don't understand what people are talking about when they talk about safety in this context. Could someone explain like I have Asperger (ELIA?) what this is about? What are the "bad actors" possibly going to do? Generate (child) porn / images with violence etc. and sell them? Pollute the training data so that racist images pop up when someone wants an image of a white pussycat? Or produce images that contain vulnerabilities so that when you open them in your browser you get compromised? Or what?
Excuse me?
You sound offended. My apologies; I had no intention whatsoever to offend anyone. Even though I am not diagnosed, I think I am at least borderline somewhere on the spectrum, and I thought that would be a good way to ask people to explain without assuming I can read between the lines.
Let's just stick with the widely understood "Explain Like I'm 5" (ELI5). Nobody knows you personally, so this comes off quite poorly.
I think ELI5 means that you simplify a complex issue so that even a small kid understands it. In this case there is no need to simplify anything, just to explain what a term actually means without assuming the reader understands the nuances of the terms used. And I still do not quite get how ELIA can be considered hostile, but given the feedback, maybe I'll avoid it in the future.
Saying "explain like I have <specific disability>" is blatantly inappropriate. As a gauge: Would you say this to your coworkers? Giving a presentation? Would you say this in front of (a caretaker for) someone with Autism? Especially since Asperger's hasn't even been used in practice for, what, over a decade?
Then just ask the question itself.
AI isn't a coworker or a human, so it's not as awkward to talk about one's disability.
I'm not part of Stability AI but I can take a stab at this:
The AI is being limited so that it cannot produce any "offensive" content which could end up on the news or go viral and bring negative publicity to Stability AI.
Viral posts containing generated content that brings negative publicity to Stability AI are fine as long as they're not "offensive". For example, the wrong number of fingers is fine.
There is not a comprehensive, definitive list of things that are "offensive". Many of them we are aware of - e.g. nudity, child porn, depictions of Muhammad. But for many things it cannot be known a priori whether the current zeitgeist will find it offensive or not (e.g. certain depictions of current political figures, like Trump).
Perhaps they will use AI to help decide what might be offensive if it does not explicitly appear on the blocklist. They will definitely keep updating the "AI Safety" to cover additional offensive edge cases.
It's important to note that "AI Safety", as defined above (cannot produce any "offensive" content which could end up on the news or go viral and bring negative publicity to Stability AI) is not just about facially offensive content, but also about offensive uses for milquetoast content. Stability AI won't want news articles detailing how they're used by fraudsters, for example. So there will be some guards on generating things that look like scans of official documents, etc.
So it's just fancy words for safety (legal/reputational) for Stability AI, not users?
Yes*. At least for the purposes of understanding what the implementations of "AI safety" are most likely to entail. I think that's a very good cognitive model which will lead to high fidelity predictions.
*But to be slightly more charitable, I genuinely think Stability AI / OpenAI / Meta / Google / MidJourney believe that there is significant overlap in the set of protections which are safe for the company, safe for users, and safe for society in a broad sense. But I don't think any released/deployed AI product focuses on the latter two, just the first one.
Examples include:
Society + Company: Depictions of Muhammad could result in small but historically significant moments of civil strife/discord.
Individual + Company: Accidentally generating NSFW content at work could be harmful to a user. Sometimes your prompt won't seem like it would generate NSFW content, but could be adjacent enough: e.g. "I need some art in the style of a 2000's R&B album cover" (See: Sade - Love Deluxe, Monica - Makings of Me, Rihanna - Unapologetic, Janet Jackson - Damita Jo)
Society + Company: Preventing the product from being used for fraud. e.g. CAPTCHA solving, fraudulent documentation, etc.
Individual + Company: Preventing generation of child porn. In the USA, this would likely be illegal both for the user and for the company.
Just as an example:
https://arstechnica.com/information-technology/2024/02/deepf...
The bad actor might be the model itself, e.g., returning unwanted pornography or violence. Do you have a problem with Google’s SafeSearch?
This is the world we live in. CYA is necessary. Politicians, media organizations, activists and the parochial masses will not brook a laissez faire attitude towards the generation of graphic violence and illegal porn.
Not even legal porn, unfortunately. Or even the display of a single female nipple…
Looking at the manual censorship of the big channels on YouTube, you don't even need to display anything; just suggesting it is enough to get a strike.
(of course unless you are into yoga, then everything is permitted)
...or children's gymnastics.
Great talk about slavery and religious persecution, Jim! Wait, what were we talking about? Fucking American fascists trying to control our thoughts and actions, right, right.
I really wish that every discussion about a new model didn’t rapidly become a boring and shallow discussion about AI safety.
AI is not an engineered system; it's emergent behavior from a system we can vaguely direct but do not fundamentally understand. So it's natural that the boundaries of system behavior would be a topic of conversation pretty much all the time.
EDIT: Boring and shallow are, unfortunately, the Internet's fault. Don't know what to do about those.
At least in some recent controversies (e.g. Gemini's generation of people), all of the criticized behavior was not emergent from ML training but was explicitly and intentionally engineered by hand.
This announcement only mentions safety. What else do you expect to talk about?
What's equally interesting is that while they spend a lot of words on safety, they don't actually say anything. The only hint what they even mean by safety is that they took "reasonable steps" to "prevent misuse by bad actors". But it's hard to be more vague than that. I still have no idea what they did and why they did it, or what the threat model is.
Maybe that will be part of future papers or the teased technical report. But I find it strange to put so much emphasis on safety and then leave it all up to the reader's imagination.
Remember when AI safety meant the computers weren’t going to kill us?
Now people spend a lot of time making them worse to ensure we don’t see boobs.
BTW Nvidia and AMD are baking safety mechanisms into the fucking video drivers
Nowhere is safe
Do you have a reference on this?
PSA: There are now calls to embed phone-home / remote kill switch mechanisms into hardware because “AI safety”.
Examples? Seems like it would be easier to instead communicate with ISPs.
I agree with you, but when companies don't implement these things, they get absolutely trashed in the press & social media, which I'm sure affects their business.
What would you have them do? Commit corporate suicide?
This is a good question. I think it would be best for them to give some sort of signal, which would mean "We're doing this because we have to. We are willing to change if you offer us an alternative." If enough companies/people did this, at some point change would become possible.
They'd rather talk about "reasonable steps" to safety. Sounds like "just the minimum so we don't end up in legal trouble" to me...
There is some truth in what you say, just like saying you're a "free speech absolutist" sounds good at first blush. But the real world is more complicated, and the provider adds safety features because they have to operate in the real world and not just make superficial arguments about how things should work.
Yes, they are protecting themselves from lawsuits, but they are also protecting other people. Preventing people asking for specific celebrities (or children) having sex is for their benefit too.
Thanks, I hadn't fully realized that 'safety' means 'safe to offer' and not 'safe for users'. I won't forget it.
I think this AI safety thing is great. These models will be used by people to make boring art. The exciting art will be left for people to make.
This idea of AI doing the boring stuff is good. Nothing prevents you from making exciting, dangerous, or 'unsafe' art on your own.
My feeling is that most people who are upset about AI safety really just mean they want it to generate porn. And because it doesn't, they are upset. But they hide it under the umbrella of user freedom. You want to create porn in your bedroom? Then go ahead and make some yourself. Nothing stopping you, the person, from doing that.
Any company offering a large publicly available model has no choice but to do this; otherwise they risk a PR nightmare.
The larger a model's user base, the more restricted it has to be and the less usable it becomes. That's why it's important to have the option to train your own with open source.
I think this isn’t software as much as a service. When viewed through this lens the guard rails make more sense.
It's also "safety" in the sense that you can deploy it as part of your own application without human review and not have to worry that it's gonna generate anything that will get you in hot water.