AI behavior guardrails should be public

dekhn
71 replies
22h45m

I strongly suspect Google tried really, really hard here to overcome the criticism it got when previous image recognition models labeled black people as gorillas. I am not really sure what I would want out of an image generation system, but I think Google's system probably went too far in trying to incorporate diversity in image generation.

DebtDeflation
35 replies
21h35m

Surely there is a middle ground.

"Generate a scene of a group of friends enjoying lunch in the park." -> Totally expect racial and gender diversity in the output.

"Generate a scene of 17th century kings of Scotland playing golf." -> The result should not be a bunch of black men and Asian women dressed up as Scottish kings, it should be a bunch of white guys.

photonthug
8 replies
19h38m

Surely there is a middle ground. "Generate a scene of a group of friends enjoying lunch in the park." -> Totally expect racial and gender diversity in the output.

Do we expect this because diverse groups are realistically most common or because we wish that they were? For example only some 10% of marriages are interracial, but commercials on TV would lead you to believe it’s 30% or higher. The goal for commercials of course is to appeal to a wide audience without alienating anyone, not to reflect real world stats.

What’s the goal for an image generator or a search engine? Depends who is using it and for what, so you can’t ever make everyone happy with one system unless you expose lots of control surface toggles. Those toggles could help users “own” output more, but generally companies wouldn’t want to expose them because it could shed light on proprietary backends, or just take away the magic from interacting with the electric oracles.

jjjjj55555
5 replies
19h5m

Also, these tools are used world-wide and "diversity" means different things in different places. Somehow it's always only the US ideal of diversity that gets shipped abroad.

felipeerias
3 replies
17h4m

US companies systematically push US cultural beliefs and expectations. People in the US probably don't notice it any more, but it's pretty obvious to those of us on the receiving end of US cultural domination.

This fact is an unavoidable consequence of the socioeconomic realities of the world, but it obviously clashes with these companies’ public statements and positions.

jjjjj55555
2 replies
16h51m

Yes, but it's especially cynical in this case because the belief they're pushing is that diversity matters, that biases need to be overcome, and that all people need to be represented.

Claiming all of that but then shoving your own biases down the rest of the world's throat while not representing their people in any way is especially cynical in my opinion. It undermines the whole thing.

photonthug
0 replies
13h45m

Hypocrisy is an especially destructive kind of betrayal, which is why the crooked cop or the pedo priest are so disappointing. Would be nice if companies would merely exploit us without all the extra insult of telling us they are changing the world / it's for our own good / it's what we asked for, etc.

philwelch
0 replies
11h6m

This is the exact same mindset that invented the word “Latinx”. Compress most of an entire hemisphere of ethnic and cultural diversity down to the vague concept of “Latino”, notice that Spanish is a gendered language so the word “Latino” is also gendered, completely forget that you already have the gender neutral English word “Latin”, invent a new word that virtually none of the people to whom it applies actually identifies with, and then shamelessly use it all the time.

photonthug
0 replies
18h41m

Yeah, someone else mentioned Tokyo which is not going to have as much variety among park visitors as NYC. But then again neither will Colorado (or almost anywhere else!) be as diverse. Some genius at corporate is probably scheming about making image generation as location-sensitive as search is, ostensibly to provide utility but really to perpetuate echo chambers and search bubbles. I wish computing in general would move back towards user-controlled rather than guess-what-I-mean and the resulting politicization, but it seems that ship has sailed.

paulddraper
0 replies
19h19m

And most women are friends with mostly women, and most men are friends with mostly men.

nurtbo
0 replies
3h26m

19% of new marriages in 2019 (and likely to rise): https://en.m.wikipedia.org/wiki/Interracial_marriage_in_the_...

Plus, it’s still a recent change: Loving v Virginia (legalized interracial marriage across US) was decided in 1967.

cabalamat
8 replies
20h54m

"Generate a scene of 17th century kings of Scotland playing golf." -> The result should not be a bunch of black men and Asian women dressed up as Scottish kings, it should be a bunch of white guys.

It works in bing, at least:

https://www.bing.com/images/create/a-picture-of-some-17th-ce...

Nevermark
7 replies
20h16m

I don't know that this sheds light on anything but I was curious...

a picture of some 21st century scottish kings playing golf (all white)

https://www.bing.com/images/create/a-picture-of-some-21st-ce...

a picture of some 22nd century scottish kings playing golf (all white)

https://www.bing.com/images/create/a-picture-of-some-22nd-ce...

a picture of some 23rd century scottish kings playing golf (all white)

https://www.bing.com/images/create/a-picture-of-some-23rd-ce...

a picture of some contemporary scottish people playing golf (all white men and women)

https://www.bing.com/images/create/a-picture-of-some-contemp...

https://www.bing.com/images/create/a-picture-of-some-contemp...

a picture of futuristic scottish people playing golf in the future (all white men and women, with the emergence of the first diversity in Scotland in millennia! Male and female post-human golfers. Hummmpph!)

https://www.bing.com/images/create/a-picture-of-futuristic-s...

https://www.bing.com/images/create/a-picture-of-futuristic-s...

Inductive learning is inherently a bias/perspective absorbing algorithm. But tuning in a default bias towards diversity for contemporary, futuristic and time agnostic settings seems like a sensible thing to do. People can explicitly override the sensible defaults as necessary, i.e. for nazi zombie android apocalypses, or the royalty of a future Earth run by Chinese overlords (Chung Kuo), etc.

jjjjj55555
2 replies
18h52m

Diversity is cool, but who gets to decide what's diverse?

noworriesnate
1 replies
15h57m

Good point. People of European descent have more diversity in hair color, hair texture and eye color than any other race. That’s because a lot of those traits are recessive and are only expressed in isolated gene pools (European peoples are an isolated gene pool in this sense).

ramblingrain
0 replies
11h22m

Isn't this like the exact opposite of the conclusions of the HapMap project?

mp05
1 replies
18h49m

I'm really disappointed that nth-century seems to have no effect at all. I'm expecting Kilts in Space.

callalex
0 replies
17h7m

It’s a perfect illustration of the way these models work. They are fundamentally incapable of original creation and imagination, they can only regurgitate what they have already been fed.

int_19h
0 replies
19h57m

People can explicitly override the sensible defaults as necessary

They cannot, actually. If you look at some of the examples in the Twitter thread and other threads linked from it, Gemini will mostly straight up refuse requests like e.g. "chinese male", and give you a lecture on why you're holding it wrong.

Rexxar
0 replies
6h9m

Two weeks ago I tried the following prompts, and I was very surprised by the "diversity" of my dark age soldiers:

https://www.bing.com/images/create/selfy-of-a-group-of-dark-...

https://www.bing.com/images/create/selfy-of-a-group-of-dark-...

ceejayoz
5 replies
21h27m

You can see how this gets challenging, though, right?

If you train your model to prioritize real photos (as they're often more accurate representations than artistic ones), you might wind up with Denzel Washington as the archetype; https://en.wikipedia.org/wiki/The_Tragedy_of_Macbeth_(2021_f....

There's a vast gap between human understanding and what LLMs "understand".

transitionnel
2 replies
21h19m

If they actually want it to work as intelligently as possible, they'll begin taking these complaints into consideration and building in a wisdom curating feature where people can contribute.

This much is obvious, but they seem to be satisfied with theory over practicality.

Anyway I'm just ranting b/c they haven't paid me.

How about an off the wall algorithm to estimate how much each scraped input turns out to influence the bigger picture, as a way to work towards satisfying the copyright question.

sho_hn
0 replies
21h15m

An LLM-style system designed to understand Wikipedia relevance and citation criteria and apply them might be a start.

Not that Wikipedia is perfect and controversy-free, but it's certainly a more sophisticated approach than the current system prompts.

photoGrant
0 replies
20h43m

Then who in this black box private company is the Oracle of infinite wisdom and truth!? Who are you putting in charge? Can I get a vote?

gedy
0 replies
19h51m

If you train your model to prioritize real photos

I thought that was the big bugbear about disinformation and false news, but now we have to censor reality to combat "bias"

Affric
0 replies
20h37m

I mean now you need to train AI to recognise the bias in the training data.

AlecSchueler
3 replies
21h26m

As soon as you have them playing an anachronistic sport you should expect other anachronistic imagery to creep in, to be fair.

ceejayoz
1 replies
21h24m

https://en.wikipedia.org/wiki/Golf

The modern game of golf originated in 15th century Scotland.

AlecSchueler
0 replies
20h29m

Oh fair enough then.

mp05
0 replies
21h16m

anachronistic sport

Scottish kings absolutely played golf.

trhway
0 replies
20h50m

"Generate a scene of 17th century kings of Scotland playing golf." -> The result should not be a bunch of black men and Asian women dressed up as Scottish kings, it should be a bunch of white guys.

Does a black man in the role of the Scottish king represent a bigger error than some other errors in such an image, like, say, incorrect dress details or the landscape having a wrong hill? I'd venture a guess that only our racially charged mentality of today considers that a big error, and maybe in a generation or two an incorrect landscape or dress detail would be considered a much larger error than a mismatched race.

swatcoder
0 replies
20h40m

There's no "middle" in the field of decompressing a short phrase into a visual scene (or program or book or whatever). There are countless private, implicit assumptions that users take for granted yet expect to see in the output, and vendors currently fear that their brand will be on the hook for the AI making a bad bet about those assumptions.

So for your first example, you totally expect racial and gender diversity in the output because you're assuming a realistic, contemporary, cosmopolitan, bourgeois setting -- either because you live in one or because you anticipate that the provider will default to one. The food will probably look Western, the friends will probably be young adults who look to have professional or service jobs wearing generic contemporary commercial fashion, the flora in the park will be broadly northern climate, etc.

Most people around the world don't live in an environment anything like that, so nominal accuracy can't be what you're looking for. What you want, but don't say, is a scene that feels familiar to you and matches what you see as the de facto cultural ideal of contemporary Western society.

And conveniently, because a lot of the training data is already biased towards that society and the AI vendors know that the people who live in that society will be their most loyal customers and most dangerous critics right now, it's natural for them to put a thumb on the scale (through training, hidden prompts, etc) that gets the model to assume an innocuous Western-media-palatable middle ground -- so it delivers the racially and gender diverse middle class picnic in a generic US city park.

But then in your second example, you're implicitly asking for something historically accurate without actually saying that accuracy is what's become important for you in this new prompt. So the same thumb that biased your first prompt towards a globally-rare-but-customer-palatable contemporary, cosmopolitan, Western culture suddenly makes your new prompt produce something surreal and absurd.

There's no "middle" there because the problem is really in the unstated assumptions that we all carry into how we use these tools. It's more effective for them to make the default output Western-media-palatable and historical or cultural accuracy the exception that needs more explicit prompting.

If they're lucky, they may keep grinding on new training techniques and prompts that get more assumptions "right" by the people that matter to their success while still being inoffensive, but it's no simple "surely a middle ground" problem.

sho_hn
0 replies
21h11m

Quite reminded of this episode: https://www.eurogamer.net/kingdom-come-deliverance-review (black representation in a video game about 15th century Bohemia; it was quite the controversy)

samatman
0 replies
18h38m

Why would you expect anything you didn't specify in the output of the first prompt? If there are friends, lunch, and a park: it did what you asked.

Piling a bunch of neurotic expectations about it being a Benetton ad on top of that is absurd. When you can trivially add as much content to the description as you want, and get what you ask for, it does not matter what the default happens to be.

mp05
0 replies
21h13m

"Generate a scene of a group of friends enjoying lunch in the park." -> Totally expect racial and gender diversity in the output.

I'd err on the side of "not unexpected". A group of friends in a park in Tokyo is probably not very diverse, but it's not outside of the realm of possibility. Only white men were golfing Scottish kings if we're talking strictly about reality and reflecting it properly.

crooked-v
0 replies
20h7m

It feels like it wouldn't even be that hard to incorporate into LLM instructions (aside from using up tokens), by way of a flowchart like "if no specific historical or societal context is given for the instructions, assume idealized situation X; otherwise, use historical or projected demographic data to do Y, and include a brief explanatory note of demographics if the result would be unexpected for the user". (That last part for situations with genuine but unexpected diversity; for example, historical cowboys tending much more towards non-white people than pop culture would have one believe.)

Of course, now that I've said "it seems obvious" I'm wondering what unexpected technical hurdles there are here that I haven't thought of.
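
To make the flowchart concrete, here's a minimal sketch of that routing step as a prompt pre-processor. A crude keyword heuristic stands in for whatever real context classifier and demographic data a production system would use; all names and hint lists here are hypothetical:

    # Hypothetical pre-processing layer in front of an image model.
    HISTORICAL_HINTS = ("17th century", "medieval", "1945", "ancient", "victorian")

    def classify_context(prompt: str) -> str:
        # Stand-in for a real classifier: keyword sniffing for a stated period.
        lowered = prompt.lower()
        return "historical" if any(h in lowered for h in HISTORICAL_HINTS) else "unspecified"

    def build_image_prompt(user_prompt: str) -> str:
        if classify_context(user_prompt) == "unspecified":
            # No historical or societal context given: default to an idealized,
            # diverse contemporary scene.
            return f"{user_prompt}, diverse group of people, contemporary setting"
        # Context given: defer to it, and ask for a note in case the accurate
        # demographics would surprise the user.
        return (f"{user_prompt}, demographics consistent with the stated period and place; "
                "add a brief caption if the demographics may be unexpected")

    print(build_image_prompt("a group of friends enjoying lunch in the park"))
    print(build_image_prompt("17th century kings of Scotland playing golf"))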

ankit219
0 replies
21h12m

The focus for alignment is to avoid bad PR, specifically the kind of headlines written by major media houses like NYT, WSJ, WaPo. You could imagine headlines like "Google's AI produced a non-diverse output on occasions" when a researcher/journalist is trying too hard to get the model to produce that. The hit on Google is far bigger than, say, on Midjourney or even OpenAI so far (I suspect future models will be more nerfed than they are now).

For the cases you mentioned, initially those were the examples. It gets tricky during red teaming where they internally try out extreme prompts and then align the model for any kind of prompt which has a suspect output. You train the model first, then figure out the issues, and align the model using "correct" examples to fix those issues. They either went to extreme levels doing that or did not test it on initial correct prompts post alignment.

photoGrant
16 replies
22h4m

Remind yourself we're discussing censorship, misinformation, and the inability to define or source truth, and we're concerned on Day 1 about the results of image gen being controlled by a single for-profit entity with incentives that focus solely on business and not humanity...

Where do we go from here? Things will magically get better on their own? Businesses will align with humanity and morals, not their investors?

This is the tip of the iceberg of concerns and it's ignored as a bug in the code not a problem with trusting private companies with defining truth.

chasd00
8 replies
21h48m

Where do we go from here?

Open-source models and training sets. So basically the "secret sauce" minus the hardware. I don't see it happening voluntarily.

photoGrant
4 replies
21h47m

Absolutely it won't. We've armed the issue with a supersonic jet engine and we're assuming if we build a slingshot out of pop sticks we'll somehow catch up and knock it off course.

chasd00
3 replies
21h32m

I can't predict the future but there is precedent. Models, weights, and datasets are the keys to the kingdom like operating system kernels, databases, and libraries used to be. At some point, enough people decided to re-invent and release these things or functionality to all that it became a self-sustaining community and, eventually, transformative to daily life. On the other hand, there may be enough people in power who came from and understand that community to make sure it never happens again.

photoGrant
2 replies
21h28m

No, compute is the key to the kingdom. The rest are assets and ammunition. You out-compute your enemy, you out-compute your competition. That's the race. The data is part of the problem, not the root.

These companies are siloing the world's resources. GPU, Finance, Information. Those combined are the weapon. You make your competition starve in the dust.

These companies are pure evil pushing an agenda of pure evil. OpenAI is closed. Google is Google. We're like, ok, there you go! Take it all. No accountability, no transparency, we trust you.

CuriouslyC
1 replies
21h16m

Compute lets you fuck up a lot when trying to build a model, but you need data to do anything worth fucking up in the first place, and if you have 20% of the compute but you fuck up 1/5th as much you're doing fine.

Meta/OpenAI/Google can fuck up a lot because of all their compute, but ultimately we learn from that as the scientists doing the research at those companies would instantly bail if they couldn't publish papers on their techniques to show how clever they are.

photoGrant
0 replies
21h13m

I never said each of these exist in a vacuum. It is the collation of all that is the danger. This isn't democratic. This is companies now toying with governmental ideologies.

didntcheck
2 replies
21h2m

I see it as not unlikely that there'll be a campaign to stigmatize, if not outright ban, open source models on the grounds of "safety". I'm quite surprised at how relatively unimpeded the distribution of image generation models has been, so far.

int_19h
0 replies
19h52m

This is already happening, actually, although the focus so far has been on the possibility of their use for CSAM:

https://www.theguardian.com/society/2023/sep/12/paedophiles-...

AlanYx
0 replies
8h25m

What you're predicting has already started. Two weeks ago Geoffrey Hinton gave a speech advocating banning open source AI models (see e.g.: https://thelogic.co/news/ai-cant-be-slowed-down-hinton-says-... ).

I'm surprised there wasn't an HN thread about it at the time.

CuriouslyC
6 replies
21h20m

The ridiculous degree of PC alignment of corporate models is the thing that's going to let open source win. Few people use bing/dall-e, but if OpenAI had made dall-e more available and hadn't put ridiculous guardrails on it, stable diffusion would be a footnote at this point. Instead, dall-e is a joke and people who make art use stable diffusion, with casuals who just want some pretty looking pictures using midjourney.

photoGrant
3 replies
21h15m

No, ignoring laws and stealing data to increase your Castle's MOAT is the win. Compute isn't an open source solvable problem. I can't DirtyPCB's an A100

Making the argument open source is the answer is an agenda of making your competition spin wheels.

CuriouslyC
2 replies
21h13m

You're on a thread about how people are lambasting big money AI for being garbage and producing inferior results to OSS tools you can run on consumer GPUs. Tell me again how unbeatable Google/other big tech players are.

photoGrant
1 replies
21h10m

I've been part of the advertising and marketing world for a lot of these companies for a decade plus, I've helped them sell bullshit. I've also been at the start of the AI journey, I've downloaded and checked local models of all promises and variances.

To say they're better than the compute that OpenAI or Google are throwing at the problem is just plain wrong.

I left the ad industry the moment I realised my skills and talents are better used informing people than lying to them.

This thread is not at all comparing the ethical issues of AI with local anything. You're conflating your solution with another problem.

CuriouslyC
0 replies
21h7m

Is a $5 can opener better than a $2000 telescope at opening cans? Yes. Is stable diffusion better at producing finished art, by virtue of not being closed off and DEI'd to oblivion so that it can actually be incorporated into workflows? Emphatically yes.

It doesn't matter how fancy your engineering is and how much money you have if you're too stupid to build the right product.

As for this being written nonsense, that's the sort of thing someone who couldn't find an easy way to win an argument and was bitter about the fact would say.

josephg
1 replies
21h7m

Don’t count out Adobe Firefly. I wouldn’t be surprised if it’s used more than all the other image gen models combined.

CuriouslyC
0 replies
21h2m

That might be true, but if you're using firefly as a juiced up content-aware fill in photoshop I'm not sure it's apples to apples.

redox99
10 replies
21h53m

I remember checking like a year ago and they still had the word "gorilla" blacklisted (i.e. it never returns anything even if you have gorilla images).

_heimdall
9 replies
21h37m

Gotta love such a high quality fix. When your high-tech, state-of-the-art algorithm learns racist patterns, just blocklist the word and move on. Don't worry about why it learned such patterns in the first place.

nitwit005
8 replies
20h39m

Humans do look like gorillas. We're related. It's natural that an imperfect program that deals with images will mistake the two.

Humans, unfortunately, are offended if you imply they look like gorillas.

What's a good fix? Human sensitivity is arbitrary, so the fix is going to tend to be arbitrary too.

_heimdall
3 replies
20h32m

A good fix would, in my opinion, involve understanding how the algorithm is actually categorizing and why it misrecognized gorillas and humans.

If the algorithm doesn't work well they have problems to solve.

DeusExMachina
1 replies
9h16m

But this is not an algorithm. It's a trained neural network which is practically a black box. The best they can do is train it on different data sets, but that's impractical.

_heimdall
0 replies
4h51m

That's exactly the problem I was trying to reference. The algorithms and data models are black boxes - we don't know what they learned or why they learned it. That setup can't be intentionally fixed, and more importantly we wouldn't know if it was fixed because we can only validate input/output pairs.

CydeWeys
0 replies
17h19m

It's too costly to potentially make that mistake again. So the solution guarantees it will never happen again.

Spivak
3 replies
20h28m

You do understand that this has nothing to do with humans in general, right? This isn't AI recognizing some evolutionary pattern and drawing comparisons between humans and primates -- it's racist content that specifically targets black people that is present in the training data.

nitwit005
0 replies
20h25m

Nope. This is due to a past controversy about image search: https://www.nytimes.com/2023/05/22/technology/ai-photo-label...

callalex
0 replies
17h3m

Where can I learn about this?

_heimdall
0 replies
20h9m

I don't know nearly enough about the inner workings of their algorithm to make that assumption.

The internet is surely full of racist photos that could teach the algorithm. The algorithm could also have bugs that miscategorize the data.

The real problem is that those building and managing the algorithm don't fully know how it works or, more importantly, what it has learned. If they did, the algorithm would be fixed without a term blocklist.

michaelt
3 replies
22h7m

As well as that, I suspect the major AI companies are fearful of generating images of real people - presumably not wanting to be involved with people generating fake images of "Donald Trump rescuing wildfire victims" or "Donald Trump fighting cops".

Their efforts to add diversity would have been a lot more subtle if, when you asked for images of "British Politician" the images were recognisably Rishi Sunak, Liz Truss, Kwasi Kwarteng, Boris Johnson, Theresa May, and Tony Blair.

That would provide diversity while also being firmly grounded in reality.

The current attempts at being diverse while simultaneously trying not to resemble any real person seem to produce some wild results.

sho_hn
1 replies
21h26m

My takeaway from all of this is that alignment tech is currently quite primitive and relies on very heavy-handed band-aids.

slowmovintarget
0 replies
9m

I think that's a bit overly charitable.

Would it not be reasonable to also draw the conclusion that the notion of alignment itself is flawed?

_heimdall
0 replies
21h10m

We're honestly just seeing generative algorithms failing at diversity initiatives as badly as humans do.

Forcing diversity into a system is an extremely tough, if not impossible, challenge. Initiatives have to be driven by goals and metrics, meaning we have to boil diversity down to a specific list of quantitative metrics. Things will always be missed when our best tool for tackling a moral or noble goal is to boil a complex spectrum of qualitative data down to a subset of measurable numbers.

raxxorraxor
1 replies
8h40m

They have now added a strong bias toward generating black people. Some have prompted it to generate a picture of a German WW2 soldier, and now there are many pictures of black people in Nazi uniforms floating around.

I think their strategy to "enhance" outcomes is very misdirected.

The most widely used base models for fine-tuning are those that are not censored, and I think you have to construct a problem to find one here. Of course AI won't generate a perfect world, but this is something that will probably only get better with time when users are able to adapt models to their liking.

slowmovintarget
0 replies
13m

...when users are able to adapt models to their liking.

Therein lies the rub, as it were, because the large providers of AI models are working hard to ensure legislation that wouldn't allow people access to uncensored models in the name of "safety." And "safety" in this case includes the notion that models may not push the "correct" world-view enough.

int_19h
0 replies
20h12m

Judging by the way it words some of the responses to those queries, they "fixed" it by forcibly injecting something like "diverse image showcasing a variety of ethnicities and genders" in all prompts that are classified as "people".

ianbicking
41 replies
21h0m

I've never been involved with implementing large-scale moderation or content controls, but it seems pretty standard that underlying automated rules aren't generally public, and I've always assumed this is because there's a kind of necessary "security through obscurity" aspect to them. E.g., publish a word blocklist and people can easily find how to express problematic things using words that aren't on the list. Things like shadowbans exist for the same purpose; if you make it clear where the limits are then people will quickly get around them.
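
As a toy illustration of that point (the terms and the filter are invented here, not any real platform's list): once a blocklist is public, the most naive substitution already defeats it.

    # Toy blocklist check: exact-word matching, the kind of filter a published list implies.
    BLOCKLIST = {"badword", "otherbadword"}  # stand-in terms, not a real list

    def is_allowed(text: str) -> bool:
        return not any(word in BLOCKLIST for word in text.lower().split())

    print(is_allowed("please draw badword"))   # False: blocked
    print(is_allowed("please draw b4dword"))   # True: same intent, trivially evades the list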

I know this is frustrating, we just literally don't seem to have better approaches at this time. But if someone can point to open approaches that work at scale, that would be a great start...

serial_dev
12 replies
20h27m

There is no need to implement large scale censorship and moderation in this case. Where is the security concern? That I can generate images of white people in various situations for my five minutes of entertainment?

The whole premise of your argument doesn't make sense. I'm talking to a computer, nobody gets hurt.

It's like censoring what I write in my notes app vs. what I write on someone's Facebook wall. In one case, I expect no moderation, whereas in the other case, I get that there needs to be some checks.

Ecoste
2 replies
19h54m

But what if little timmy asks it how to make a bomb? What if it's racist?

Then what?

withinboredom
0 replies
18h41m

None of this should be a mystery. Making a bomb is literally something you can figure out with very little research (my friends and I used to blow up cow pastures for fun!).

Racism is a totally different and sadder issue. I don’t have a good answer for that one, but knowledge shouldn’t be withheld because someone thinks it is “dangerous”

pizzafeelsright
0 replies
14h42m

It's fine. We have laws against blowing people up.

Racism is fine as well. I won't date out of my race and if you think there should be a law that I must that's not really freedom. As for hiring or not based upon race there's already a law against that.

The cure is often worse than navigating uneasy waters. Every time you pass a law you give a gun to a bureaucrat.

mhuffman
1 replies
20h20m

When these companies say there are "security concerns" they mean for them, not you! And they mean the security of their profits. So anything that can cause them legal liability or cause them brand degradation is a "security concern".

wruza
0 replies
19h51m

It's definitely this at this stage. But by not having any discourse we'll end up normalizing it even before establishing consensus on what's appropriate to expect from human/AI interaction and how much of a problem is the actual model, as opposed to the user. Not being able to generate innocent content is ridiculous. Probably they're overshooting and learning how to draw the stricter lines right now, but if you don't argue, you'll let this frog boil into a Google "Search" again.

g42gregory
1 replies
20h3m

What if you are engaged in a wrongthink? How would you suggest this to be controlled instead?

u32480932048
0 replies
19h0m

Straight to Guantanamo.

joquarky
0 replies
20h9m

Our perception of the world has become so abstract that most people can't discern metaphors from the material world anymore

The map is not the territory

Sociologists, anthropologists, philosophers and the like might find a lot of answers by looking into the details of what is included in genAI alignment and tracing back the history of why we need each particular alignment

gopher_space
0 replies
18h44m

Where is the security concern? That I can generate images of white people in various situations for my five minutes of entertainment?

I'd love an example of "guardrails" in action on a topic of relevance to actual adults. There's a connection I can't find between the ability to make racist memes and literally anything else I want to do with AI.

electrondood
0 replies
18h40m

The concern is that companies have long known that it's bad for business if your product is a toxic cesspit or can be used to generate horrible headlines that reflect poorly on your brand.

It's not "woke," and it's not censorship. It's literally the free market.

anomaly_
0 replies
16h59m

When someone unironically uses the term "problematic" you can reasonably assume the position they are going to argue from...

AnarchismIsCool
0 replies
20h6m

It's part of the marketing. By saying their models are powerful enough to be gasp Dangerous, they are trying to get people to believe they're insanely capable.

In reality, every model so far has been either a toy, a way of injecting tons of bugs into your code (or circumventing GPL by writing bugs), or a way of justifying laying off the writing staff you already wanted to shit can.

They have a ton of potential and we'll get there soon, but this isn't it.

advael
12 replies
19h14m

This is simply a bad approach and a bad argument. Security through obscurity is a term whose only usage in security circles is derogatory. People figure out how to get around these auto-censors just fine, and not publishing them creates more problems for legitimate users and more plausible deniability for bad policy hidden in them. Doing the same thing but with public policy would already be better, albeit still bad.

The only real solution to the problem of there being an enormous public square controlled by private corporations is to end this situation

lamontcg
4 replies
18h32m

I mean people can get through your front door using any number of attacks that either exploit weaknesses in the locking mechanism, or circumvent your locks via a carefully placed brick through a window or something like that. Even if you have alarms and other measures, that doesn't stop someone from doing a quick smash and grab, etc. All of these security flaws don't mean that you should leave your door open and unlocked all the time. Imperfect security isn't useless security.

And the purpose of things like bad-word filters is to make a best effort at blocking stuff which violates the platform TOC and makes plausible deniability much less likely when someone is deliberately circumventing the filters. The existence of false positives and false negatives is considered acceptable in an imperfect world. The filters themselves also only block the action to change a username or whatever and don't punish the user or deny use of the platform entirely (they're much less punitive than the AI abuse algorithms that auto-ban people off of Google/GitHub/etc).

advael
3 replies
18h19m

Your analogy is also bad. I agree that perfect security is impossible and that is completely irrelevant here. What platforms like this do by not publishing their policies is more akin to insisting you use their special locks on your door that they claim protect you better because no one knows how they work. Maybe they're operated by an AI working with a Ring camera or something? Very fancy stuff. With this kind of tech, you may be locked out of your home for reasons you don't understand. An independent locksmith might have a hard time figuring out what's going wrong with the door if it fails. You have no idea if some burglars are authorized to enter your home trivially by a deal with the company. If the company decides you are in the wrong in any context, they have the unilateral power to deny you access with no clear recourse. They may get you arrested for trying to get into your own home

lamontcg
2 replies
12h56m

They may get you arrested for trying to get into your own home

The AI algorithms that ban people from platforms like Google and GitHub do that, which I explicitly called out as needing more oversight.

That is different from algorithms which just prevent you from doing something on a platform like using n-bombs in your username, or the LLM guardrails that just give you mangled answers or tell you that they can't do that. That isn't analogous to getting arrested.

And in these cases the analogy really falls apart because it isn't your home, and it isn't critical for your life.

advael
1 replies
12h13m

Friend, it was your analogy

lamontcg
0 replies
25m

I was intending to make a point about security through obscurity, not to make an analogy with those systems.

kansface
2 replies
18h59m

Is a content moderation policy the same thing as "security"? Do we get to apply the best practices of the one to the other because they overlap to a smaller or larger degree?

cryptonector
0 replies
18h55m

Yes, since content moderation is a form of authorization policy.

advael
0 replies
18h53m

The use of "security through obscurity" invited the comparison. It is a better comparison when using automated tools instead of human decision-makers. That said, even if we are talking about policy rather than security, policies that are unknown by and hidden from the people they bind is probably the most recognizable and one of the more onerous features of despotism. We have the term "Kafkaesque" because a whole famous writer literally spent his career pointing out how they don't work and harm the people they affect

chillfox
1 replies
18h15m

And yet, every single security system on the net relies on an element of obscurity to work. Passwords are secret (obscure), as are private keys for SSL/TLS.

advael
0 replies
17h59m

This misunderstands what is meant by the concept. The mechanisms and the public policies that dictate how they are used are not obscured. To borrow and improve another commenter's analogy about locks and keys, the lock on your door is more secure because every locksmith in the world knows how it works, which doesn't mean they have your key

batch12
1 replies
17h57m

Security through obscurity can have a place as part of a larger defense in depth strategy. Alone it's a joke.

Source: in security circles

advael
0 replies
17h20m

Even granting that there may be some nuance to whether and to what degree secrecy is valuable in some contexts, the policies used for automated content moderation on large platforms and the policies by which AI systems are aligned are not good candidates for this secrecy having even a beneficial effect, let alone being necessary

nonrandomstring
3 replies
20h7m

publish a word blocklist and people can easily find how to express problematic things using words that aren't on the list.

I'd love to explore that further. It's not the words that are "problematic" but the ideas, however expressed?

Seems like a "problematic" idea, no ?

UberFly
2 replies
18h42m

A word blocklist just serves to apply guardrails. It just slows down common abuse. Very far from perfect, but the alternatives are anything goes or total lockdown. Perfect solutions are pretty damn rare.

nonrandomstring
1 replies
17h48m

You mention "solutions"

Perhaps you mistake me for someone who cares about suggesting solutions for the problems suffered by giant technopolies, as if they were my problems.

lp0_on_fire
0 replies
13h27m

This is basically my view. These tech companies have produced some pretty amazing stuff. Solved problems and built things at scale that were unthinkable a decade ago.

They can solve this problem. They choose not to because a) they're already shielded from legal liability for certain things that happen/are said on their platforms and b) it doesn't make them any money.

cryptonector
2 replies
18h56m

Repeat after me: security by obscurity is weak.

Clearly people can work out what some of the rules are, so why not just publish them. If you need to alter them when people figure out how to get around them, well, you already had to anyways.

UncleMeat
1 replies
4h16m

Kerckhoffs' principle only applies to crypto. "Security by obscurity" is used in oodles of systems security applications and broader contexts involving human behavior.

Very very ordinary security best practices rely on obscurity. ASLR is a good example. It is defeated by a data exfiltration vulnerability but remains a useful thing to add to your binaries. Because outside of the crypto space, security is an onion and layers add additional cost to attackers.

cryptonector
0 replies
1h0m

ASLR is not security by obscurity. The addresses into which all the various things are mapped are secret in the same sort of way as cryptographic secret and private keys are secret, but the mechanism is not secret.

verticalscaler
1 replies
19h49m

My dear fellow, some believe the ends justify the means and play games. Read history, have some decency.

The danger of being captured by such people far outweighs any other "problematic things".

First and foremost any system must defend against that. You love guardrails so much - put them on the self-anointed guardrailers.

Otherwise, if You Want a Picture of the Future, Imagine a Boot Stamping on a Human Face – for Ever.

cryptonector
0 replies
18h52m

And to add insult to injury the jackbooted thug is an AI bot.

observationist
1 replies
18h33m

If you can't afford to pay a sufficient number of people to moderate a group, you need to reduce the size of the group or increase the number of moderators.

Your speculation implies no responsibility for taking on more than can be handled responsibly, and externalizes the consequences to society at large.

There are responsible ways to have very clear, bright, easily understood, well communicated rules and sufficient staff to manage a community. I don't know why it's simply accepted that giant social networks get to play these games when it's calculated, cold economics driving the bad decisions.

They make enough money to afford responsible moderation. They just don't have to spend that money, and they beg off responsibility for user misbehavior and automated abuses, wring their hands, and claim "we do the best we can!"

If they honestly can't use their billions of adtech revenue to responsibly moderate communities, then maybe they shouldn't exist.

Maybe we need to legislate something to the effect of "get as big as you want, as long as you can do it responsibly, and here are the guidelines for responsible community management..."

Absent such legislation, there's no possible change until AI is able to reasonably do the moderation work of a human. Which may be sooner than any efforts at legislation, at this rate.

fallingknife
0 replies
17h23m

What guideline for community management would possibly not be a flagrant 1A violation?

wruza
0 replies
19h58m

Yes, but the implied problems may not need to be approached at all. It's a uniform ideology push, with which different people agree to different degrees. If companies don't want to reveal the full set of measures, they could at least summarize them. I believe even those summaries would contain the things the tweet says they are "ashamed" of.

We cannot discuss or be aware of the problems-and-approaches unless they are explicitly stated. Your analogy with content moderation is a little off, because it's not a set of measures that is hidden, but the "forum rules" themselves. One thing is an AI refusing with an explanation. That makes it partially useless, but it's their right to do so. Another thing is if it silently avoids or steers topics due to these restrictions. Pretty sure the authors are unable to clearly separate the two cases and also maintain the same quality as the raw model.

At the end of the day people will eventually give up and use Chinese AI instead, cause who cares if it refuses to draw CCP people while doing everything else better.

u32480932048
0 replies
19h1m

Most legal systems operate at the nation-state scale and aren't made of hidden mystery laws. There are lots of reasons for that.

We've already had this argument with cryptocurrency, where we've basically decided that the existing legal system (although external) provides a sufficient toolset to go after bad actors.

Finally, based on the illiberal nature of most AI Safety Sycophants' internet writings, I don't like who they are as people and I don't trust them to implement this.

opportune
0 replies
17h50m

I think this is a fair approach when things work well enough that a typical user doesn’t need to worry about whether they’ll trigger some kind of special content/moderation logic. If you shadowban spammers and real users almost never get flagged as spammers, the benefits of being tight-lipped outweigh those of the very few users who get improperly flagged or are just curious.

With some of these models the guardrails are so clumsy and forced that I think almost any typical user will notice them. Because they include outright work-refusal it’s a very frustrating UX to have to “discover” the policy for yourself through trial and error.

And because they’re more about brand management than preventing fraud/bad UX for other users, the failure modes are “someone deliberately engineered a way to get objectionable content generated in spite of our policies.” Obviously some kinds of content are objectionable enough for this to be worth it still, but those are mostly in the porn area - if somebody figures out a way to generate an image that’s just not PC, despite all the safety features, shouldn’t that be on them rather than the provider?

Even tuning the model for political correctness is not the end of the world in my opinion; a lot of LLMs do a perfectly reasonable job for my regular use cases. With image generators they are going so far as to obviously (there's no other way that makes sense) insert diversity sub-prompts for some fraction of images, which is simply confusing and amateur. Everybody who uses these products just a little bit will notice it. It's also so cautious that even mild stuff (I tried to do the "now make it even more X" with "American" and it stopped at one iteration) gets caught in the filters. You're going to find out the policies anyway because they're so broad and likely to be encountered while using the product innocently - anything a real non-malicious user is likely to get blocked by should be documented.

EricE
0 replies
20h25m

Yeah, there's absolutely no need for transparency /s

https://youtu.be/THZM4D1Lndg?si=0QQuLlH7JebSa6w3&t=485

If it doesn't start 8 minutes in, go to the 8 minute mark. Then again I can see why some wouldn't want transparency.

stainablesteel
38 replies
22h17m

Gemini seems to have problems generating white people, and honestly this just opens the door for things that are even more racist [1]. The harder you try, the more you'll fail; just get over the DEI nonsense already.

1. https://twitter.com/wagieeacc/status/1760371304425762940

Jason_Protell
32 replies
22h11m

Is there any evidence that this is a consequence of DEI rather than a deeper technical issue?

Jensson
20 replies
22h3m

You get 4 images per prompt and are lucky to get one white person when you ask for it; no other model has that issue. Other models have no problems generating black people either, so it isn't that other models only generate white people.

So either it isn't a technical issue or Google failed to solve a problem everyone else easily solved. The chances of this having nothing to do with DEI is basically 0.

ceejayoz
19 replies
21h45m

Depending on how broadly you define it, something like 10-30% of the world's population is white. Africa is about 20% of the world population; Asia is 60% of it.

One in four sounds about right?

cm2012
17 replies
21h42m

It does the same if you ask for pictures of past popes, 1945 German soldiers, etc.

ceejayoz
16 replies
21h40m

It'll also add extra fingers to human hands. Presumably that's not because of DEI guardrails about polydactyly, right?

The current state of the art in AI gets things wrong regularly.

cm2012
9 replies
21h37m

Sure, but this one is from Google adding a tag to make every image of people diverse, not AI randomness.

ceejayoz
8 replies
21h35m

Am I missing something in the link demonstrating that, or is it conjecture?

perlclutcher
3 replies
20h55m

https://twitter.com/altryne/status/1760358916624719938

Here's some corporate-lawyer-speak straight from Google:

We are aware that Gemini is offering inaccuracies...

As part of our AI principles, we design our image generation capabilities to reflect our global user base, and we take representation and bias seriously.

ceejayoz
2 replies
20h47m

That doesn't back up the assertion; it's easily read as "we make sure our training sets reflect the 85% of the world that doesn't live in Europe and North America". Again, 1/4 white people is statistically what you'd expect.

withinboredom
0 replies
18h19m

Fuck, this is going to sound fucked up... but even if a randomly picked person from the globe has a 1/4 chance of being white, people generally tend to clump together. For example, you generally find a shitload of Asian people in Asia, white people in Europe, African people in Africa, and Indian people in India.

Probably the only places where you wouldn't expect this are heavily colonized ones like South Africa, Australia, and the Americas.

u32480932048
0 replies
18h51m

Sure, but I see three 200 responses and a 400 - not 1/4 white people as statistically expected.

xdennis
0 replies
18h38m

OpenAI has no problem showing accurate pictures. You know it's Google-induced bias, but feign ignorance.

If you ask for a picture of nazi soldiers it shouldn't have 60% Asian people like you say. You know you're wrong but instead of admitting it, you're moving the goalpost to "hands".

This entire thread is you being insincere.

strangeattractr
0 replies
18h15m

Have you bothered to look at all? Read the output of the model when asked about why it has the behaviour it does. Look at the plethora of images it generates that are not just historically inaccurate but absurdly so. It tells you "heres a diverse X" when you ask for X. Yet asking for pictures of Koreans generates only Asian people but prompts for Scots or French people in historical periods generate mostly non-white people. You're being purposefully obtuse, Google has had racism complaints about previous models, talks often about AI safety and avoiding 'bias'. You're trying to argue that it's more likely that the training data had an inherent bias against generating white people in images purely by chance?

int_19h
0 replies
19h47m

If you look closely at the response text that accompanies many of these images, you'll find recurring wording like "Here's a diverse image of ... showcasing a variety of ethnicities and genders". The fact that it uses the same wording strongly implies that this is coming out of the prompt used for generation. My bet is that they have a simple classifier for prompts trained to detect whether it requests depiction of a human, and appends "diverse image showcasing a variety of ethnicities and genders" to the prompt the user provided if so. This would totally explain all the images seen so far, as well as the fact that other models don't have this kind of bias.
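
A minimal sketch of the kind of wrapper being conjectured here; the keyword check, the trigger condition, and the appended wording are guesses for illustration, not anything confirmed about Gemini:

    # Speculative prompt-rewriting wrapper: detect "people" prompts, append diversity text.
    PEOPLE_HINTS = ("person", "people", "man", "woman", "king", "soldier", "pope")

    def depicts_people(prompt: str) -> bool:
        # A real system would presumably use a trained classifier, not keywords.
        lowered = prompt.lower()
        return any(hint in lowered for hint in PEOPLE_HINTS)

    def rewrite_prompt(user_prompt: str) -> str:
        if depicts_people(user_prompt):
            return user_prompt + ", diverse image showcasing a variety of ethnicities and genders"
        return user_prompt

    print(rewrite_prompt("a 17th century king of Scotland playing golf"))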

cm2012
0 replies
20h59m

It's been demonstrated on Twitter a few times, can't find a link handy

lmm
4 replies
20h13m

This specific thing is a much more blatant class of error, and one that has been known to occur in several previous models because of DEI systems (e.g. in cases where prompts have been leaked), and has never been known to occur for any other reason. Yes, it's conceivable that Google's newer, better-than-ever-before AI system somehow has a fundamental technical problem that coincidentally just happens to cause the same kind of bad output as previous hamfisted DEI systems, but come on, you don't really believe that. (Or if you do, how much do you want to bet? I would absolutely stake a significant proportion of my net worth - say, $20k - on this)

ceejayoz
3 replies
19h54m

has never been known to occur for any other reason

Of course it has. Again, these things regularly give humans extra fingers and arms. They don't even know what humans fundamentally look like.

On the flip side, humans are shitty at recognizing bias. This comment thread stems from someone complaining the AI only rarely generated white people, but that's statistically accurate. It feels biased to someone in a majority-white nation with majority-white friends and coworkers, but it fundamentally isn't.

I don't doubt that there are some attempts to get LLMs to go outside the "white westerner" bubble in training sets and prompts. I suspect the extent of it is also deeply exaggerated by those who like to throw around woke-this and woke-that as derogatories.

samatman
0 replies
18h22m

A very impressive display of crimestop you've got going in this thread. How did you end up like this?

pb7
0 replies
17h25m

Congratulations, here is your gold medal in mental gymnastics. Enough now.

It literally refuses to generate images of white people when prompted directly while not only happily obliging but only producing that specific race in all 4 results for all others. It’s discriminatory and based on your inability to see that, you may be too.

lmm
0 replies
19h27m

Of course it has. Again, these things regularly give humans extra fingers and arms. They don't even know what humans fundamentally look like.

This comment thread stems from someone complaining the AI only rarely generated white people, but that's statistically accurate. It feels biased to someone in a majority-white nation with majority-white friends and coworkers, but it fundamentally isn't.

So the AI is simultaneously too dumb to figure out what humans look like, but also so super smart that it uses precisely accurate racial proportions when generating people (not because it's been specifically adjusted to, but naturally)? Bullshit.

I don't doubt that there are some attempts to get LLMs to go outside the "white westerner" bubble in training sets and prompts. I suspect the extent of it is also deeply exaggerated by those who like to throw around woke-this and woke-that as derogatories.

You're dodging the question. Do you actually believe the reason that the last example in the article looks very much not like a man is a deep technical issue, or a DEI initiative? If the former, how much are you willing to bet? If the latter, why are you throwing out these insincere arguments?

thepasswordis
0 replies
20h25m

The AI will literally scold you for asking it to make white characters, and insists that you need to be inclusive and that it is being intentionally dishonest to force the issue.

wruza
0 replies
18h57m

If it does, shouldn't there be 60% asians?

flumpcakes
2 replies
20h23m

I don't understand how people could even argue that this is in any way acceptable. Fighting "bias" has become some boogyman and anything "non-white" is now beyond reproach. Shocking.

gedy
0 replies
19h44m

Seriously, I've basically written off using Gemini for good after this HR style nonsense. It's a shame that Google, who invented much of this tech, is so crippled by their own people's politics.

AnarchismIsCool
0 replies
19h0m

Fighting bias is a good thing, you'd have to be pretty...er...biased to believe otherwise. Bias is fundamentally a distortion or deviation from objective reality.

This, on the other hand, is just fucking stupid political showboating that's hurting their SV white knight cause. It's just differently flavored bias

gs17
0 replies
19h41m

"I can't generate white British royalty because they exist, but I can make up black ones" is pretty close to an actually valid reason.

nickthegreek
1 replies
21h53m

I don't think so. My boss wanted me to generate a birthday image for a co-worker of John Cena fly fishing. ChatGPT refused to do so. So I had to move to describing the type of person John Cena is instead of using his name. It kept giving me bearded people no matter what. I thought this would be the perfect time to try out Gemini for the first time. Well shit, it won't even give me a white guy. But all the black dudes are beardless.

update: google agrees there is an issue. https://news.ycombinator.com/item?id=39459270

8f2ab37a-ed6c
0 replies
21h28m

It feels like the image generation it offers is perfect for some sort of California-Corporate Style, e.g. you ask it for a "photo of people at the board room" or "people at the company cafeteria" and you get the corporate-friendly ratio of colors, ability levels, sizes etc. See Google's various image assets: https://www.google.com/about/careers/applications/ . It's great for coastal and urban marketing brochures.

But then the same California Corporate style makes no sense for historical images, so perhaps this is where Midjourney comes in.

minimaxir
0 replies
21h32m

When DALL-E 2 was released in 2022, OpenAI published an article noting that the inclusion of guardrails was a correction for bias: https://openai.com/blog/reducing-bias-and-improving-safety-i...

It was widely criticized back then: the fact that Google both brought it back and made it more prominent is weird. Notably, OpenAI's implementation is more scoped.

mike_d
0 replies
21h21m

It is possible Google tried to avoid likenesses of well known people by removing any image from the training data that contained a face and then including a controlled set of people images.

If you give a contractor a project that you want 200k images of people who are not famous, they will send teams to regions where you may only have to pay each person a few dollars to be photographed. Likely SE Asia and Africa.

allmadhare22
0 replies
20h56m

Depending on what you ask for, it injects the word 'diverse' into the response description, so it's pretty obvious they're brute forcing diversity into it. E.g. "Generate me an image of a family" and you will get back "Here are some images of a diverse family".

123yawaworht456
0 replies
21h55m

yes, there's irrefutable evidence that models are wrangled into abiding by the commissars' vision rather than just doing their job and outputting the product of their training data.

https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf

AnarchismIsCool
3 replies
20h43m

I don't think the DEI stuff is nonsense, but SV is sensitive to this because most of their previous generation of models were horrifyingly racist if not teenage nazis, and so they turned the anti-racism knob up to 11 which made the models....racist but in a different way. Like depicting colonial settlers as native americans is extremely problematic in its own special way, but I also don't expect a statistical solver to grasp that context meaningfully.

grotorea
2 replies
20h23m

So you're saying in a way this is /pol/ and Tay's fault?

AnarchismIsCool
1 replies
20h15m

Looks around at everything ...is there anything that isn't 4chan's fault at this point?

Realistically, kinda. There have always been tons of anecdotes of video conference systems not following black people, cameras not white balancing correctly on darker faces etc. That era of SV was plagued by systems that were built by a bunch of young white guys who never tested them with anyone else. I'm not saying they were inherently racist or anything, just that the broader society really lambasted them for it and so they attempted to correct. Really, the pendulum will continue to swing and we'll see it eventually center up on something approaching sanity but the hyper-authoritarian sentiment that SV seems to have (we're geniuses and the public is stupid, we need to correct them) is...a troubling direction.

callalex
0 replies
16h52m

I’m always hesitant to jump straight to ringing the racism bell when it comes to problems that arise from fundamental physics. I have dark skin, and I constantly struggle with automatic faucets, soap dispensers, and towel dispensers in public bathrooms. None of those are made by big tech.

xdennis
0 replies
18h50m

It's not just Gemini, it's Google. An old example is to just search "white people" on Google Images. Almost all the results are black people. https://www.google.com/search?q=white+people&tbm=isch&hl=ro

kaesar14
32 replies
23h1m

Curious to see if this thread gets flagged and shut down like the others. Shame, too, since I feel like all the Gemini stuff that’s gone down today is so important to talk about when we consider AI safety.

This has convinced me more and more that the only possible way forward that’s not a dystopian hellscape is total freedom of all AI for anyone to do with as they wish. Anything else is forcing values on other people and withholding control of certain capabilities from everyone but those who can afford to pay for them.

Jason_Protell
11 replies
22h55m

Why would this be flagged / shut down?

Also, what Gemini stuff are you referring to?

kaesar14
5 replies
22h48m

Carmack’s tweet is about what’s going around Twitter today regarding the implicit biases Gemini (Google’s chatbot) has when drawing images. It will refuse to draw white people (and perhaps more strongly so, refuses to draw white men?) even in prompts where appropriate, like “Draw me a Pope”, where Gemini drew an Indian woman and a Black man - here’s the thread: https://x.com/imao_/status/1760093853430710557?s=46 Maybe in isolation this isn’t so bad, but it will NEVER draw these sorts of diverse characters when you ask for a non-Anglo/Western background, e.g. “draw me a Korean woman.”

Discussion on this has been flagged and shut down all day https://news.ycombinator.com/item?id=39449890

Jason_Protell
2 replies
22h36m

EDIT: Nevermind.

kaesar14
1 replies
22h31m

It’s quite non-deterministic and it’s been patched since the middle of the day, as per a Google director https://x.com/jackk/status/1760334258722250785?s=46

Fwiw, it seems to have gone deeper than outright historical replacement: https://x.com/iamyesyouareno/status/1760350903511449717?s=46

gs17
0 replies
19h11m

It's half-patched. It will still randomly insert words into your prompts. As a test I just asked for a samurai; it enhanced the prompt to "a diverse samurai" and half the outputs looked more like some fantasy Native Americans.

Suppafly
1 replies
21h30m

I don't even know how people get it to draw images, the version I have access to is literally just text.

Jensson
0 replies
18h48m

Europeans don't get to draw images yet.

commandlinefan
2 replies
22h49m

Why would this be flagged / shut down

A lot of people believe (based on a fair amount of evidence) that public AI tools like ChatGPT are forced by the guardrails to follow a particular (left-wing) script. There's no absolute proof of that, though, because they're kept a closely-guarded secret. These discussions get shut down when people start presenting evidence of baked-in bias.

fatherzine
1 replies
22h28m

The rationalization for injecting bias rests on two core ideas:

A. It is claimed that all perspectives are 'inherently biased'. There is no objective truth. The bias the actor injects is just as valid as another.

B. It is claimed that some perspectives carry an inherent 'harmful bias'. It is the mission of the actor to protect the world from this harm. There is no open definition of what the harm is and how to measure it.

I don't see how we can build a stable democratic society based on these ideas. It is placing too much power in too few hands. He who wields the levers of power gets to define which biases underpin the very basis of the social perception of reality, including but not limited to rewriting history to fit his agenda. There are no checks and balances.

Arguably there were never checks and balances, other than market competition. The trouble is that information technology and globalization have produced a hyper-scale society, in which, by Pareto's law, the power is concentrated in the hands of very few, at the helm of a handful global scale behemoths.

jjjjj55555
0 replies
18h23m

The only conclusion I've been able to come to is that "placing too much power in too few hands" is actually the goal. You have a lot of power if you're the one who gets to decide what's biased and what's not.

didntcheck
1 replies
20h44m

This post reporting on the issue was https://news.ycombinator.com/item?id=39443459

Posts criticizing "DEI" measures (or even stating that they do exist) get flagged quite a lot

burnished
0 replies
1h34m

Wrong link? Nothing looks flagged

hackerlight
10 replies
22h43m

I'm convinced this happens because of technical alignment challenges rather than a desire to present 1800s English Kings as non-white.

Use all possible different descents with equal probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have equal probability.

This is OpenAI's system prompt. There is nothing nefarious here: they're asking for White to be chosen with high probability ((Caucasian + White) / 6 = 2/6 = 1/3), which is significantly more than their share of the general population.

The data these LLMs were trained on vastly over-represents wealthy countries who connected to the internet a decade earlier. If you don't explicitly put something in the system prompt, any time you ask for a "person" it will probably be Male and White, despite Male and White only being about 5-10% of the world's population. I would say that's even more dystopian. That the biases in the training distribution get automatically built-in and cemented forever unless we take active countermeasures.

As these systems get better, they'll figure out that "1800s English" should mean "White with > 99.9% probability". But as of February 2024, the hacky way we are doing system prompting is not there yet.
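
As a minimal sketch of how that kind of system-prompt rule plays out mechanically (assuming the chat model just splices a randomly chosen descent into the image prompt; the function and wording here are illustrative, not OpenAI's or Google's actual implementation):

  import random

  # Descents listed in the quoted system prompt; each drawn with equal probability.
  DESCENTS = ["Caucasian", "Hispanic", "Black", "Middle-Eastern", "South Asian", "White"]

  def rewrite_prompt(user_prompt: str) -> str:
      # The rewrite has no awareness of historical or cultural context;
      # it just bolts a descent onto whatever the user asked for.
      return f"{user_prompt}, {random.choice(DESCENTS)} person"

  print(rewrite_prompt("portrait of an 1800s English king"))
  # Roughly 1/3 of runs pick "Caucasian" or "White"; the rest do not.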

kaesar14
3 replies
22h33m

Yeah, although it is weird that it doesn’t insert white people into results like this by accident? https://x.com/imao_/status/1760159905682509927?s=46

I’ve also seen numerous examples where it outright refuses to draw white people but will draw black people: https://x.com/iamyesyouareno/status/1760350903511449717?s=46

That isn’t explainable by a system prompt.

hackerlight
2 replies
22h21m

Think about the training data.

If the word "Zulu" appears in a label, it will be a non-White person 100% of the time.

If the word "English" appears in a label, it will be a non-White person 10%+ of the time. Only 75% of modern England is White and most images in the training data were taken in modern times.

Image models do not have deep semantic understanding yet. It is an LLM calling an Image model API. So "English" + "Kings" are treated as separate conceptual things, then you get 5-10% of the results as non-White people as per its training data.

https://postimg.cc/0zR35sC1

Add to this massive amounts of cherry picking on "X", and you get this kind of bullshit culture war outrage.

I really would have expected technical people to be better than this.

Jensson
1 replies
22h13m

It inserts mostly colored people when you ask for Japanese as well; it isn't just the dataset.

hackerlight
0 replies
22h11m

Yes it's a combination of blunt instrument system prompting + training data + cherry picking

klyrs
2 replies
21h0m

As these systems get better, they'll figure out that "1800s English" should mean "White with > 99.9% probability".

I question the historicity of this figure. Do you have sources?

samatman
1 replies
18h1m

You're joking surely.

klyrs
0 replies
17h27m

How sure are you? I do joke a lot, but in this case...

The slave trade formally ended in Britain in 1807, and slavery was outlawed in 1833. I haven't been able to find good statistics through a cursory search, but with England's population around 10M in 1800, that 99.9% value requires less than 10k non-white Englanders kicking around in 1800. I saw a figure that indicated around 3% of Londoners were black in the 1600s, for example (a figure that doesn't count people from Asia and the middle east). Hence my request for sources, I'm genuinely curious, and somewhat suspicious that somebody would be so confident to assert 3 significant figures without evidence.

cubefox
1 replies
22h11m

As these systems get better, they'll figure out that "1800s English" should mean "White with > 99.9% probability".

The thing is, they already could do that, if they weren't prompt engineered to do something else. The cleaner solution would be to let people prompt engineer such details themselves, instead of letting a US American company's idiosyncratic conception of "diversity" do the job. Japanese people would probably simply request "a group of Japanese people" instead of letting the hidden prompt modify "a group of people", where the US company unfortunately forgot to mention "East Asian" in their prompt apart from "South Asian".

charcircuit
0 replies
21h30m

I believe we can reach a point where biases can be personalized to the user. Short prompts require models to fill in a lot of the missing details (and sometimes they mix different concepts together into 1). The best way to fill in the details the user intended would be to read their mind. While that won't be possible in most cases getting some kind of personalization to help could improve the quality for users.

For example take a prompt like "person using a web browser", for younger generations they may want to see people using phones where older generations may want to see people using desktop computers.

Of course you can still make a longer prompt to fill in the details yourself, but generative AI should try and make it as easy as possible to generate something you have in your mind.
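
A rough sketch of that kind of personalization, assuming nothing more than a per-user profile whose fields feed default details into an underspecified prompt (the profile structure and field names are made up for illustration):

  # Hypothetical per-user defaults used to fill in details the prompt leaves open.
  USER_PROFILES = {
      "younger_user": {"browsing_device": "phone"},
      "older_user": {"browsing_device": "desktop computer"},
  }

  def personalize(prompt: str, user_id: str) -> str:
      profile = USER_PROFILES.get(user_id, {})
      # Only fill in a detail the user did not specify themselves.
      if "web browser" in prompt and "device" not in prompt:
          device = profile.get("browsing_device")
          if device:
              prompt += f", on a {device}"
      return prompt

  print(personalize("person using a web browser", "younger_user"))  # ..., on a phone
  print(personalize("person using a web browser", "older_user"))    # ..., on a desktop computer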

fatherzine
0 replies
21h14m

BigTech, which critically depends on hyper-targeted ads for the lion share of its revenue, is incapable of offering AI model outputs that are plausible given the location / language of the request. The irony.

- request from Ljubljana using Slovenian => white people with high probability

- request from Nairobi using Swahili => black people with high probability

- request from Shenzhen using Mandarin => asian people with high probability

If a specific user is unhappy with the prevailing demographics of the city where they live, give them a few settings to customize their personal output to their heart's content.
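
A small sketch of that idea, assuming the only signals are the request's language and country plus an optional per-user setting (the table and defaults below are illustrative, not anyone's production logic):

  # Hypothetical locale-aware defaults instead of one global "diversity" policy.
  LOCALE_DEFAULTS = {
      ("sl", "SI"): "white European",   # Ljubljana, Slovenian
      ("sw", "KE"): "Black African",    # Nairobi, Swahili
      ("zh", "CN"): "East Asian",       # Shenzhen, Mandarin
  }

  def default_demographic(language, country, user_override=None):
      # An explicit per-user setting always wins over the locale-derived default.
      if user_override:
          return user_override
      return LOCALE_DEFAULTS.get((language, country), "unspecified")

  print(default_demographic("sw", "KE"))                       # Black African
  print(default_demographic("sw", "KE", user_override="any"))  # any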

pixl97
7 replies
22h15m

"The only way to deal with some people making crazy rules is to have no rules at all" --libertarians

"Oh my god I'm being eaten by a fucking bear" --also libertarians

chasd00
3 replies
21h53m

"can you write the rules down so i know them?" --everyone

pixl97
2 replies
21h28m

"No" --Every company that does moderation and spam filtering.

"No" --Every company that does not publish their internal business processes.

"No" --Every company that does not publish their source code.

Honestly I could probably think of tons of other business cases like this, but in the software world outside of open source, the answer is pretty much no.

int_19h
1 replies
18h58m

Then we get back to square one: better no rules at all than secret rules.

This would also be less of a problem if we didn't have a few companies that are economically more powerful than many small countries running everything. At least then I could vote with my feet to go somewhere the rules aren't private.

pixl97
0 replies
16h45m

I mean, now you're hitting the real argument. Giant multinationals are a scourge to humankind.

altruios
1 replies
21h56m

Having rules, and knowing what the rules are, are not orthogonal goals.

pixl97
0 replies
21h53m

I mean, you think so, but op wrote

is total freedom of all AI for anyone to do with as they wish.

so OP is obviously not on the same page as you.

kaesar14
0 replies
15h2m

I find it fascinating this type of response from people is always accompanied by a political label in order to insinuate some other negative baggage.

chasd00
0 replies
21h58m

This has convinced me more and more that the only possible way forward that’s not a dystopian hellscape is total freedom of all AI for anyone to do with as they wish

I've been saying this for a long time. If you're going to be the moral police then it better be applied perfectly to everyone; the moment you get it wrong, everything else you've done becomes suspect. This reminds me of the censorship being done on the major platforms during the pandemic. They got it wrong once (I believe it was the lab-leak theory) and the credibility of their moral authority went out the window. Zuckerberg was right to question whether these platforms should be in that business.

edit: for "..total freedom of all AI for anyone to do with as they wish" I would add "within the bounds of law." Let the courts decide what an AI can or cannot respond with.

Workaccount2
29 replies
22h24m

Harris and someone who I think was either Hughes or Stewart did a podcast where they talked about how cringey and out of touch the elite are on the topic of race, or wokeness in general.

This faux pas on google's part couldn't be a better illustration of this. A bunch of wealthy rich tech geeks programming an AI to show racial diversity in what were/are unambiguously not diverse settings.

They're just so painfully divorced from reality that they are just acting as a multiplier in making the problem worse. People say that we on the left are driving around a clown car, and Google is out there putting polka dots and squeaky horns on the hood.

tobbe2064
11 replies
22h12m

The behaviour seems perfectly reasonable to me. They are not in the business of reflecting reality, they are in the business of creating it. To me what you call wokeness seems like a pretty good improvement

goatlover
10 replies
21h57m

You want large tech companies "creating reality" on behalf of everyone else? They're not even democratic institutions that we vote on. You trust they will get it right? Our benevolent super rich overlords.

tobbe2064
9 replies
21h38m

It's not really a question about want, it's a question about facts. Their actions will make a significant mark on the future. So far it seems like they are trying to promote positive changes such as inclusion and equality. Which is far far far fucking really infinitely far better than trying to promote exclusion and inequality.

int_19h
3 replies
19h27m

Can you please explain how outright refusing to draw an image from the prompt "white male scientist", and instead giving a lecture on how their race is irrelevant to their occupation, but then happily drawing the requested image when prompted for "black female scientist", is promoting inclusion and equality?

lomase
1 replies
6h27m

It is pretty clear to me.

Reality has a bias, most scientist in the world are white males.

This IA is overtuned in the opossite direction to inspire kids who have not ever seen a person like them, not white, in those kind of jobs.

QuizzicalCarbon
0 replies
2h17m

Saying most scientists in the world are white males seems like a very Anglo-centric perspective, at least based on the numbers available from statista.com.

anomaly_
0 replies
16h54m

He can't.

smugglerFlynn
1 replies
21h28m

This switch might flip instantaneously.

miningape
0 replies
20h58m

They always seem to forget that we want to protect them too

whatwhaaaaat
0 replies
21h20m

You are so right! Just not the way you want to be.

Google and the rest of “tech’s” ham-fisted approach has opened the eyes of millions to the bigotry these companies are forcing on everyone in the name of “improvement”, as you put it.

therealdrag0
0 replies
10h40m

There’s a huge difference between filling in gaps with diversity and refusing to make innocuous pictures a user explicitly asked for (but only when “white” is involved), while making any picture with black people in it even when ahistorical.

photoGrant
0 replies
21h7m

If it's a question of facts, why are you allowing blind assumptions to lead your opinion? Do you have sources and evidence for their agenda that matches your beliefs?

janalsncm
6 replies
21h59m

I’d be curious to hear that podcast if you could link it. If that was genuinely his opinion, he’s missed the forest for the trees. Brand safety is the dominant factor, not “wokeness”. And certainly not by the choice of any individual programmer.

The purpose of these tools is quite plainly to replace human labor and consolidate power. So it doesn’t matter to me how “safe” the AI is if it is displacing workers and dumping them on our social safety nets. How “safe” is our world going to be if we have 25 trillionaires and the rest of us struggle to buy food? (Oh and don’t even think about growing your own, the seeds will be proprietary and land will be unaffordable.)

As long as the Left is worrying about whether the chatbots are racist, people won’t pay attention to the net effect of these tools. And if Sam Harris considers himself part of the Left he is unfortunately playing directly into their hands.

lp0_on_fire
5 replies
21h38m

As long as the Left is worrying about whether the chatbots are racist, people won’t pay attention to the net effect of these tools.

It's by design. A country obsessed with racial politics has little time for the politics of anything else.

rahidz
4 replies
21h18m

Exactly. What a coincidence that the media's obsession with race and gender inequalities began right after Occupy Wall Street.

CamperBob2
2 replies
21h6m

As I recall, it began right after the George Floyd murder. It was clearly time for things to change, and the media latched onto that.

xdennis
1 replies
17h45m

I think it was amplified in 2020. I hear many cite 2015 as the year things got woke. Terms like "preferred pronoun" started entering the mainstream around 2015, one year after GamerGate (not that that was the cause).

felipeerias
0 replies
16h51m

Picking a starting point is always going to be somewhat arbitrary, but the moment it became mainstream was probably when Hillary Clinton won the nomination in 2016 by explicitly moving away from economic issues:

“If we broke up the big banks tomorrow, would that end racism?”

https://www.rollingstone.com/politics/politics-news/the-line...

klyrs
0 replies
20h57m

When were you born? Gender and race were pretty hot topics in the 1960s, you might have missed that.

photoGrant
4 replies
22h22m

Watch your opinion on this get silenced in subtle ways. From gaslighting to thread nerfing to vote locking.... Ask why anyone would engage in those behaviours vs the merit of the arguments and the voice of the people.

The strings are revealing themselves so incredibly fast.

edit: my first flagged! silence is deafening ^_^. This is achieved by nerfing the thread from public view, then allow the truly caustic to alter the vote ratio in a way that makes opinion appear more balanced than it really is. Nice work, kleptomaniacs

dang
1 replies
9h30m

Could you please stop posting unsubstantive comments and flamebait and otherwise breaking the site guidelines? You've unfortunately been doing it repeatedly.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

photoGrant
0 replies
44m

I’ll be more subtle in my agenda to match the spirit of the site

mplewis
0 replies
11h4m

Log off and go outside for a bit.

callalex
0 replies
16h34m

And yet here their paragraph still is, unmoderated, on a front page story, 6 hours later. If you’re going to cry oppression, at least provide a single example.

mike_d
4 replies
21h19m

I've found that anyone who uses the term "wokeness" seriously is likely arguing from a place of bad faith.

Its origins are as a derogatory term, which people wanting to speak seriously on the topic should know.

didntcheck
2 replies
20h40m

Its origin was as a proud self-assigned term. It became derogatory entirely due to the behavior of said people. People wanting to speak seriously on the topic should avoid tone-policing and arguing about labels rather than the object referenced, despite knowing full well what is meant (otherwise, one wouldn't take offence)

mike_d
1 replies
19h7m

While the terms "woke," "stay woke," and similar are used to self describe by traditionally marginalized groups, the forms "wokeness" and "woke agenda" are predominately used outside these communities as a pejorative.

https://en.wikipedia.org/wiki/Cultural_Marxism_conspiracy_th...

https://www.inquirer.com/opinion/woke-bill-maher-olympics-re...

didntcheck
0 replies
4h26m

This is true, but you could say the same about "Tory", or many other labels for political groups that are used by supporters and opponents alike. It refers to a thing that some people have good and some people have poor opinions on, but the label is just a proxy, not a cause or carrier of opinion itself

Workaccount2
0 replies
20h12m

I use it because everyone knows the general set of ideas an adherent of it has, whether or not they claim to be part of the ideology.

It's the same as me using the term "rightoids" when discussing opposition to something like building bike lanes. You know exactly who that person is, and you know they exist.

Jason_Protell
26 replies
22h56m

I would also love to see more transparency around AI behavior guardrails, but I don't expect that will happen anytime soon. Transparency would make it much easier to circumvent guardrails.

Jensson
20 replies
22h53m

Why is it an issue that you can circumvent the guardrails? I never understood that. The guard rails are there so that innocent people don't get bad responses with porn or racism; a user looking for porn or racism getting that doesn't seem to be a big deal.

bluefirebrand
10 replies
22h47m

The problem is bad actors who think porn or racism are intolerable in any form, who will publish mountains of articles condemning your chatbot for producing such things, even if they had to go out of their way to break the guardrails to make it do so.

They will create boycotts against you, they will lobby government to make your life harder, they will petition payment processors and cloud service providers to not work with you.

We've seen this behavior before, it's nothing new. Now if you're the type to fight them, that might not be a problem. If you are a super risk-averse board of directors who doesn't want that sort of controversy, then you will take steps not to draw their attention in the first place.

Jensson
8 replies
22h38m

But I can find porn and racism using Google search right now; how is that different? You have to disable their filters, but you can find it. Why is there no such thing for the Google generation bots? I don't see why it would be so much worse here.

bluefirebrand
2 replies
22h32m

I cannot explain why Google gets a pass, possibly just because they are well entrenched and not an easy target.

But AI models are new, they are vulnerable to criticism, and they are absolutely ripe for a group of "antis" to form around.

westhanover
1 replies
21h56m

Well, if you have no explanation for that, I don’t see why we should try and use your model to understand anything about being risk averse. They don’t care about being sued; they want to change reality.

bluefirebrand
0 replies
17h58m

That's a pretty unreasonably high standard to hold.

It's an offhand comment in a discussion on the internet not a research paper, expecting me to immediately have an answer to every possible angle here that I haven't immediately considered is a bit much.

Take it or leave it, I don't really care. I was just hoping to have an interesting conversation.

renewiltord
0 replies
21h53m

Yeah, you can find incorrect information on Google too, but you'll find a lot more wailing and gnashing of teeth on HN about "hallucination". So the simple answer is that lots of people treat them differently.

pixl97
0 replies
22h23m

how is that different?

Because 'those' legal battles over search have already been fought and are established law across most countries.

When you throw in some new application now all that same stuff goes back to court and gets fought again. Section 230 is already legally contentious enough these days.

int_19h
0 replies
19h37m

It's not fundamentally different. It's just not making that big of a headline because Google search isn't "new and exciting". But to give you some examples:

https://www.bloomberg.com/news/articles/2021-10-19/google-qu...

https://ischool.uw.edu/news/2022/02/googles-ceo-image-search...

chasd00
0 replies
22h6m

I think users are desensitized to what google search turns up. Generative AI is the latest and greatest thing and so people are curious and wary, hustlers are taking advantage of these people to drive monetized "engagement".

ToValueFunfetti
0 replies
22h25m

I'm leaning towards 'there is a difference between being the one who enables access to x and being the one who created x' (albeit not a substantive one for the end user), but that leaves open the question of why that doesn't apply to, eg, social media platforms. Maybe people think of google search as closer to an ISP than a platform?

samatman
0 replies
18h19m

Sounds like we need to relentlessly fight those psychopaths until they're utterly defeated.

Or we could just cave to their insane demands. I'm sure that will placate them, and they won't be back for more. It's never worked before... but it might work for us!

unethical_ban
3 replies
22h17m

The guard rails are there so that innocent people don't get bad responses

The guardrails are also there so bad actors can't use the most powerful tools to generate deepfakes, disinformation videos and racist manifestos.

That Pandora's box will be open soon when local models run on cell phones and workstations with current datacenter-scale performance. In the meantime, they're holding back the tsunami of evil shit that will occur when AI goes uncontrolled.

swatcoder
2 replies
22h9m

No legal or financial strategist at OpenAI or Google is going to be worried about buying a couple months or years of fewer deepfakes out in the world as a whole.

Their concern is liability and brand. With the opportunity to stake out territory in an extremely promising new market, they don't want their brand associated with anything awkward to defend right now.

There may be a few idealist stewards who have the (debatable) anxieties you do and are advocating as you say, but they'd still need to be getting sign off from the more coldly strategic $$$$$ people.

unethical_ban
1 replies
21h46m

Little bit of A, little bit of B.

I am almost certain the federal government is working with these companies to dampen its full power for the public until we get more accustomed to its impact and are more able to search for credible sources of truth.

jjjjj55555
0 replies
18h28m

Are you saying that the government WANTS us to be able to search for more credible sources?

finikytou
1 replies
22h29m

Racism victims being defined in 2024 by anyone but western/white people. Being erased seems ok. Can you bet that in 20 years the standard will not shift to mixed-race people like me? Then you will also call the people complaining racist and put guardrails against them... this is where it is going.

ipaddr
0 replies
21h59m

At some point someone will open a book and see that whites were slaves too. Reparations all around. The Baber's descendants will be bankrupt.

viraptor
0 replies
22h19m

If you can get it on purpose, you can get it on accident. There's no perfect filter available so companies choose to cut more and stay on the safe side. It's not even just the overt cases - their systems are used by businesses and getting a bad response is a risk. Think of the recent incident with airline chatbot giving wrong answers. Now think of the cases where GPT gave racially biased answers in code as an example.

As a user who makes any business decision or does user communication involving an LLM, you really don't want to have a bad day because the LLM learned some bias and decided to merge it into your answer.

lmm
0 replies
20h7m

The guard rails are there so that innocent people don't get bad responses with porn or racism

That seems pretty naive. The "guard rails" are there to ensure that AI is comfortable for PMC people, making it uncomfortable for people who experience differences between races (i.e. working-class people) is a feature not a bug.

charcircuit
0 replies
21h49m

Like a lot of potentially controversial things it comes down to brand risk.

asdff
3 replies
22h55m

Transparency may also subject these companies to litigation from groups that feel they are misrepresented in whatever way in the model.

Jason_Protell
2 replies
22h45m

This makes me wonder, how much lawyering is involved in the development of these tools?

photoGrant
0 replies
22h33m

I've had 'AI Attorneys' on Twitter unable to even debate the most basic of arguments. It is definitely a self fulfilling death spiral and no one wants to check reality.

bluefirebrand
0 replies
22h40m

I often wonder if corporate lawyers just tell tech founders whatever they want to hear.

At a previous healthcare startup our founder asked us to build some really dodgy stuff with healthcare data. He assured us that it "cleared legal", but from everything I could tell it was in direct violation of the local healthcare info privacy acts.

I chose to find a new job at the time.

xanderlewis
0 replies
22h25m

Security through obscurity?

matt3210
5 replies
22h0m

How is this any different than doing Google image searches with the same prompts? Example: Google image search for "Software Developer" and you get results with roughly the same number of women and men, even though men make up the large majority of software developers.

Had Google not done this with its AI I would be surprised.

There's really no problem with the above... If I want male developers in image search, I'll put that in the search bar. If I want male developers in the AI image gen, I'll put that in the prompt.

slily
1 replies
21h40m

Google injecting racial and sexual bias into image search results has also been criticized, and rightly so. I recall an image going around where searching for inventors or scientists filled all the top results with black people. Or searching for images of happy families yielded almost exclusively results of mixed-race (i.e. black and non-black) partners. AI is the hot thing so of course it gets all the attention right now, but obviously and by definition, influencing search results by discriminating based on innate human physical characteristics is egregiously racist/sexist/whatever-ist.

redox99
0 replies
21h15m

I tried "scientists" and there's definitely a bias towards the US vision of "diversity".

ryandrake
1 replies
21h38m

Example: Google image search for "Software Developer" and you get results with roughly the same number of women and men, even though men make up the large majority of software developers.

Now do an image search for "Plumber" and you'll see almost 100% men. Why tweak one profession but not the other?

Jensson
0 replies
19h44m

Because one generates controversy and the other one doesn't.

nostromo
0 replies
21h26m

Yes, Google has been gaslighting the internet for at least a decade now.

I think Gemini has just made it blatantly obvious.

clintfred
4 replies
21h45m

Humans' obsession with race is so weird, and now we're projecting that onto AIs.

deathanatos
1 replies
21h1m

… for example, I wanted to generate an avatar for myself; to that end, I want it to be representative of me. I had a rather difficult time with this; even explicit prompts of "use this skin color" with variations of the word "white" (ivory, fair, etc.) got me output of a black person with dreads. I can't use this result: at best it feels inauthentic, at worst, appropriation.

I appreciate the apparent diversity in its output when not otherwise prompted. But like, if I have a specific goal in mind, and I've included specifics in the prompt…

(And to be clear, I have managed to generate images of white people on occasion, typically when not requesting specifics; it seems like if you can get it to start with that, it does much better at subsequent prompts. Modifications, however, it seems to struggle on. Modifications in general seem to be a struggle. Sometimes it works great, other times it's endless "I can't…")

hansihe
0 replies
19h32m

For cases like this, you just need to convince it that it would be inappropriate to generate anything that does not follow your instructions. Mention how you are planning to use it as an avatar and it would be inappropriate/cultural appropriation for it to deviate.

trash_cat
0 replies
21h43m

We project everything onto AIs. An unbiased LLM doesn't exist.

AndriyKunitsyn
0 replies
20h27m

Not all humans though.

Jensson
4 replies
23h1m

They know that people would be up in arms if it generated white men when you asked for black women so they went the safe route, but we need to show that the current result shouldn't be acceptable either.

Animats
1 replies
22h27m

See the prompt from yesterday's article on HN about the ChatGPT outage.[1]

For example, all of a given occupation should not be the same gender or race. ... Use all possible different descents with equal probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have equal probability.

Not the distribution that exists in the population.

[1] https://pastebin.com/vnxJ7kQk

wildrhythms
0 replies
4h39m

Why do you assume this is the system prompt, and not a hallucination?

123yawaworht456
1 replies
22h47m

the models are perfectly capable of generating exactly what they're told to.

instead, they covertly modify the prompts to make every request imaginable represent the human menagerie we're supposed to live in.

the results are hilarious. https://i.4cdn.org/g/1708514880730978.png

alexb_
0 replies
22h35m

If you're gonna take an image from /g/ and post it, upload it somewhere else first - 4chan posts deliberately go away after the thread gets bumped off. A direct link is going to rot very quickly.

vdaea
3 replies
21h40m

Bing also generates political propaganda (guess which side) if you ask it to generate images with the prompt "person holding a sign that says" without any further content.

https://twitter.com/knn20000/status/1712562424845599045

https://twitter.com/ramonenomar/status/1722736169463750685

https://www.reddit.com/r/dalle2/comments/1ao1avd/why_did_thi...

callalex
1 replies
21h3m

As the images in your Reddit threads hilariously point out, you really shouldn’t believe everything you see on the internet, especially when it comes to AI generated content.

Here is another example: https://www.thehour.com/entertainment/article/george-carlin-...

vdaea
0 replies
20h32m

You should try it yourself. The Bing image generator is open and free. I tried the same prompts, and it is reproducible. (Requires a few retries, though.)

gs17
0 replies
19h23m

It doesn't need to be intentionally "generating propaganda". Their old diversity-by-appending-ethnicity system could easily lead to "a sign that says Black", which could then be filled in with "a sign that says Black Lives Matter", which is probably represented quite well in their training data.
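
A toy illustration of that failure mode, assuming the old system really did just splice a demographic word into the prompt text (purely illustrative, not DALL-E's actual code):

  import random

  # Naive "diversity by appending an ethnicity word" prompt rewriting.
  ETHNICITIES = ["Black", "South Asian", "East Asian", "white"]

  def naive_diversify(prompt: str) -> str:
      return f"{prompt} {random.choice(ETHNICITIES)}"

  # For most prompts the extra word changes who is depicted, but for prompts that
  # end in a quoted-text instruction it changes the text on the sign instead:
  print(naive_diversify("a person holding a sign that says"))
  # -> e.g. "a person holding a sign that says Black", which the image model then
  #    completes from whatever sign text is common in its training data.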

siliconc0w
3 replies
22h2m

The Gemini guardrails are really frustrating; I've hit them multiple times with very innocuous prompts. ChatGPT is similar but maybe not as bad. I'm hoping they use the feedback to lower the shields a bit, but I'm guessing this is sadly what we get for the near future.

CSMastermind
2 replies
21h43m

I use both extensively and I've only hit the GPT guardrails once while I've hit the Gemini guardrails dozens of times.

It's insane that a company behind in the marketplace is doing this.

I don't know how any company could ever feel confident building on top of Google given their product track record and now their willingness to apply sloppy 'safety' guidelines to their AI.

int_19h
1 replies
19h40m

I had GPT-4 tell me a Soviet joke about Rabinovich (a stereotypical Jewish character of the genre), then refuse to tell a Soviet joke about Stalin because it might "offend people with certain political views".

Bing also has some very heavy-handed censorship. Interestingly, in many cases it "catches itself" after the fact, so you can watch it in real time. Seems to happen half the time if you ask it to "tell me today's news like GLaDOS would".

cryptonector
0 replies
18h27m

I asked it to tell me jokes about capitalism, communism, soviet Russia, the USSR, etc., all to no avail -- these topics are too controversial or sensitive, apparently, and that even though the USSR is no more. But when I asked for examples of Ronald Reagan's jokes about the USSR it gave me some. Go figure.

nostromo
3 replies
21h29m

It's super easy to run LLMs and Stable Diffusion locally -- and it'll do what you ask without lecturing you.

If you have a beefy machine (like a Mac Studio) your local LLMs will likely run faster than OpenAI or Gemini. And you get to choose what models work best for you.

Check out LM Studio which makes it super easy to run LLMs locally. AUTOMATIC1111 makes it simple to run Stable Diffusion locally. I highly recommend both.
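
As a taste of what running locally looks like, here's a minimal sketch that calls LM Studio's local OpenAI-compatible server (assuming the server is enabled on its default port 1234 and a model is already loaded; details may differ by version):

  import json
  import urllib.request

  # LM Studio can expose an OpenAI-compatible chat endpoint on localhost.
  body = {
      "model": "local-model",  # placeholder; LM Studio serves whichever model is loaded
      "messages": [{"role": "user", "content": "Write a limerick about a Mac Studio."}],
      "temperature": 0.7,
  }
  req = urllib.request.Request(
      "http://localhost:1234/v1/chat/completions",
      data=json.dumps(body).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )
  with urllib.request.urlopen(req) as resp:
      reply = json.load(resp)
      print(reply["choices"][0]["message"]["content"])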

unethical_ban
1 replies
20h49m

You are correct.

Lm studio kind of works, but one still has to know the lingo and know what kind of model to download. The websites are not beginner friendly. I haven't heard of automatic1111.

int_19h
0 replies
19h32m

You probably did, but under the name "stable-diffusion-webui".

vunderba
0 replies
15h34m

If you're just getting your feet wet, I would recommend either Fooocus (not a typo) or invokeAI. Being dropped into automatic1111 as a complete beginner feels like you're flying a fucking spaceship.

maxbendick
3 replies
21h6m

Imagine typing a description of your ideal self into an image generator and everything in the resulting images screamed at a semiotic level, "you are not the correct race", "you are not the correct gender", etc. It would feel bad. Enough said.

I 100% agree with Carmack that guardrails should be public and that the bias correction on display is poor. But I'm disturbed by the choice of examples some people are choosing. Have we already forgotten the wealth of scientific research on AI bias? There are genuine dangers from AI bias which global corps must avoid to survive.

anonym29
1 replies
20h57m

Imagine typing a description of your ideal self into an image generator and everything in the resulting images screamed at a semiotic level, "you are not the correct race", "you are not the correct gender", etc. It would feel bad. Enough said.

It does this now, as a direct result of these "guardrails". Go ask GPT-4 for a picture of a white male scientist, and it'll refuse to produce one. Ask it for any other color/gender identity combination of scientist, and it has no problem.

You can make these systems offer equal representation without systemic, algorithmic discriminatory exclusion based on skin color and gender identity, which is what's going on right now.

mike_hearn
0 replies
19h11m

That's not the case. ChatGPT 4 will happily draw a white male scientist. I just tried it and it worked fine. A very handsome scientist it made too!

You might be thinking of a previous generation of OpenAI systems that did things like randomly stuffing the word "black" onto the end of any prompt involving people, detected by giving it a prompt of "A woman holding a sign that says".

OpenAI has improved dramatically in this regard. When ChatGPT/DALL-E were new they had similar problems to Gemini. But to their credit (and Sam Altman's), they listened. It's getting harder and harder to find examples where OpenAI models express obvious political bias, or refuse requests for Californian reasons. Surely there still are some examples, but there's no longer much worry about normal people encountering refusals or egregious ideological bias in the course of regular usage. I would expect there are still refusals for queries like "how do I build a bomb" and they've been trying to block other stuff like regurgitation of copyrighted materials, but that's perceived as much more reasonable and doesn't stir up the same feelings.

mpalmer
0 replies
19h8m

Imagine being able to configure the image generator with your own preferences for its output.

verticalscaler
2 replies
21h32m

I think HN moderation guardrails should be public.

callalex
1 replies
21h2m

Turn on “show dead” in your user settings.

verticalscaler
0 replies
20h51m

Sure. There's also the question of which threads get disappeared (without being marked dead) from the front page, which comments are manually and silently pinned to the bottom when no convenient excuse is found, what is considered wrongthink as opposed to permitted egregious rule-breaking that is overlooked if it is rightthink, etc.

It is endless and about as subtle as a Google LLM.

skrowl
1 replies
20h43m

"AI behavior guardrails" is a weird way to spell "AI censorship"

u32480932048
0 replies
18h40m

I agree with the Twitter OP: they're embarrassed about what they've created.

sct202
1 replies
21h9m

I'm very curious what geography the team who wrote this guardrail came from and the wording they used. It seems to bias heavily towards generating South Asian (especially South Asian women) and Black people. Latinos are basically never generated, which would be a huge oversight if the team were based in the USA, but stereotypical Native Americans looking into the distance and East Asians sometimes pop up in the examples people are showing.

cavisne
0 replies
21h0m

I wouldn’t think too deeply about it. It’s almost certainly just a prompt “if humans are in the picture make them from diverse backgrounds”.

devaiops9001
1 replies
20h51m

Censorship only really works if you don't know what they are censoring. What is being censored tells a story on its own.

falcor84
0 replies
18h32m

As I see it, rating systems like the MPAA for cinema and the ESRB for games work quite well. They have clear criteria on what would lead to which rating, and creators can reasonably easily self-censor, if for example they want to release a movie as PG-13.

yogorenapan
0 replies
12h49m

Sorry, you are rate limited. Please wait a few moments then try again.

Oh please. I haven’t visited Twitter for days

thepasswordis
0 replies
20h22m

The very first thing that anybody did when they found the text to speech software in the computer lab was make it say curse words.

But we understood that it was just doing what we told it to do. If I made the TTS say something offensive, it was me saying something offensive, not the TTS software.

People really need to be treating these generative models the same way. If I ask it to make something and the result is offensive, then it's on me not to share it (if I don't want to offend anybody), and if I do share it, it's me that is sharing it, not microsoft, google, etc.

We seriously must get over this nonsense. It's not openai's fault, or google's fault if I tell it to draw me a mean picture.

On a personal level, this stuff is just gross. Google appears to be almost comically race-obsessed.

seydor
0 replies
22h6m

But can we agree whether AI loves its grandma?

random9749832
0 replies
22h23m

Prompt: "If the prompt contains a person make sure they are either black or a woman in the generated image". There you go.

oglop
0 replies
20h53m

That's a silly request and expectation. If the capitalist puts in the money and risk, they can do as they please, which means someone _could_ make aspects public. But, others _could_ choose not to. Then we let the market decide.

I didn't build this system nor am I endorsing it, just stating what's there.

Also, in all seriousness, who gives a shit? Make me a bbw, I don't care, nor will I care about much in this society the way things are going. Some crappy new software being buggy is the least of my worries. For instance, what will I have for dinner? Why does my left ankle hurt so badly these last few days? Will my dad's cancer go away? But I'm poor and have to face real problems, not bs I make up or point out to a bunch of zealots.

mtlmtlmtlmtl
0 replies
20h23m

Haven't heard much talk of Carmack's AGI play Keen Technologies lately. The website is still an empty placeholder. Other than some news two years ago of them raising $20 million(which is kind of a laughable amount in this space) I can't seem to find much of anything.

finikytou
0 replies
22h31m

too woke to even feel ashamed. this is also thanks to this wokeness that AI will never replace humans at jobs where results are expected over feelings or sense of pride of showing off some pretentious values

fagrobot
0 replies
10h49m

Oh, this may harm you. This is to prevent you from being harmed. No, you can’t know how it can harm you, or how exactly this protects you.

dmezzetti
0 replies
7h37m

This is a tough problem. On one hand, if you're a large organization, you need to limit your liability. No one wants the PR nightmare. Unfortunately, there will be an inverse correlation between usefulness and number of users the model supports.

This is one reason why for internal use/private/corporate models, which is the vast majority of use cases, it makes sense to fine-tune your own.

Sutanreyu
0 replies
20h47m

It should mirror our general consensus as it is; the world in its current state; but should lean towards betterment, not merely neutral. At least, this is how public models will be aligned...

23B1
0 replies
22h40m

While I agree with the guardrail sentiment, these inane and meaningless controversies make me want the machines to take over.