
Machine Unlearning in 2024

negative_person
25 replies
1d4h

Why should we try to unlearn "bad" behaviours from AI?

There is no AGI without violence; it's part of free thinking and self-survival.

But also: we once averted a war because a few people understood that launching a first strike ordered by a drunk president was a bad idea. AI needs to understand consequences.

It seems futile to try and hide "bad" from AI.

williamtrask
8 replies
1d4h

Because we can get AI-related technologies to do things living creatures can't, like provably forget things. And when it benefits us, we should.

Personal opinion, but I think AGI is a good heuristic to build against but in the end we’ll pivot away. Sort of like how birds were a good heuristic for human flight, but modern planes don’t flap their wings and greatly exceed bird capabilities in many ways.

Attribution for every prediction, and deletion, seem like prime examples of things that would break the AI/AGI analogy in favor of something more economically and politically compelling/competitive.

negative_person
7 replies
1d4h

Can you point to any behaviour in human beings you'd unlearn if they'd also forget the consequences?

We spend billions trying to predict human behaviour and yet we are surprised every day; "AGI" will be no simpler. We just have to hope the dataset was aligned so the consequences are understood, and find a way to contain models that aren't.

aeonik
2 replies
1d3h

The feeling of extreme euphoria and its connection to highly addictive drugs like Heroin might be a use case. Though I'm not sure how well something like that would work in practice.

everforward
1 replies
1d3h

Is that possible to do without also forgetting why it’s dangerous? That seems like it would fuel a pattern of addiction where the person gets addicted, forgets why, then gets addicted again because we wiped their knowledge of the consequences the first time around.

Then again, I suppose if the addiction was in response to a particular stimulus (death of a family member, getting fired, etc) and that stimulus doesn’t happen again, maybe it would make a difference?

It does have a tinge of “those who don’t recall the past are doomed to repeat it”.

aeonik
0 replies
1d

After a certain point I think someone can learn enough information to derive almost everything from first principles. But I think it might work temporarily.

There's a movie about this idea called "Eternal Sunshine of the Spotless Mind".

I find it hard to believe that you can surgically censor one chunk of information and cut it off from the rest, especially if it's general physical principles.

I also don't have a nice topological map of how all the world's information is connected at the moment, so I can't back up my opinions.

Though I'm still rooting for the RDF/OWL and Semantic Web folks, they might figure it out.

williamtrask
0 replies
1d2h

You seem to be focusing a lot on remembering or forgetting consequences. Yes, ensuring models know enough about the world to only cause the consequences they desire is a good way for models to not create random harm. This is probably a good thing.

However, there are many other reasons why you might want a neural network to provably forget something. The main reason has to do with structuring an AGI's power. The simple story of AGI is something like "make it super powerful, general, and value-aligned, and humanity will prosper". However, the reality is more nuanced. Sometimes you want a model to be selectively not powerful as part of managing value misalignment in practice.

To pick a trivial example, you might want a model to enter your password in some app one time, but not remember the password long term. You might want it to use and then provably forget your password so that it can't use your password in the future without your consent.

This isn't something that's reliably doable with humans. If you give someone your password, they have it; you can't get it back. This is the point at which we'll have the option to pursue the imitation of living creatures blindly, or to turn away from blind adherence to the AI/AGI story. Just like we reached the point of deciding whether planes should flap their wings dogmatically, or whether we should pursue the more economically and politically competitive thing. Planes don't flap their wings, and AI/AGI will be able to provably forget things. And that's actually the better path.

A recent work co-authors and I published related to this: https://arxiv.org/pdf/2012.08347

beeboobaa3
0 replies
1d1h

Seeing dad have sex with mom.

Brian_K_White
0 replies
1d3h

It sounds like the only answer for AI is the same as the only answer for humans.

Wisdom. Arriving at actions and reactions based on better understanding of the interconnectedness and interdependency of everything and everyone. (knowing more not less, and not selective or bowdlerized)

And most humans don't even have it. Most humans are not interested, don't believe it, and certainly don't act as though "what's good for you is what's good for me, what harms you harms me." Every day a tech podcaster or youtuber says this or that privacy loss or security risk "doesn't affect you or me". They all affect you and me. When a government or company gives itself power over a single person anywhere, and then abuses it, that is a hit to you and me even though we aren't that person, because that person is somebody, and you and I are somebody.

Most humans ridicule anyone who talks like that and don't let them near any levers of power at any scale. They might be ok with it in inconsequential conversational contexts like a dinner party or this forum, but not in any decision-making context. Anyone talking like that is an idiot and disconnected from reality; they might drive the bus off the bridge because the peace fairies told them to.

If an AI were better than most humans and had wisdom, and gave answers that conflicted with selfishness, most humans would just decide they don't like the answers and instructions coming from the AI and just destroy it, or at least ignore it, pretty much as they do today with humans who say things they don't like.

Perhaps one difference is an AI could actually be both wise and well-intentioned rather than a charlatan harnessing the power of a mass of the gullible, and it could live longer than a human and its results could become proven out over time. Some humans do get recognized eventually, but by then it doesn't do the rest of us any good because they can no longer be a leader as they're too old or dead. Then again maybe that's required actually. Maybe the AI can't prove itself because you can never say of the AI, "What does he get out of it by now? He lived his entire life saying the same thing, if he was just trying to scam everyone for money or power or something, what good would it even do him now? He must have been sincere the whole time."

But probably even the actual good AI won't do much good, again for the same reason as with actually good humans, it's just not what most people want. Whatever individuals say about what their values are, by the numbers only the selfish organisations win. Even when a selfish organization goes too far and destroys itself, everyone else still keeps doing the same thing.

AvAn12
0 replies
1d

A few things to exclude from training might include:

- articles with mistakes such as incorrect product names, facts, dates, references
- fraudulent and non-repeatable research findings (see John Ioannidis, among others)
- outdated and incorrect scientific concepts like phlogiston and Lamarckian evolution
- junk content such as 4chan comment sections
- flat-earther "science" and other such nonsense
- debatable stuff, like: do we want material that attributes human behavior to astrological signs or not? And when should a response make reference to such?
- prank stuff, like script kiddies prompting 2+2=5 until an AI system "remembers" this
- intentional poisoning of a training set with disinformation
- suicidal and homicidal suggestions and ideation
- etc.

Even if we go with the notion that AGI is coming, there is no reason its training should include the worst in us.

doubloon
3 replies
1d3h

AGI would not be GI unless it could change its mind after realizing it's wrong about something.

542458
2 replies
1d3h

I disagree. People with anterograde amnesia still possess general intelligence.

saintfire
1 replies
1d3h

I don't know a ton about amnesia, but I would think the faculties for changing their mind are still there.

E.g. ordering food, they might immediately change their mind after choosing something and correct their order.

I recognize they cannot form new memories but from what I understand they still would have a working memory, otherwise you'd be virtually unable to think and speak.

542458
0 replies
16h46m

LLMs will change their minds today. Most major ones can change their minds on subsequent generations within the same context (“I’m sorry, my previous answer was incorrect,..”), and the biggest ones can change their mind mid-answer (mostly observed with GPT4).

affgrff2
1 replies
1d3h

Maybe it all boils down to copyright. Having a method that believably removes the capacity to generate copyrighted results might give you some advantage with respect to some legislation.

wongarsu
0 replies
1d2h

Also if you build some sort of search engine using an LLM governments will expect you to be able to remove websites or knowledge of certain websites for legal reasons (DMCA, right to be forgotten, etc).

wruza
0 replies
23h47m

> There is no AGI without violence; it's part of free thinking and self-survival.

The self-survival drive is a part of natural selection; AGI doesn't have to have it. Maybe the problem is that we are the only template to build AGI from, but that's not inherent to the "I" in any way. On the other hand, lack of self-preservation can make animals even more ferocious. Also, there's a reason they often leave a retreat path in warzones.

Long story short, it's not that straightforward, so I sort of agree, because it's an uncharted, defaults-lacking territory we'll have to explore. "Unlearn bad" is as naive as not telling your kids about sex and drugs.

szundi
0 replies
1d3h

Thanks but no violent AGIs thanks

surfingdino
0 replies
23h21m

AI has no concept of children, family, or nation. It doesn't have parental love or offspring protection instinct. Faced with danger to its children it cannot choose between fighting or sacrificing itself in order to protect others. What it is good at is capturing value through destruction of value generated by existing business models; it does it by perpetrating mass theft of other people's IP.

sk11001
0 replies
1d3h

The point is to build things that are useful, not to attempt to replicate science fiction literature.

numpad0
0 replies
1d2h

They are just trying to find a way to plausibly declare successful removal of copyrighted and/or illegal material without discarding weights.

GPT-4 class models reportedly cost $10-100m to train, and that's too much to throw away over Harry Potter or Russian child porn scrapes that could later be reproduced verbatim despite representing <0.1ppb or whatever minuscule part of the dataset.

imtringued
0 replies
1d3h

You seem to be ignoring the potential to use this to improve the performance of LLMs. If you can unlearn wrong answers, you can train the model with any scoring mechanism that checks for correctness, instead of scoring token-for-token similarity to the prescribed answer.
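A rough sketch of what that could look like (my own toy example, not anything from the article): a scorer flags an answer as wrong, and the update flips into gradient ascent on that answer's loss, pushing it down instead of reinforcing it. The HF-style `model(input_ids).logits` interface and all names here are assumptions.

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, optimizer, input_ids, labels, is_correct, ascent_scale=0.1):
    """Descend on answers the scorer accepts; ascend (suppress) on ones it rejects."""
    logits = model(input_ids).logits                      # assumes an HF-style causal LM
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),      # predict token t+1 from token t
        labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )
    objective = loss if is_correct else -ascent_scale * loss  # flipped sign = "unlearn"
    optimizer.zero_grad()
    objective.backward()
    optimizer.step()
    return loss.item()
```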

andy99
0 replies
1d4h

This is presumably about a chatbot though, not AGI, so it's basically a way of limiting what they say. (Not a way that I expect to succeed)

Jaygles
0 replies
16h45m

Because corporations won't buy the fancy chat bot if there's a chance it will occasionally use slurs in its interactions with their customers.

Cheer2171
0 replies
1d4h

So you have a problem with supervised learning like spam classifiers?

542458
0 replies
1d3h

> There is no AGI without violence; it's part of free thinking and self-survival.

I disagree. Are committed pacifists not in possession of general intelligence?

avi_vallarapu
24 replies
1d2h

We need to consider the practicality of unlearning methods in real-world applications and the legal acceptance of the same.

Given current technology and the advances needed to make unlearning more feasible, there should probably be a time-to-unlearn kind of agreement that gives organizations a window to retrain or tune the model so that its responses no longer draw on the to-be-unlearned copyrighted content.

Ultimately, legal acceptance of unlearning may come down to deleting the offending data from the training set. It may be very challenging to otherwise prove, legally, through the proposed unlearning techniques, that the model does not produce any type of response involving the private data.

The actual data set contains the private or copyright-violating data, and the model was trained on it, period. This means unlearning must involve deleting the documents/data in question and retraining.

beeboobaa3
12 replies
1d2h

How to deal with "unlearning" is the problem of the org running the illegal models. If I have submitted a gdpr deletion request you better honor it. If it turns out you stole copyrighted content you should get punished for that. No one cares how much it might cost you to retrain your models. You put yourself in that situation to begin with.

visarga
10 replies
1d1h

> No one cares how much it might cost you to retrain your models.

Playing tough? But it's misguided. "No one cares how much it might cost you to fix the damn internet"

If you wanted to retro-fix facts, even if that could be achieved on a trained model, the information would still come back by way of RAG or web search. But we don't ask pure LLMs for facts and news unless we are stupid.

If someone wanted to pirate content, it would be easier to use Google search or torrents than generative AI. It would be faster, cheaper and higher quality. AIs are slow, expensive, rate-limited and lossy. AI providers have built-in checks to prevent copyright infringement.

If someone wanted to build something dangerous, it would be easier to hire a specialist than to ChatGPT their way into it. Everything LLMs know is also on Google Search. Achieve security by cleaning the internet first.

The answer to all AI data issues - PII, copyright, dangerous information - comes back to Google search offering links to it, and websites hosting this information online. You can't fix AI without fixing the internet.

beeboobaa3
9 replies
1d1h

What do you mean, playing tough? These are existing laws that should be enforced. The number of people's lives ruined by the American government because they were deemed copyright infringers is insane. The US has made it clear that copyright infringement is unacceptable.

We now have a new class of criminals infringing on copyright on a grand scale via their models, and they seem desperate to avoid prosecution, hence all this bullshit.

cscurmudgeon
8 replies
23h13m

1. You are assuming just training a model on copyrighted material is a violation. It is not. It may be under certain conditions but not by default.

2. Why should we aim for harsh punitive punishments just because it was done so in the past?

beeboobaa3
7 replies
22h43m

> 1. You are assuming just training a model on copyrighted material is a violation. It is not. It may be under certain conditions but not by default.

Using copyrighted content for commercial purposes should be a violation if it's not already considered to be one. No different from playing copyrighted songs in your restaurant without paying a licensing fee.

> 2. Why should we aim for harsh punitive punishments just because it was done so in the past?

I'd be fine with abolishing, or overhauling, the copyright system. This double standard of harsh penalties for consumers/small companies but not for big tech is bullshit, though.

ekianjo
6 replies
13h21m

> Using copyrighted content for commercial purposes should be a violation

So reading a book and using its contents to help you in your job would be a violation too, based on your logic.

beeboobaa3
5 replies
12h43m

A business cannot read a book, and your machine learning model is not given human rights.

Dylan16807
4 replies
10h55m

> A business cannot read a book

Assume the human read the book as part of their job. Is that using copyrighted material for commercial purposes?

If that doesn't count then I'm not sure why you brought up "commercial purposes" at all.

> This double standard of harsh penalties for consumers/small companies but not for big tech is bullshit, though.

Consumers and small companies get away with small copyright violations all the time. And still bigger than having your image be one of millions in a training set.

beeboobaa3
3 replies
5h6m

> Assume the human

Humans have rights. They get to do things that businesses, and machine learning models, or general automation, don't.

Just like you can sit in a library and tell people the contents of books when they ask, but if you go ahead and upload everything you get bullied into suicide by the US government[1]

> Consumers and small companies get away with small copyright violations all the time

Yeah, because people don't notice so they don't care. Everyone knows what these bigtech criminals are doing.

[1] https://en.wikipedia.org/wiki/Aaron_Swartz

Dylan16807
2 replies
2h1m

> Humans have rights. They get to do things that businesses, and machine learning models, or general automation, don't.

So is that a yes to my question?

If humans are allowed to do it for commercial purposes, and it's entirely about human versus machine, then why did you say "Using copyrighted content for commercial purposes should be a violation" in the first place?

> Just like you can sit in a library and tell people the contents of books when they ask,

You know there's a huge difference between describing a book and uploading the entire contents verbatim, right?

If "tell the contents" means reading the book out loud, that becomes illegal as soon as enough people are listening to make it a public performance.

> but if you go ahead and upload everything you get bullied into suicide by the US government[1]

They did that to a human... So I've totally lost track of what your point is now.

beeboobaa3
1 replies
1h17m

> and it's entirely about human versus machine

It's not. Those were what's called examples. There is of course more to it. Stop trying to pigeonhole a complex discussion onto a few talking points. There are many reasons why what OpenAI did is bad, and I gave you a few examples.

Dylan16807
0 replies
1h6m

I'm not trying to be reductive or nitpick your example, I was trying to understand your original statement and I still don't understand it.

There's a reason I keep asking a very generic "why did you bring it up", it's because I'm not trying to pigeonhole.

But if it's not worth explaining at this point and the conversation should be over, that's okay.

avi_vallarapu
0 replies
1d1h

Exactly, I think that's where it leads eventually. And that is what my original comment meant as well: "delete it" rather than using more techniques to "unlearn it", unless you can claim the unlearning is 100% accurate.

isodev
9 replies
1d2h

> a time-to-unlearn kind of agreement

Why put the burden on end users? I think the technology should allow for unlearning and even "never learn about me in any future models and derivative models".

avi_vallarapu
5 replies
1d1h

No technology can guarantee 100% unlearning; the only 100% guarantee is when the data is deleted before the model is retrained. Legally, even 99.99% accuracy may not be acceptable; only 100% is.

eru
3 replies
16h27m

Or rather some legal fiction that you can pretend is 100%. You can never achieve real 100% in practice after all. E.g. the random initialisation of weights might already encode all the 'bad' stuff you don't want. Extremely unlikely, but not strictly impossible.

The law cuts off at some point, and declares it 100%.

isodev
2 replies
11h6m

All this is technically correct, but it also means this technology is absolutely not ready to be used for anything remotely involving humans or end user data.

eru
1 replies
10h43m

Why? We use random data in lots of applications, and there's always the theoretical probability that it could 'spell something naughty'.

isodev
0 replies
4h54m

A model's ability to unlearn information, or a training setup configured so that something is never learned in the first place, is not exactly the same as "oops, we logged your IP by accident".

A company is liable even if they have accidentally retained / failed to delete personal information. That's why we have a lot of standards and compliance regulation to ensure a bare minimum of practices and checks are performed. There is also the cyber resilience act coming up.

If your tool is used by/for humans, you need to know with 100% certainty exactly what happens with their data and how it can be deleted and updated.

mr_toad
0 replies
13h54m

> the only 100% guarantee is when the data is deleted before the model is retrained

That’s not even a guarantee. A model can hallucinate information about anyone, and by sheer luck some of those hallucinations will be correct. And as a consequence of forging (see section 2.2.1) you’d never be able to prove whether the data was in the training set or not.

Vampiero
2 replies
1d

The technology is on par with a Markov chain that's grown a little too much. It has no notion of "you", not in the conventional sense at least. Putting the infrastructure in place to allow people (and things) to be blacklisted from training is all you can really do, and even then it's a massive effort. The current models are not trained in such a way that you can do this without starting over from scratch.

xg15
0 replies
20h13m

Well then, maybe we shouldn't use the technology.

Retric
0 replies
21h4m

That’s hardly accurate. Deep learning, among other things, is another type of lossy compression algorithm.

It doesn’t have a 1:1 mapping of each bit of information it’s been trained on, but you can very much extract a subset of that data. Which is why it’s easy to get DALL-E to recreate the Mona Lisa: variations on that image show up repeatedly in its training corpus.

friendzis
0 replies
4h16m

> We need to consider the practicality of unlearning methods in real-world applications and the legal acceptance of the same.

> there should probably be a time-to-unlearn kind of agreement

A very important distinction is between data storage and data use/dissemination. Your comment hints at "use the current model until a retrained one is available and validated", which is an extremely dangerous idea.

Remember the old days of music albums distributed on physical media. Suppose a publisher creates a mix, stocks shelves with the album, and it then becomes known that one of the tracks is not properly licensed. It would be expected to take some time to execute a distribution shutdown: distribute the order, clear the shelves, etc. However, the time for another production run with a modified tracklist would be entirely the problem of the publisher in question.

The window for time-to-unlearn should only depend on the practicality of stopping information dissemination, not on getting an updated model ready. Otherwise companies will simply wait for the model to be retrained on a single 1080 and call it a day, which would effectively nullify the law.

dataflow
8 replies
1d4h

> However, RTBF wasn’t really proposed with machine learning in mind. In 2014, policymakers wouldn’t have predicted that deep learning will be a giant hodgepodge of data & compute

Eh? Weren't deep learning and big data already things in 2014? Pretty sure everyone understood ML models would have a tough time and they still wanted RTBF.

spennant
2 replies
1d3h

Agreed. The media and advertising industry was most definitely leveraging cookie-level data for building attribution and targeting models. As soon as the EU established that this data was “personal data”, as it could, theoretically, be tied back to individual citizens, there were questions about the models. Namely “Would they have to be rebuilt after every RTBF request?” Needless to say, no one in the industry really wanted to address the question, as the wrong answer would essentially shut down a very profitable practice.

Aerroon
1 replies
1d3h

More likely: the wrong answer would've shut out a profitable market rather than the practice. The EU is not the world. Anthropic seems to not mind blocking the EU for example.

spennant
0 replies
1d

Sure. But two things:

1) At the time, the European data laws implied that they protected EU citizens no matter where they were. Nobody wanted to be the first to test that in court.

2) The organizations and agencies performing this type of data modeling were often doing so on behalf of large multinational organizations with absurd advertising spends, so they were dealing with Other People’s Data. The responsibility of scrubbing it clean of EU citizen data was unclear.

What this meant was that an EU tourist who traveled to the US, and got served a targeted ad, could make an RTBF request to the advertiser (think Coca-Cola, Nestle or Unilever).

The whole thing was a mess.

startupsfail
0 replies
1d2h

RTBF was introduced to solve a specific issue, no?

Politicians and their lobbyist friends could no longer remove materials linking them to their misdeeds as the first Google Search link associated with their names. Hence RTBF.

Now, there's a similar issue with AI. Models are progressing towards being factual, useful and reliable.

peteradio
0 replies
1d4h

I don't know if people anticipated contemporary parroting behavior over huge datasets. Modern, well-funded models can recall an obscure person's home address buried deep in the training set. I guess the techniques described might be presented to the European audience in an attempt to maintain access to their data and/or market for sales. I hope they fail.

isodev
0 replies
1d3h

Of course, it's not a regulation issue. The technology was introduced to users before it was ready. Training without opt-in consent, and without a mechanism for being forgotten, are issues that should have been addressed before trying to make a keyboard with a special copilot button.

indigovole
0 replies
1d2h

GDPR and RTBF were formulated around the fears of data collection by the Stasi and other organizations. They were not formulated around easing the burdens of future entrepreneurs, but about mitigating the damage they might cause. Europeans were concerned about real harms that living people had experienced, not about enabling AGI or targeted advertising or digital personal assistants.

We have posts here at least weekly from people cut off from their services, and their work along with them, because of bad inference, bad data, and the inability to update metadata, based purely on BigCo routine automation and indifference to individual harm. Imagine the scale that such damage will take when this automation and indifference to individual harm are structured around repositories from which data cannot be deleted and cannot be corrected.

hooby
0 replies
10h34m

I'm pretty sure that the policymakers did NOT understand ML models in 2014 - and still do NOT understand it today.

I also don't think that they care. They don't care that ML is a hodgepodge of data & compute, and they don't care how hard it is to remove data from a model.

They didn't care about the ease or difficulty of removing data from more traditional types of knowledge storage either - like search indexes, database backups and whatnot.

RTBF was not proposed with any specific technology in mind. What they had in mind, was to try and give individuals a tool, to keep their private information private. Like, if you have a private, unlisted phone number, and that number somehow ends up on the call-list of some pollster firm, you can force that firm to delete your number so that they can't call you anymore.

The idea is, that if your private phone number (or similar data) ends up being shared or sold without your consent - you can try to undo the damage.

In practice it might still be easier to get a new number, than to have your leaked one erased... but not all private data is exchangeable like that.

nullc
4 replies
1d4h

I've wondered before whether it's possible to unlearn facts but retain the general "reasoning" capability that came from being trained on the facts, then dimensionality-reduce the model.
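For the "dimensionality-reduce" part, here's a toy sketch of one naive interpretation (my own illustration, not an established unlearning technique): keep only the top-k singular directions of a layer's weight matrix, on the hope that low-rank detail carries more memorized specifics than general structure.

```python
import numpy as np

def low_rank_compress(weight: np.ndarray, k: int) -> np.ndarray:
    """Return the best rank-k approximation of a weight matrix via SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]

w = np.random.randn(512, 512)          # stand-in for one layer's weights
w_small = low_rank_compress(w, k=64)   # keep 64 directions instead of 512
print(np.linalg.norm(w - w_small) / np.linalg.norm(w))  # relative reconstruction error
```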

mr_toad
0 replies
13h35m

How much reasoning capability LLMs have is up for debate.

With a true AGI you could just tell it to keep people’s personal information confidential and expect that it would understand that instruction.

huygens6363
0 replies
1d3h

Yes, me too. If it could somehow remember the “structure” instead of the instantiation. More “relationships between types of token relationships” instead of “relationships between tokens”.

andy99
0 replies
1d4h

If you think of knowledge as a (knowledge) graph, it seems there would be some nodes with low centrality that you could drop without much effect, and other key ones that would have a bigger impact if lost.
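A toy illustration of that intuition (made-up graph, with networkx degree centrality standing in for whatever importance measure you'd actually use):

```python
import networkx as nx

# A tiny stand-in knowledge graph: edges connect related "facts".
g = nx.Graph()
g.add_edges_from([
    ("Paris", "France"), ("France", "Europe"), ("Europe", "EU"),
    ("France", "EU"), ("obscure_trivia", "Paris"),
])

# Rank nodes by degree centrality; low-centrality nodes are cheaper to drop.
centrality = nx.degree_centrality(g)
low_impact = sorted(centrality, key=centrality.get)[:2]
print(low_impact)  # ['obscure_trivia', 'Paris'] here; the hub "France" scores highest
```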

Brian_K_White
0 replies
1d2h

I don't know about in AI, but it seems like that is what humans do.

We remember some facts, but I know that, at least for me, a lot of facts have passed through and left only their effects.

I once had some facts, did some reasoning, arrived at a conclusion, and only retained the conclusion and enough of the reasoning to identify other contexts where the same reasoning should apply. I no longer have the facts; I simply trust my earlier self's process of reasoning, and even that isn't actually trust or faith, because I also still reason about new things today and observe the process.

But I also evolve. I don't only trust a former reasoning unchanged forever. It's just that when I do revisit something and basically "reproduce the other scientist's work", even if I arrive at a different conclusion today, I'm generally still ok with the earlier me's reasoning and conclusion. It stands up as reasonable, and the new conclusion is usually just tuned a little, not wildly opposite. Or some things do change radically, but I always knew they might, like how in the process of self-discovery you try a lot of opposite things.

Getting a little away from the point but the point is I think the way we ourselves develop answer-generating-rules is very much by retaining only the results (the developed rules) and not all the facts and steps of the work, at least much of the time. Certainly we remember some justifying / exemplifying facts to explain some things we do.

greenavocado
4 replies
23h34m

Please use the correct terminology: censorship

qbit42
0 replies
22h13m

I don't think that's a fair characterization. If a user requests a company to stop using their data, ML unlearning allows the company to do so without retraining their models from scratch.

danielmarkbruce
0 replies
23h30m

If company X wants their model to say or not say Y based on ideology, they aren't stopping anyone from saying anything. They are stopping their own model from saying something. The fact that I don't go around screaming nasty things about some group doesn't make me against free speech.

It's censorship to try to stop people producing models as they see fit.

Dylan16807
0 replies
10h49m

Is it censorship to not include every piece of text you can possibly find into your training dataset?

What's the difference between making that choice versus removing it from the model later?

62951413
0 replies
22h43m

The prolefeed explains that deep duckspeaking is doubleplusgood. Nothing to see here, citizen.

cwillu
4 replies
1d4h

“to edit away undesired things like private data, stale knowledge, copyrighted materials, toxic/unsafe content, dangerous capabilities, and misinformation, without retraining models from scratch”

To say nothing of unlearning those safeguards and/or “safeguards”.

ben_w
3 replies
1d3h

It sounds like you're mistakenly grouping together three very different methods of changing an AI's behaviour.

You have some model, M™, which can do Stuff. Some of the Stuff is, by your personal standards, Bad (I don't care what your standard is, roll with this).

You have three solutions:

1) Bolt on a post-processor which takes the output of M™, and if the output is detectably Bad, you censor it.

Failure mode: this is trivial to remove, just delete the post-processor.

Analogy: put secret documents into a folder called "secret do not read".

2) Retrain the weights within M™ to have a similar effect as 1.

Failure mode: this is still fairly easy to remove, but will require re-training to get there. Why? Because the weights containing this information are not completely zeroed-out by this process.

Analogy: how and why "un-deletion" is possible on file systems.

3) Find and eliminate the weights within M™ that lead to the Bad output.

Analogy: "secure deletion" involves overwriting files with random data before unlinking them, possibly several times if it's a spinning disk.

--

People are still doing research on 3 to make sure that it actually happens, what with it being of very high importance for a lot of different reasons including legal obligation.
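For what it's worth, a minimal sketch of option 1 (a keyword blocklist standing in for a real classifier; all names here are made up); it also makes the failure mode obvious: whoever controls the pipeline can simply skip the filter.

```python
# Bolt-on post-processor that censors detectably Bad output after the model
# (M) has produced it. Illustrative only, not a real API.
BLOCKLIST = {"secret_project_name", "hunter2"}

def postprocess(model_output: str) -> str:
    """Return the model's text unless it trips the Bad-content check."""
    if any(term in model_output.lower() for term in BLOCKLIST):
        return "[response withheld by output filter]"
    return model_output

# The failure mode is visible in the structure itself: deleting or bypassing
# postprocess() restores the Bad behaviour, because the model never changed.
print(postprocess("My password is hunter2"))      # -> "[response withheld by output filter]"
print(postprocess("The weather is nice today"))   # passes through unchanged
```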

andy99
1 replies
1d3h

Until we have a very different method of actually controlling LLM behavior, 1 is the only feasible one.

Your framing only makes sense when "Bad" is something so bad that we can't bear its existence, as opposed to just "commercially bad" where it shouldn't behave that way with an end user. In the latter, your choice 1 - imposing external guardrails - is fine. I'm not aware of anything LLMs can do that fits in the former category.

ben_w
0 replies
1d

> Until we have a very different method of actually controlling LLM behavior, 1 is the only feasible one.

Most of the stuff I've seen is 2. I've only seen a few places use 1. You can tell the difference: when an LLM pops out a message and then deletes it, that's type 1 behaviour, whereas if the first thing it outputs is a sequence of tokens saying (any variant of) "nope, not gonna do that", that's type 2 behaviour.

This appears to be what's described in this thread: https://old.reddit.com/r/bing/comments/11fryce/why_do_bings_...

The research into going from type 2 to type 3 is the entirety of the article.

> Your framing only makes sense when "Bad" is something so bad that we can't bear its existence, as opposed to just "commercially bad" where it shouldn't behave that way with an end user. In the latter, your choice 1 - imposing external guardrails - is fine.

I disagree, I think my framing applies to all cases. Right now, LLMs are like old PCs with no user accounts and a single shared memory space, which is fine and dandy when you're not facing malicious input, but we live in a world with malicious input.

You might be able to use a type 1 solution, but it's going to be fragile, and more pertinently, slow, as you only know to reject content once it has finished and may therefore end up in an unbounded loop of an LLM generating content that a censor rejects.

A type 2 solution is still fragile, but it just doesn't make the "bad" content in the first place — and, to be clear, "bad" in this context can be anything undesired, including "uses vocabulary too advanced for a 5 year old who just started school" if that's what you care about using some specific LLM for.

cwillu
0 replies
23h34m

I think you mistakenly replied to my comment instead of one that made some sort of grouping?

Alternatively, you're assuming that because there is some possible technique that can't be reversed, it's no longer useful to remove the effects of techniques that _can_ be reversed?

motohagiography
1 replies
1d4h

Seems like there is a basic problem where, if you specify something to be unlearned, it could still be re-learned by inference and prompting. The solution may not be in filtering the proscribed facts or data itself, but in the weights and incentives that form a final layer of reasoning. Look at "safe" models now, like Google's last launch, where the results were often unsatisfying; clearly we don't want truthful models yet, but ones that enable our ability to develop them further, which for now means not getting selected out by antagonizing other social stakeholders.

Maybe we can encode and weight some principle of the models having been created by something external, with some loosely defined examples they can refer to as a way to evaluate what they return. Then the ones that don't yield those results cease to be used, while the ones that find a way to align get reused to train others. There will absolutely be bad ones, but in aggregate they should produce something more desirable, and if they really go off the rails, just send a meteor. The argument over how models should "unlearn" will be between those who favour incentives and those who favour rules - likely, incentives for the ones I create, but rules for everyone else's.

musicale
0 replies
18h25m

It is unsurprising that a system trained on human-generated content might end up encoding implicit bias, toxicity, and negative goals. And the more powerful and general-purpose a system is, the more suitable it is for a wide range of powerfully negative purposes.

Neither specializing the model nor filtering its output seems to have worked reliably in practice.

aidenn0
1 replies
1d2h

I think "unlearning" is not the actual goal; we don't want the model to stick its proverbial head in the sand. Being unaware of racism is different from not producing racist content (and, in fact, one could argue that it is necessary to know about racism if one wishes to inhibit producing racist content; I remember in elementary school certain kids thought it would be funny to teach one of the special-ed kids to parrot offensive sentences).

krono
0 replies
21h30m

Say you tell me you want a red sphere. Taken at face value, you show a prejudice for red spheres and discriminate against all other coloured shapes.

We've all had to dance that dance with ChatGPT by now, where you ask for something perfectly ordinary but receive a response telling you off for even daring to think like that, until eventually you manage to formulate the prompt in a way that it likes, with just the right context and winning vocabulary + grammar, and finally the damned thing gives you the info you want without so much as any gaslighting or snarky insults hiding in the answer!

It doesn't understand racism, it simply evaluates certain combinations of things according to how it was set up to do.

JKCalhoun
1 replies
18h19m

I don't know. Between the post and the comments here, I am a little worried for the "sanity" of our AIs, which have been trained, untrained, and retrained like pawns in some kind of Cold War spy novel.

kombookcha
0 replies
9h47m

It's fine, the LLM AIs we have now are just fancy versions of autocorrect. They, and other LMs, guess at statistically probable words/datapoints, and because they don't understand context, you might need to put your thumb on the scales to make the output actually be useful. They're at best very janky tools as soon as you're working with things that require context that isn't easily contained in some kind of confined area of work.

Currently we are seeing the 'Habsburg AI' phenomenon, where AIs consume their own outputs as training data, which rapidly deteriorates their ability to actually be useful for much of anything.

The thing is that there literally isn't enough human-made data to keep feeding them (they already ate the entire internet), so if you both want to continue ramping their intake of data and you also don't want them to get rapidly weird and completely useless, you pretty much have to get in there with elbow grease. Removing or deprioritizing data that's tripping up the model is one of the few ways you can do human-assisted refinement of these things.

The sooner we all face the music that these things aren't magical truth machines, have a long way to go and there is no guaranteed rate of growth, the sooner this hype cycle can end.

xg15
0 replies
20h27m

What I don't get about the DP approach is how this would be reconciled with the "exact" question-answering functionality of LLMs.

DP makes perfect sense if all I care about is low-resolution statistical metrics or distributions of something and not the exact values - the entire purpose of DP is to prevent reconstructing the exact values.

However, the expectation for LLMs is usually to ask a question (or request a task) and get an exact value as a response: If you ask "What's the phone number of John Smith?" the model will either tell you it doesn't know or it will answer you with an actual phone number (real or hallucinated). It will not tell you "the number is with 83% probability somewhere in New Jersey".

So if the model is trained with DP, then either the data is scrambled enough that it won't be able to return any kind of reliably correct data, effectively making it useless - or it's not scrambled enough, so the model can successfully reconstruct data despite the scrambling process, effectively making the DP step useless.

Or in other words, the OP defines "DP unlearning" as:

> The intuition is that if an adversary cannot (reliably) tell apart the models, then it is as if this data point has never been learned—thus no need to unlearn.

However, if my original model truthfully returns John Smith's phone number on request, and the "unlearned" model must not be distinguishable by an outside observer from the original model, then the "unlearned" model will also return the phone number. While I could say that "technically" the model has never seen the phone number in the training data due to my DP scrambling, this doesn't solve the practical problem for which the unlearning was requested in the first place, namely that John Smith doesn't want the model to return his phone number. He probably couldn't care less about the specific details of the training process.

So then, how would DP help here?
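For context, here's a bare-bones sketch of what "trained with DP" usually means mechanically (DP-SGD style: clip each example's gradient, then add Gaussian noise, so no single record like John Smith's phone number can dominate an update). Toy code of my own, not from the OP, and it illustrates the mechanism rather than answering the question above.

```python
import numpy as np

def dp_sgd_update(weights, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD-style step: clip per-example gradients, add Gaussian noise, average."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound each example's influence
    noise = np.random.normal(0.0, noise_mult * clip_norm, size=weights.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return weights - lr * noisy_mean
```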

thomastjeffery
0 replies
3h3m

So "unlearning" is yet another overconfident implementation of an impossible task. I guess that's par for the course.

My favorite part is where literally pretending is presented as a serious option. Wow. Just... wow.

surfingdino
0 replies
22h59m

How about a radical approach? How about not ingesting all content, but only that which is explicitly marked as available for model-building purposes?

joshhansen
0 replies
12h9m

"Eternal Sunshine of the Spotless Mind"

The erasure of knowledge is a troubling occupation

gotoeleven
0 replies
1d3h

My new startup includes a pitchfork wielding mob in the ML training loop.