Why should we try to unlearn "bad" behaviours from AI?
There is no AGI without violence; it's part of free thinking and self-survival.
But also: a war was once averted because a few people knew that a first strike ordered by a drunk president was a bad idea. AI needs to understand consequences.
It seems futile to try and hide "bad" from AI.
Because we can get AI related technologies to do things living creatures can’t, like provably forget things. And when it benefits us, we should.
Personal opinion, but I think AGI is a good heuristic to build against but in the end we’ll pivot away. Sort of like how birds were a good heuristic for human flight, but modern planes don’t flap their wings and greatly exceed bird capabilities in many ways.
Attribution for every prediction, and provable deletion, seem like prime examples of things that would break the AI/AGI analogy in favor of something more economically and politically compelling/competitive.
Can you point to any behaviour in human beings you'd unlearn if they'd also forget the consequences?
We spend billions trying to predict human behaviour and yet we are surprised everyday, "AGI" will be no simpler. We just have to hope the dataset was aligned so the consequences are understood, and find a way to contain models that don't.
The feeling of extreme euphoria and its connection to highly addictive drugs like Heroin might be a use case. Though I'm not sure how well something like that would work in practice.
Is that possible to do without also forgetting why it’s dangerous? That seems like it would fuel a pattern of addiction where the person gets addicted, forgets why, then gets addicted again because we wiped their knowledge of the consequences the first time around.
Then again, I suppose if the addiction was in response to a particular stimulus (death of a family member, getting fired, etc) and that stimulus doesn’t happen again, maybe it would make a difference?
It does have a tinge of “those who don’t recall the past are doomed to repeat it”.
After a certain point I think someone can learn enough information to derive almost everything from first principles. But I think it might work temporarily.
There's a movie about this idea called "Eternal Sunshine of the Spotless Mind".
I find it hard to believe that you can surgically censor one chunk of information and cut it off from the rest, especially if it's general physical principles.
I also don't have a nice topological map of how all the world's information is connected at the moment, so I can't back up my opinions.
Though I'm still rooting for the RDF/OWL and Semantic Web folks, they might figure it out.
You seem to be focusing a lot on remembering or forgetting consequences. Yes, ensuring models know enough about the world to only cause the consequences they desire is a good way for models to not create random harm. This is probably a good thing.
However, there are many other reasons why you might want a neural network to provably forget something. The main reason has to do with structuring an AGI's power. The simple story of AGI is something like "make it super powerful, general, and value-aligned and humanity will prosper." However, the reality is more nuanced. Sometimes you want a model to be selectively not powerful as part of managing value misalignment in practice.
To pick a trivial example, you might want a model to enter your password in some app one time, but not remember the password long term. You might want it to use and then provably forget your password so that it can't use your password in the future without your consent.
This isn't something that's reliably doable with humans. If you give them your password, they have it — you can't get it back. This is the point at which we'll have the option to pursue the imitation of living creatures blindly, or choose to turn away from a blind adherence to the AI/AGI story. Just like we reached the point at which we decided whether flying planes should have flapping wings dogmatically — or whether we should pursue the more economically and politically competitive thing. Planes don't flap their wings, and AI/AGI will be able to provably forget things. And that's actually the better path.
A recent work co-authors and I published related to this: https://arxiv.org/pdf/2012.08347
Seeing dad have sex with mom.
It sounds like the only answer for AI is the same as the only answer for humans.
Wisdom. Arriving at actions and reactions based on better understanding of the interconnectedness and interdependency of everything and everyone. (knowing more not less, and not selective or bowdlerized)
And most humans don't even have it. Most humans are not interested, don't believe it, and certainly don't act as though "What's good for you is what's good for me, what harms you harms me." Every day a tech podcaster or youtuber says this or that privacy loss or security risk "doesn't affect you or me." They all affect you and me. When a government or company gives itself, and then abuses, power over a single person anywhere, that is a hit to you and me even though we aren't that person, because that person is somebody, and you and I are somebody.
Most humans ridicule anyone who talks like that and don't let them near any levers of power at any scale. They might be ok with it in inconsequential conversational contexts like a dinner party or this forum, but not in any decision-making context. Anyone talking like that is seen as an idiot, disconnected from reality, who might drive the bus off the bridge because the peace fairies told them to.
If an AI were better than most humans and had wisdom, and gave answers that conflicted with selfishness, most humans would just decide they don't like the answers and instructions coming from the AI and just destroy it, or at least ignore it, pretty much as they do today with humans who say things they don't like.
Perhaps one difference is an AI could actually be both wise and well-intentioned rather than a charlatan harnessing the power of a mass of gullible people, and it could live longer than a human and its results could become proven out over time. Some humans do get recognized eventually, but by then it doesn't do the rest of us any good because they can no longer be a leader as they're too old or dead. Then again, maybe that's required. Maybe the AI can't prove itself because you can never say of the AI, "What does he get out of it by now? He lived his entire life saying the same thing; if he was just trying to scam everyone for money or power, what good would it even do him now? He must have been sincere the whole time."
But probably even the actual good AI won't do much good, again for the same reason as with actually good humans, it's just not what most people want. Whatever individuals say about what their values are, by the numbers only the selfish organisations win. Even when a selfish organization goes too far and destroys itself, everyone else still keeps doing the same thing.
A few things to exclude from training might include:
- articles with mistakes such as incorrect product names, facts, dates, references
- fraudulent and non-repeatable research findings (see John Ioannidis among others)
- outdated and incorrect scientific concepts like phlogiston and Lamarckian evolution
- junk content such as 4chan comments-section content
- flat-earther "science" and other such nonsense
- debatable stuff like: do we want material that attributes human behavior to astrological signs or not? And when should a response make reference to such?
- prank stuff like script kiddies prompting 2+2=5 until an AI system "remembers" this
- intentional poisoning of a training set with disinformation
- suicidal and homicidal suggestions and ideation
- etc.
Even if we go with the notion that AGI is coming, there is no reason its training should include the worst in us.
AGI would not be GI unless it could change its mind after realizing it's wrong about something.
I disagree. People with anterograde amnesia still possess general intelligence.
I don't know a ton about amnesia, but I would think the facilities for changing their mind are still there.
E.g. ordering food, they might immediately change their mind after choosing something and correct their order.
I recognize they cannot form new memories but from what I understand they still would have a working memory, otherwise you'd be virtually unable to think and speak.
LLMs will change their minds today. Most major ones can change their minds on subsequent generations within the same context ("I'm sorry, my previous answer was incorrect,..."), and the biggest ones can change their mind mid-answer (mostly observed with GPT-4).
Maybe it all boils down to copyright. Having a method that believably removes the capacity to generate copyrighted results might give you some advantage with respect to some legislation.
Also if you build some sort of search engine using an LLM governments will expect you to be able to remove websites or knowledge of certain websites for legal reasons (DMCA, right to be forgotten, etc).
There is no AGI without violence; it's part of free thinking and self-survival.
Self-survival is a product of natural selection; AGI doesn't have to have it. Maybe the problem is we are the only template to build AGI from, but that's not inherent to "I" in any way. On the other hand, lack of self-preservation can make animals even more ferocious. Also, there's a reason they often leave a retreat path in warzones.
Long story short, it's not that straightforward, so I sort of agree, because it's an uncharted, defaults-lacking territory we'll have to explore. "Unlearn bad" is as naive as not telling your kids about sex and drugs.
Thanks but no violent AGIs thanks
AI has no concept of children, family, or nation. It doesn't have parental love or an offspring-protection instinct. Faced with danger to its children, it cannot choose between fighting and sacrificing itself in order to protect others. What it is good at is capturing value through destruction of value generated by existing business models; it does it by perpetrating mass theft of other people's IP.
The point is to build things that are useful, not to attempt to replicate science fiction literature.
They are just trying to find a way to plausibly declare successful removal of copyrighted and/or illegal material without discarding weights.
GPT-4-class models reportedly cost $10-100M to train, and that's too much to throw away over Harry Potter or Russian child-porn scrapes that the model could later reproduce verbatim despite representing <0.1 ppb or whatever minuscule part of the dataset.
You seem to be ignoring the potential to use this to improve the performance of LLMs. If you can unlearn wrong answers, you can have the model's outputs checked for correctness by any scoring mechanism, instead of scoring for token-for-token similarity to a prescribed answer.
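A minimal sketch of what that loop could look like, assuming a PyTorch-style setup. The tiny model and the `scorer` function here are hypothetical toy stand-ins, not anything from the linked paper: the model's own answer is judged by the scorer, and answers the scorer rejects get a crude gradient-ascent "unlearning" step instead of being compared token-for-token to a reference.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a tiny "model" over a 100-token vocabulary that maps a
# 4-token prompt to a single answer token.
vocab_size, hidden, prompt_len = 100, 32, 4
model = nn.Sequential(
    nn.Embedding(vocab_size, hidden),
    nn.Flatten(),
    nn.Linear(hidden * prompt_len, vocab_size),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def scorer(prompt: torch.Tensor, answer_token: int) -> bool:
    # Placeholder correctness check; in practice this could be a verifier,
    # unit tests, a reward model, etc.
    return answer_token % 2 == 0

for _ in range(8):
    prompt = torch.randint(0, vocab_size, (1, prompt_len))
    logits = model(prompt)
    answer = logits.argmax(dim=-1)     # the model's own answer
    nll = loss_fn(logits, answer)      # negative log-likelihood of that answer
    if scorer(prompt, answer.item()):
        nll.backward()                 # reinforce: raise probability of accepted answers
    else:
        (-nll).backward()              # "unlearn": push probability of rejected answers down
    optimizer.step()
    optimizer.zero_grad()
```

In a real system the unlearning step would be a proper removal mechanism rather than naive gradient ascent, but the control flow, scoring the model's own output and selectively un-training on it, is the part the comment is pointing at.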
This is presumably about a chatbot though, not AGI, so it's basically a way of limiting what they say. (Not a way that I expect to succeed)
Because corporations won't buy the fancy chat bot if there's a chance it will occasionally use slurs in its interactions with their customers.
So you have a problem with supervised learning like spam classifiers?
I disagree. Are committed pacifists not in possession of general intelligence?