I just tried. I asked a question on a research topic I'm digging into. It gave me some answers but no references. Then I copied the answers it gave me and specifically asked for references. Then I got:
I sincerely apologize for my earlier response. Upon reviewing the search results provided, I realize I made an error in referencing those specific studies. The search results don't contain any relevant information for the claims I mentioned earlier. As an AI assistant, I should be more careful in providing accurate and supported information. Thank you for bringing this to my attention. In this case, I don't have reliable references to support those particular statements about software tools and their impact on developer experience and software quality.
Not sure about this...
I hate this kind of thing so much.
Absolutely. Behaviour that in normal life, in healthy societies, would be seen as provocation: automated hypocritical lying, apologizing in form and not in substance, making statements based on what sounds plausible instead of what is true...
That's all they can do. They seem impressive at first because they're basically trained as an adversarial attack on the ways we express our own intelligence. But they fall apart quickly because they don't actually have any of the internal state that allows our words to mean anything. They're a mask with nothing behind it.
Ctrl+F for "Central nervous system":
https://en.wikipedia.org/wiki/List_of_human_cell_types
Choose any five wikilinks. Skim their distinct functions and pathologies:
https://en.wikipedia.org/wiki/List_of_regions_in_the_human_b...
https://en.wikipedia.org/wiki/Large-scale_brain_network
Evolution's many things, but maybe most of all lazy. Human intelligence has dozens of distinct neuron types and at least hundreds of differentiated regions/neural subnetworks because we need all those parts in order to be both sentient and sapient. If you lesion parts of the human brain, you lose the associated functions, and eventually end up with what we'd call mental/neurological illnesses. Delusions, obsessions, solipsism, amorality, shakes, self-contradiction, aggression, manipulation, etc.
LLMs don't have any of those parts at all. They only have pattern-matching. They can only lie, because they don't have the sensory, object permanence, and memory faculties to conceive of an immutable external "truth"/reality. They can only be hypocritical, because they don't have the internal identity and introspective abilities to hold consistent values. They cannot apologize in substance, because they have neither the theory of mind and self-awareness to understand what they did wrong, nor the social motivation to care, nor the neuroplasticity to change and be better. They can only ever be manipulative, because they don't have emotions to express honestly. And I think it speaks to a not-atypical Silicon Valley arrogance to pretend that they can replicate "intelligence", without apparently ever considering a high-school-level philosophy or psychology course to understand what actually lets human intelligence tick.
At most they're mechanical psychopaths [1]. They might have some uses, but never outweighing the dangers for anything serious. Some of the individuals who think this technology is anything remotely close to "intelligent" have probably genuinely fallen for it. The rest, I suppose, see nothing wrong because they've created a tool in their own image…
[1]: I use this term loosely. "Psychopathy" is not a diagnosis in the DSM-5, but psychopathic traits are associated with multiple disorders that share similar characteristics.
That is definitely not true.
Lying is a state of mind. LLMs can output true statements, and they can even do so consistently for a range of inputs, but unlike a human there isn't a clear distinction in an LLM's internal state based on whether its statements are true or not. The output's truthfulness is incidental to its mode of operation, which is always the same, and certainly not itself truthful.
In the context of the comment chain I replied to, and the behaviour in question, any statement by an LLM pretending to be capable of self-awareness/metacognition is also necessarily a lie. "I should be more careful", "I sincerely apologize", "I realize", "Thank you for bringing this to my attention", etc.
The problem is the anthropomorphization. Since it pretends to be like a person, if you ascribe intention to it then I think it is most accurately described as always lying. If you don't ascribe intention to it, then it's just a messy PRNG that aligns with reality an impressive amount of the time, and words like "lying" have no meaning. But again, it's presented and marketed as if it's a trustworthy sapient intelligence.
I am not sure that lying is structural to the whole system, though: some parts seem to encode a world model, and «the sensory, object permanence, and memory faculties» may not be crucial. What we surely need is a system that encodes a world model and refines it, that reasons on it and assesses its details in order to develop it (I have been insisting on this for years, also as the "look, there's something wrong here" reaction).
Some parts seemingly stopped at "output something plausible", but it does not seem theoretically impossible to steer the output towards "adhere to the truth", if a world model is there.
We would still need to implement the "reason on your world model and refine it" part for the purposes of AGI - meanwhile, fixing the "impersonation" fumble ("probabilistic calculus says your interlocutor should be offered stochastic condolences") would be a decent move. After a while with present chatbots it becomes clear that "this is writing fiction, not answering questions".
This is just the start. Imagine giving up on progressing these models because they're not yet perfect (and probably never will be). Humans wouldn't accomplish anything at all this way, aha.
And I wouldn't say lazy at _all_. I would say efficient. Even evolutionary features that look "bad" on the surface can still make sense if you look at the wider system they're a part of. If our tailbone caused us problems, then we'd evolve it away, but instead we have a vestigial part that remains because there are no forces driving its removal.
But the issue is with calling finished products what are laboratory partials. "Oh look, they invented a puppet" // "Oh, nice!" // "It's alive..."
Wait for the first large scale LLM using source-aware training:
https://github.com/mukhal/intrinsic-source-citation
This is not something that can be LoRa finetuned after the pretraining step.
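To make the idea concrete, here is a minimal sketch of what a source-aware pretraining example could look like: the document's source URL is packed into the training sequence as a learnable target, so the model is trained to emit provenance alongside content. The tag names and format are my own illustration, not the actual scheme used by the linked repo.

```python
def make_source_aware_example(text: str, url: str) -> str:
    """Pack a document and its source URL into one training sequence,
    so the model learns to associate claims with where they came from."""
    return f"<doc>{text}</doc><cite>{url}</cite>"

# Hypothetical document/URL pair, purely for illustration.
example = make_source_aware_example(
    "Blade fragments let a route return part of a view.",
    "https://laravel.com/docs/11.x/blade",
)
print(example)
```

At inference time, a model trained this way could be prompted to continue past a `<cite>` marker, turning "give me references" into an actual learned capability instead of a post-hoc guess.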
What we need is a human-curated benchmark for different types of source-aware training, to allow competition, plus an extra column in the most popular leaderboards (included in the Average column) to incentivize AI companies to train in a source-aware way. Of course, this would instantly invalidate the black-box veil LLM companies love to hide behind so as not to credit original authors and content creators; they prefer regulators to believe such a thing cannot be done.
In the meantime, such regulators are not thinking creatively and are clearly just looking for ways to tax AI companies, hiding behind copyright complications as an excuse to tax the flow of money wherever they smell it.
Source aware training also has the potential to decentralize search!
Yeah. Treating these things as advanced, semantically aware search engines would actually be really cool.
But I find the anthropomorphization and "AGI" narrative really creepy and grifty. Such a waste that that's the direction it's going.
What?
I've been playing with Gemma locally, and I've had some success by telling it to answer "I don't know" if it doesn't know the answer, or similar escape hatches.
Feels like they were trained with a gun to their heads. If I don't tell it that it doesn't have to answer, it'll generate nonsense in a confident voice.
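For anyone who wants to try the same trick, here's roughly the shape of the prompt I mean. The wrapper function and wording are just my own sketch of the "escape hatch" idea, not a specific Gemma API:

```python
def build_prompt(question: str) -> str:
    """Wrap a question with an explicit permission to refuse,
    which in practice reduces confident nonsense from small local models."""
    system = (
        "Answer the question below. If you do not know the answer, "
        "reply exactly: I don't know."
    )
    return f"{system}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What changed in Laravel 11's routing?"))
```

The model still has no real "don't know" signal internally, but giving refusal a high-probability template to latch onto seems to help.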
The model's weights are tuned in the direction that makes the model best fit the training set.
It turns out that this process makes the model useful for producing mostly sensible predictions (generating output) for text that is not present in the training set (generalization).
That works because there is a lot of pattern and redundancy in the stuff we feed the models and the stuff we ask them, so there is a good chance that interpolating between words, and between higher-level semantic relationships across sentences, will make sense quite often.
However that doesn't work all the time. And when it doesn't, current models have no way to tell they "don't know".
The whole point was to let them generalize beyond the training set and interpolate in order to make decent guesses.
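The "no way to say I don't know" point can be seen directly in the output layer's math. A toy softmax (the standard final step in these models) always normalizes whatever scores come out into a probability distribution over known tokens; there is no reserved mass for "none of the above":

```python
import math

def softmax(logits):
    """Standard softmax: turn arbitrary scores into probabilities."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Even for garbage or out-of-distribution inputs, the probabilities
# sum to 1 and some token is always the "best" answer.
probs = softmax([2.0, 1.0, 0.5])
```

So "I don't know" can only ever be another token sequence competing on likelihood, not a distinct internal state.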
There is a lot of research in making models actually reason.
In the Physics of Language Models talk [1], the speaker argues that the model knows it has made a mistake, sometimes even before it has made it. Apparently, though, training is crucial for the model to be able to use this constructively.
That being said, I'm aware that the model doesn't reason in the classical sense. Yet, as I mentioned, it does give me less confabulation when I tell it it's ok not to answer.
I will note that when I've tried the same kind of prompts with Phi 3 instruct, it's way worse than Gemma. Though I'm not sure if that's just because of a weak instruction tuning or the underlying training as well, as it frequently ignores parts of my instructions.
[1]: https://www.youtube.com/watch?v=yBL7J0kgldU
There are different ways to be wrong.
For example you can confabulate "facts" or you can make logical or coherence mistakes.
Current LLMs are encouraged to be creative and effectively "make up facts".
That's what created the first wow factor. The models can write Star Trek fan fiction in the style of Shakespeare. They can take a poorly written email and make it "sound" better (for some definition of better, e.g. more formal, less formal, etc.).
But then human psychology kicked in: as soon as you have something that can talk like a human and that some marketing folks label "AI", you start expecting it to be useful for other tasks too, some of which require factual knowledge.
Now, it's in theory possible to have a system that you can converse with which can _also_ search and verify knowledge. My point is that this is not where LLMs start from. You have to add stuff on top of them (and people are actively researching that).
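To illustrate the "add stuff on top" idea, here's a toy retrieval-then-answer sketch: fetch supporting text first, then constrain the model to answer only from what was retrieved. The word-overlap scoring and the document contents are stand-ins for a real retriever and corpus:

```python
# Toy corpus; in a real system this would be a search index.
DOCS = {
    "laravel-blade": "Blade fragments let a route return part of a view.",
    "star-trek": "Star Trek is a science-fiction franchise.",
}

def retrieve(query: str) -> str:
    """Pick the document sharing the most words with the query."""
    qwords = set(query.lower().split())
    def overlap(text: str) -> int:
        return len(qwords & set(text.lower().split()))
    return max(DOCS.values(), key=overlap)

def build_grounded_prompt(query: str) -> str:
    """Ask the model to answer only from retrieved context."""
    context = retrieve(query)
    return (f"Answer using only this context:\n{context}\n"
            f"Question: {query}\n"
            f"If the context is insufficient, say so.")
```

The LLM is then just the conversational front end; the factual claims come from the retrieved text, which can be cited and checked.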
Just to follow up on this: I asked it to give me a brief explanation of how to use Laravel 11 Blade fragments, which it did reasonably well.
I then offered three lines of code of a route I'm using in Laravel and asked it to tell me how to implement fragment usage where the parameter in the URL determines the fragment returned.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments'); });
It told me to make sure I have the right view created (which I did) and that was a good start. Then...
It recommended this?
Route::get('/vge-frags/{fragment}', function ($fragment) { return fragment($fragment); });
I immediately knew it was wrong (but somebody looking to learn might not). So I had to ask it: "Wait, how does the code know which view to use?"
Then it gave me the right answer.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments')->fragment($fragment); });
I dunno. It's really easy to find edge cases with any of these models and you have to essentially question everything you receive. Other times it's very powerful and useful.
Seems a little bit of an unfair generalisation.
I mean, this is an unsolvable problem with chat interfaces, right?
If you use a plugin integrated with tooling that checks whether generated code compiles / passes tests / whatever, a lot of this kind of problem goes away.
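The check-before-accept loop is simple to sketch. Here, model output is written to a temp file and only accepted if a verification command exits cleanly; the verification command itself (test runner, compiler, linter) is whatever your tooling provides:

```python
import os
import subprocess
import tempfile

def accept_if_tests_pass(code: str, check_cmd: list) -> bool:
    """Write generated code to a temp file, run a check command on it,
    and accept the code only if the check exits with status 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(check_cmd + [path], capture_output=True)
        return result.returncode == 0
    finally:
        os.unlink(path)
```

Even the weakest check (does the snippet run at all?) would have caught the `fragment($fragment)` mistake above, since that helper doesn't exist as a standalone function.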
Generally speaking these models are great at tiny self contained code fragments like what you posted.
It’s longer, more complex, logically difficult things with interconnected parts that they struggle with; mostly because the harder the task, the more constraints have to be satisfied simultaneously, and models don’t have the attention to fix things simultaneously, so it’s just endless fix-one-thing / break-something-else.
So… at least in my experience, yes, but honestly, for a trivial fragment like that most of the time is fine, especially for anything you can easily write a test for.
And you can have the LLM write the test, too.
This is a good point, and we have new application-level features coming soon to improve verifiability.
I dunno if you need it but I'd be happy to come up with some scenarios and help test
Sorry about that, could you make sure that "Always search" is enabled and try that first query again? It should be able to get the correct answer with references.
It was on. If I ask the same question again it now gets the right answer. Maybe a blip? Not sure.
To be fair, I don't expect these AI models to give me perfect answers every time. I'm just not sure people are vigilant enough to ask follow up questions that criticize how the AI got the answers to ensure the answers come from somewhere reasonable.
I found that quite often even though the always search option is on, it won’t search at times; maybe that was the case here.
Honestly, that's a lot of words and repetition to say "I bullshitted".
Though there are humans that also talk like this. Silver lining to this LLM craze: maybe it'll inoculate us against psychopaths.