
We Automated Bullshit

lordnacho
125 replies
10h1m

I think this essay is right in many ways. LLMs seem to operate on the principle of "will this pass for a correct answer" rather than "this is how this thing works, so here's a reasonable opinion answering your question".

However at some point you have to admit the LLM does generate things that are good answers. They might be good answers that happen to pass the smell test, but they are nonetheless good answers. For instance when you ask it for a snippet of code and it gets it right.

And here is the crucial thing: you need to already know what you're doing to know whether the LLM got it right. I'm no historian, and I can ask cGPT for an essay about the causes of the Great War. When I get the answer, it sounds right to me. I don't know if the essay talks about the things an actual historian would find important, all I know is that it gives me the vanilla answer that some layman who has read a little bit would think was the right answer.

Now there's another issue this brings up. Most of us are experts in one field only. What is stopping the LLM from fooling me in every field that I don't know anything about? I best be wary of using it outside of my area of expertise.

So in the current iteration, I think LLMs are a shortcutting tool for experts. I can tell when it spits out a snippet of code that is correct, and when it's wrong. Someone who wasn't working in my domain would get fooled.

indigochill
45 replies
9h57m

LLMs seem to operate on the principle of "will this pass for a correct answer" rather than "this is how this thing works, so here's a reasonable opinion answering your question".

Isn't the hint in the name? It's a language model, not a knowledge model. LLMs are exceptional at generating stuff that passes as coherent language (like, the parts of speech are all where one would expect them to be in the sentences). The trouble is people think it goes deeper when the knowledge modeling part only happens incidentally because language and knowledge modeling are so closely related.

caturopath
24 replies
8h56m

No, you're overindexing on the name. It doesn't mean that these LLMs are about linguistics and grammar specifically. Clearly, they aren't as likely to generate "Colorless green ideas sleep furiously" as "It's important to feed your dog".

Obviously, they're better at grammar than many other things, but it's clear that they contain a vast amount of knowledge. They contain the information that the sky is blue and firetrucks are red, which isn't a matter of grammar or coherence, but is simply information about the world.

capableweb
11 replies
8h49m

It doesn't mean that these LLMs are about linguistics and grammar specifically.

Correct. But it does mean they're about "language" rather than knowledge. It's meant to read, parse and generate language, no matter if it includes prose, facts, knowledge or whatever.

Clearly, they aren't as likely to generate "Colorless green ideas sleep furiously" as "It's important to feed your dog".

Clearly, as the training set probably contains many more instances of the latter than the former. And if it's more probable to generate, that's the text it'll generate. It doesn't understand why or how, just that it's more likely, so that's the way it's going.

tbrownaw
6 replies
6h35m

But it does mean they're about "language" rather than knowledge.

To some extent, those are kind of the same thing.

tivert
4 replies
6h18m

To some extent, those are kind of the same thing.

They are not the same thing. They are not even "kind of" the same thing, for any reasonable definition of "kind of." That should be obvious, because a lie is language, which is pretty close to the opposite of knowledge.

I mean no offense, but your hedging words are doing so much work that I think your statement counts as bullshit.

Edit: thinking about it more, I think your comment shows the particular kind of bullshit that often characterizes tech people's thinking and is so useful for hyping shit (e.g. take something hard, lossily reformulate it into something easier but vaguely similar, solve the easy thing, then declare you solved the hard thing).

tbrownaw
1 replies
5h6m

A language is a particular way of serializing concepts and their relationships.

Knowledge is... concepts and their relationships, typically with the implication of being grounded in objective reality (although one can be knowledgeable about things like, say, Tolkien, or Star Wars, or fantasy tropes more generally).

.

Language is (a particular way of representing) knowledge, in about the same way that a particular .cpp file is (a particular way of representing) a linked list or whatever.

tivert
0 replies
1h57m

Language is (a particular way of representing) knowledge, in about the same way that a particular .cpp file is (a particular way of representing) a linked list or whatever.

So, in your view, are .cpp files and linked lists "kind of" the same thing?

Language can be used to express some kinds of knowledge, but they're not even "kind of" synonymous. Language can express a lot of things that aren't knowledge, and there's knowledge that can't be expressed with language (qualia, at least).

TeMPOraL
1 replies
5h4m

Of course they are the same thing. Language as humans use it isn't a god-given perfect formal grammar system. It's people grunting and gulping about what they experience, for many thousands of years, each generation picking up the patterns from the generation before, and adding to them. Language was, is, and always has been about knowledge, because you literally build the language skill up from correlations.

All the more and less formal models we invent for describing "language"? This is just fitting curves to a snapshot of reality. So sure, "a lie is language, which is pretty close to the opposite of knowledge" - a lie can be expressed in the formal model. But the truth (as subjectively understood) is much more likely to be spoken or written down in practice.

Communication isn't random. Language isn't random. Learning it is picking up the patterns.

tivert
0 replies
1h43m

I'm having trouble understanding why you'd respond with that. Are you interpreting knowledge to mean knowledge-of-language or something?

>>> But it does mean they're about "language" rather than knowledge.

>> To some extent, those are kind of the same thing.

> They are not the same thing.

Of course they are the same thing.

Language and knowledge (both broadly construed) are obviously not the same thing. Language can encode non-knowledge like nonsense or lies, and there is knowledge (e.g. of the experience of qualia) that can't be expressed in language.

I think the point up-thread is true: something that knows only about how language works is an unreliable source of knowledge. Even if all its input is true knowledge, it can still blindly combine that input in linguistically plausible but false ways.

tessierashpool
0 replies
5h53m

they really are not.

you can use language as a proxy for knowledge in some cases, but doing so guarantees generating bullshit, which is the entire point of the OP.

QuantumGood
1 replies
5h31m

    it does mean they're about "language" rather than knowledge
    
Language in part encodes knowledge.

smoldesu
0 replies
2h55m

It also encodes opinions, fantasy, bad-faith arguments and lies without discrimination. Given that LLMs aren't trained to separate their sources, I think it's fair to say today's AI focuses more on language than the integrity of its answers.

caturopath
0 replies
8h24m

it does mean they're about "language" rather than knowledge

Not really. The name came about because they were trained on natural-language corpora. It wasn't a deep, philosophical claim.

The name continues to be used for models doing completely non-languagy stuff, such as robotics, although I expect 'foundation model' or some other less-confusing term will eventually replace it.

FeepingCreature
0 replies
8h4m

Language is about knowledge.

If it wasn't, what would we be doing here?

lukev
7 replies
8h40m

I would argue they do not contain knowledge using the classic definition of knowledge as "justified true belief."

They can emit statements that happen to be true because the probabilities fall that way (as a sibling comment points out.) But it's not "justified", because models don't have any ability to remember or cite sources, or even internally classify tokens as true or false (it's all just weights.)

(And it's not a "belief", either, because there's no intentionality, but that's a more slippery concept.)

Therefore, philosophically speaking, it is fundamentally impossible to acquire knowledge from a LLM unless you verify ("justify") a statement via some secondary, more trustworthy channel.

Filligree
4 replies
8h3m

By that definition of knowledge, I don’t contain knowledge either.

I’m clearly keeping track of how strongly I believe a variety of things. Not how I started believing them to begin with, however, so I lack the justification. And in a lot of cases, the lost justification would have been: “Because someone else said it, so now I’m aping them.”

lukev
3 replies
7h49m

I mean, this is why philosophical skepticism is a thing. There's hundreds of books written on epistemology.

But I think almost none of them would ascribe "knowledge" to a statistical engine that outputs propositions of random truth values weighted by how many examples of that proposition it has seen in its training data.

You can view the logprobs in the API and literally watch it choose to say different things based on its internal dice rolls.
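
(For what it's worth, here is a toy sketch of that "dice rolls" point in plain Python, with made-up numbers rather than any real API's output: the sampled continuation can change from run to run even though the underlying probabilities never move.)

  import random

  # Toy next-token distribution, loosely analogous to the per-token logprobs
  # an LLM API can expose. The numbers are made up for illustration only.
  next_token_probs = {
      "Canberra": 0.62,
      "Sydney": 0.30,
      "Melbourne": 0.08,
  }

  # Each generation is a weighted dice roll over these options, so repeated
  # runs can assert different "facts" with no change to the model at all.
  for _ in range(5):
      token = random.choices(
          list(next_token_probs), weights=list(next_token_probs.values())
      )[0]
      print(token)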

williamcotton
2 replies
7h28m

What we have that an LLM doesn’t have is a mobile body and other sensory inputs so we can verify knowledge through practice.

Knowledge without a feedback mechanism for verification is prone to veer off course.

Right now the feedback mechanism is based on human reinforcement.

What if we put an LLM in an android’s body and let the machine interact with the world through the same senses that we do?

lukev
1 replies
7h12m

A body may be a necessary condition for epistemological knowledge, but it's certainly not sufficient... most animals are embodied but do not possess propositional knowledge (IMO anyway.)

Ultimately though this is all about the definition of words. Humans find the distinction between "true" and "false" to be important and meaningful; animals don't care. AIs don't care (being incapable of caring), but humans care very much about the propositional veracity of statements that AI makes.

The important thing is that we're on the same page about what's actually happening with these systems and how they work and don't slip into magical thinking or assume that just because a system is facile with language it automatically has a sense of truth or right or wrong.

This is only going to get more complicated as models get more sophisticated.

ganzuul
0 replies
3h14m

most animals are embodied but do not possess propositional knowledge

https://www.youtube.com/watch?v=yhfl7kasjZc

There are quite a few videos with the same reaction. There does seem to be something going on.

caturopath
1 replies
8h25m

I tried to say they contain knowledge precisely to avoid a discussion like this: not 'knowing' or 'having knowledge' or anything else that could lead down that path, because that doesn't align with the point I was trying to make. That point is that top LLMs are not merely good at things like grammar; a key part of what they do concerns information about the world.

FWIW, although 'justified true belief' is classic in the sense of being an old definition of knowledge, I don't think any significant number of modern philosophers would use it.

lukev
0 replies
7h56m

Well, they're concerned with Gettier problems etc too.

But yeah, the problem is what you mean by "knowing." For example, I am perfectly happy to say that an LLM "knows" words just like I "know" words, or that it "knows about" very popular topics in a loose sense.

But that's a very different sense of "knows" from the kind of specific propositional knowledge we're usually interested in when using a LLM as a search engine, fact checker or source of information!

gorjusborg
2 replies
8h33m

but it's clear that they contain a vast amount of knowledge.

Is it though?

There's nuance in what we call knowledge, but to me, knowledge is not uni-dimensional. What I mean by that is, I know when I _know_ something, and I have learned that that is not a trivial skill.

How do I know I _know_ something? Usually because I have searched for evidence from multiple angles / dimensions / senses. When multiple independent observations of quality agree, and my conclusion matches the result, it is safe to say that I know that fact.

An LLM may have multiple 'dimensions' in a linear algebra sense, but does it have independent information?

caturopath
0 replies
8h8m

For what it's worth, I mentioned they _contain_ the knowledge to try to avoid discussions about whether they really "know" anything -- if there's a better term, like 'information about the world' or something, that's all I was trying to convey to the person I originally replied to. The point wasn't that they do what humans do, but that they are not just grammar machines.

How do I know I _know_ something? Usually because I have searched for evidence from multiple angles / dimensions / senses. When multiple independent observations of quality agree, and my conclusion matches the result, it is safe to say that I know that fact.

I wanted to thank you for laying out this perspective. In my experience, in discussions about LLMs, a lot of the time people get caught up on words like 'know', which work well for talking about humans, but don't cash them out in definitions I know what to do with. This is a really content-ful way of sharing what you mean by your claim.

TeMPOraL
0 replies
5h1m

How do I know I _know_ something? Usually because I have searched for evidence from multiple angles / dimensions / senses. When multiple independent observations of quality agree, and my conclusion matches the result, it is safe to say that I know that fact.

This is technically true, but no one actually does much of it consciously. But this is how you picked up every skill and understanding, including language, since you've been born.

And this - the unconscious fusion of correlated patterns - is pretty much what LLMs undergo in training.

EDIT:

Also, a lot of confusion is created by using terms like "a fact" or "a conclusion" or "a result", as if they were fixed points in space, fully defined mathematical symbols. But there is no such thing. There is no binary "know/don't know", and no binary "result matches/doesn't match the conclusion". Those are all continuous quantities that get rounded off for convenience.

emporas
0 replies
7h56m

Clearly, they aren't as likely to generate "Colorless green ideas sleep furiously" as "It's important to feed your dog".

There is a setting right there called temperature, no?
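
(Temperature, roughly: the raw next-token scores are divided by the temperature before the softmax, so low values concentrate probability on the already-likely continuation and high values flatten the distribution. A minimal sketch with illustrative logits, not any particular model's numbers:)

  import math

  def softmax_with_temperature(logits, temperature):
      # Scale the raw scores, then normalize into probabilities.
      scaled = [x / temperature for x in logits]
      m = max(scaled)  # subtract the max for numerical stability
      exps = [math.exp(x - m) for x in scaled]
      total = sum(exps)
      return [e / total for e in exps]

  # Illustrative scores for a common sentence vs. the Chomsky sentence.
  logits = {"It's important to feed your dog": 5.0,
            "Colorless green ideas sleep furiously": 1.0}

  for t in (0.2, 1.0, 2.0):
      probs = softmax_with_temperature(list(logits.values()), t)
      print(t, [round(p, 3) for p in probs])

Even at a high temperature the unlikely sentence only becomes less unlikely; it never becomes the favorite.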

mvdtnz
5 replies
9h50m

Isn't the hint in the name? It's a language model, not a knowledge model.

Well yes, but how is this insight helpful? People aren't just looking at the name. Have you missed the last 10 months of ridiculous breathless coverage of LLMs?

The trouble is people think it goes deeper when the knowledge modeling part

People are literally calling this shitty text prediction algorithm "AI". The discourse is completely out of hand.

simonw
4 replies
9h24m

It is "AI", using the definition of that term that was established when the term was coined in 1956.

https://en.m.wikipedia.org/wiki/Dartmouth_workshop

An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.

LLMs fit that definition well. They're not AGI, but I think it's fair to call them AI.

mvdtnz
2 replies
9h14m

There is no widely accepted definition of "AI" and even if I accepted this one (I absolutely do not) no LLM meets this bar in the general sense.

trealira
0 replies
9h2m

Game tree heuristic searches, like the minimax algorithm with alpha-beta pruning, have also been called AI, which is why programs that play chess are called AI.

What I think is the problem is that when people hear "artificial intelligence," they don't think of chess programs anymore; they think of something like Skynet from the Terminator. Or, maybe now they think of ChatGPT.
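
(For reference, a minimal sketch of the kind of game-tree search meant here: minimax with alpha-beta pruning over a tiny hypothetical game tree. This is roughly the search core of a classical chess engine, minus the chess-specific move generation and evaluation.)

  import math

  def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
      # Leaves are numeric scores for the maximizing player;
      # interior nodes are lists of child nodes.
      if not isinstance(node, list):
          return node
      if maximizing:
          best = -math.inf
          for child in node:
              best = max(best, alphabeta(child, False, alpha, beta))
              alpha = max(alpha, best)
              if beta <= alpha:   # opponent would never allow this branch
                  break
          return best
      else:
          best = math.inf
          for child in node:
              best = min(best, alphabeta(child, True, alpha, beta))
              beta = min(beta, best)
              if beta <= alpha:
                  break
          return best

  # Two candidate moves, each with two possible replies (hypothetical scores).
  tree = [[3, 5], [2, 9]]
  print(alphabeta(tree, maximizing=True))  # -> 3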

simonw
0 replies
8h55m

That 1956 workshop is widely credited as the place where the term AI was coined.

If they don't get to define "AI", who does?

Izkata
0 replies
9h12m

What some people now call AGI, the average person thinks of when they hear AI.

AnimalMuppet
5 replies
9h45m

Isn't the hint in the name? It's a language model, not a knowledge model.

True. But most of the training corpus was not linguistically-sound gibberish; it was actual text. There was knowledge encoded in the words. LLMs are large language models - large enough to have encoded some of the knowledge with the words.

Some of. And they only encoded it. They didn't learn it, and they don't know it. It's just encoded in the words. It comes out sometimes in response to a prompt. Not always, not often enough to be relied on, but often enough to give users hope.

What we really need is a language model coupled to a knowledge model.

caturopath
3 replies
8h53m

And they only encoded it. They didn't learn it, and they don't know it.

People say stuff like this a lot, and I never know what to do with it. I'm not sure what specifically I'm supposed to get out of these distinctions.

AnimalMuppet
1 replies
7h59m

Well, without bothering to get all epistemological about it, it's clear that LLMs don't know, say, the law in the way that a lawyer knows it, or physics in the way that a physicist knows it.

And that's enough to be a problem, given how people are trying to use LLMs. They're trying to use it like a human who "knows" (whatever that means). And whatever an LLM actually does, it doesn't do that.

caturopath
0 replies
7h52m

My uncle doesn't know the law the way a lawyer knows it either, but that doesn't stop him from trying to use it like someone who does ;)

If the claim is only "LLMs don't work like humans" and "LLMs make a lot of factual and logical errors", then that seems to be correct as far as it goes, but it doesn't go very far.

j4yav
0 replies
8h8m

I'm not sure either. An alien could make a similar argument about our brains made of meat and electricity.

M2Ys4U
0 replies
8h50m

But most of the training corpus was not linguistically-sound gibberish; it was actual text. There was knowledge encoded in the words. LLMs are large language models - large enough to have encoded some of the knowledge with the words.

Some of. And they only encoded it. They didn't learn it, and they don't know it. It's just encoded in the words. It comes out sometimes in response to a prompt. Not always, not often enough to be relied on, but often enough to give users hope.

And some of the pieces of "knowledge" in the training corpora were wrong, lies, or bullshit themselves.

Garbage in, garbage out.

bsenftner
4 replies
9h20m

Isn't the hint in the name?

Actually the hint is not in the name; that only tells a portion of the truth. If LLMs were trained on textbooks teaching foreign languages from every other language, the name would suffice. But they are not: LLMs are trained on a wide variety of texts, many of them social media, blog, and general web posts. The larger web is an ocean of bullshit, and that is where the bullshit within LLMs comes from.

'Textbooks are all you need' ought to be how LLMs are trained so people can use them for their "knowledge work" without bullshit ramifications.

helsinkiandrew
1 replies
9h6m

'Textbooks are all you need' ought to be how LLMs are trained so people can use them for their "knowledge work" without bullshit ramifications.

But in many spheres of knowledge there are still a lot of contradictory opinions in textbooks. Why did Rome fall? What's the best X? In science there have been many schisms: quantum theory, tectonic plates, GMO/organic, etc.

Writing code is probably one of the few areas where just about all content is 'good' - there may be a lot of bad or poor performing code in books and online but the vast majority of it 'works'.

bsenftner
0 replies
8h31m

Very good points. That gives me the idea of specifically training LLMs on each side of the active scientific controversies and then host/publish an event where they debate. Might be very interesting to geeks, or they will hate it. Probably both.

pdonis
0 replies
7h1m

> The larger web is an ocean of bullshit, and that is where the bullshit within LLMs comes from.

Not only that: there's also the fact that LLMs do not "fact check" their output against any data source. They just generate text based on relative frequencies of different kinds of text in their training data. They don't even have the concept of text being related to other things in the world; in fact they don't even have the concept of "other things in the world".

Any text generator built that way will output bullshit, no matter how accurate its training data is. "Bullshit" does not mean the output is necessarily wrong; it means, just as Frankfurt said, that the thing generating the output doesn't care about whether it's true or not. LLMs meet that criterion: they don't care about anything other than relative frequencies in their training data.

OfSanguineFire
0 replies
8h6m

Textbooks are a rather bad source of info. Consider that history textbooks in many countries look at everything through the lens of a particular national mythology, since one of the goals of a country’s public education is to forge a cohesive society, and that is done by downplaying past controversies.

For many other fields, introductory textbooks even at the university level may abound in oversimplifications, and it is only in later years of study that one begins, by reading specialized papers and monographs, to grasp how complicated the facts really are.

j4yav
0 replies
8h4m

I can invent a grammar, explain it to an LLM, and it can apply it correctly in ambiguous situations. If it's just generating output based on statistical probability from its training corpus, how would that work? It feels like there's something emergent there that isn't just a very good probabilistic calculator.

intended
0 replies
9h36m

You would think.

However, stating that LLMs predict text, not facts - and that LLMs don’t “think”, is enough to start a debate.

Yeah, the issue is anthropomorphization and the "cool" factor of demos, vs actual experience.

Vegenoid
0 replies
6h46m

The trouble is people think it goes deeper when the knowledge modeling part only happens incidentally because language and knowledge modeling are so closely related.

I think there's a strong argument to be made (that has been made) that much of our knowledge capability is built on language - that language did not simply allow us to communicate our knowledge, but actually allowed individuals to think in a new way.

I certainly agree that current LLMs have no regard for truth and logic at their core, and there are clearly areas where their ability to read and write does not translate to producing true statements. LLMs can't build things out of language like humans can.

I am less sure that LLMs don't have a long way to go before we reach the limits of what modeling language can do.

aredox
14 replies
9h8m

"when you ask for a snippet of code and it gets it right"

Most examples I have seen are about basic code that is highly likely to be already present in the training data - making the language model an expert system, but with a worse rate of failure.

Meanwhile it has a hard time creating something it hasn't already seen, or a close variant.

simonw
12 replies
8h53m

Meanwhile it has a hard time creating something it hasn't already seen, or a close variant.

How much time have you spent with the strongest models, such as GPT-4?

I've been using it on a daily basis for 6+ months now and your statement there doesn't fit my experience at all.

zerbinxx
9 replies
8h32m

If you ask it to start explaining/extending/implementing classes/interfaces, or ask it to reason about leaky abstractions or when it's appropriate to use Mixins, it frequently generates code that either doesn't work or just misses the point. It's especially hard because some of these problems actually are not tractable in the language you're talking about due to its features, or, due to limitations of your code or your knowledge, it may suggest you do something slightly worse because you haven't prompted it on factories or other design patterns that you might want to use.

simonw
5 replies
8h14m

There are an almost unlimited number of ways you can prompt it that will produce obviously bad results.

The trick to using it effectively is to figure out the ways of prompting it that produce GOOD results.

zerbinxx
1 replies
7h54m

Right, but obviously you can see the unknown-unknown problem or Dunning-Kruger popping up when software developers who don't know everything (unlike me, of course) start using it extensively without knowing the ways in which their prompt engineering is flawed.

simonw
0 replies
7h42m

One of the big open questions in LLM-assisted learning is whether it helps or hinders less experienced developers.

At this point I'm pretty confident it helps them. They need to understand that it's not infallible, but I think most people figure that out pretty quickly after using it for more than a few days.

There's absolutely a skill to using these things well, and it's a difficult one to teach because it's based more on intuition that you develop over time.

The key thing is helping people understand the strengths and limitations of these tools. That's something I do think we can teach.

lawlessone
1 replies
7h3m

The trick to using it effectively is to figure out the ways of prompting it that produce GOOD results

Might as well just learn to code then.

simonw
0 replies
6h26m

My argument is that AI-assistance makes it easier and faster to learn to code.

jazzyjackson
0 replies
6h30m

it's not just good prompts vs bad prompts, there's luck involved, since the result is determined by dice rolls.

I used to be smug about getting good results from GPT while coworkers failed to get it to produce anything useful, but over time my luck changed, and GPT stopped nailing it. If you remember there was a bunch of conspiracy talk that GPT got lobotomized, was no longer writing full programs anymore etc, but what really happened is people who got used to rolling 7s started rolling 2s and 3s and shook their fists in anger at their change of luck.

J_Shelby_J
1 replies
8h10m

I’ve had this exact experience with this topic learning python and designing my first app. GPT frequently sent my down the wrong path and wasted my time when I could of just used it to learn about a topic and make my own decisions.

It does not hand recursive thinking well at all.

zerbinxx
0 replies
8h3m

Agreed. A good example of this (in Python) is to explore solutions to problems that use decorators/a more functional style vs. solutions that use classes and a more OO style. Usually both are totally good. The code GPT generates can be totally functional, yet inappropriate in context. Another good one is asking it to translate code from JS to a language without real “interfaces” and see if it uses a base class pattern or not. I review junior devs and lecture them extensively about stuff like this. (Yes, I am fun at parties.)

Terr_
0 replies
42m

Also, as you go down that path, you may hit its limits on tokens/memory, and it forgets what you were originally talking about.

ryandvm
1 replies
8h37m

Indeed. I feel like people are not even talking about the same thing in here. I've been using GPT-4 daily for a couple months and it's not outputting code that "only looks correct". It is writing functional solutions that often work as is.

Just yesterday I pasted in a hastily written function and told GPT-4 to "please evaluate this function and make sure it's bulletproof". It outputted a bulleted list of criticisms (mostly true) and then spit out a version of the function that was shorter, easier to understand, and was safer.

That's not just the "next most likely token". Or if it is, then that's all any of us are doing.

rbranson
0 replies
6h26m

It’s probably more accurate to say that what your brain is doing when writing code is more like emulating an LLM than an LLM is emulating your brain.

Converting human language to code is an ideal task for LLMs, which is part of why they are such an exciting development.

j4yav
0 replies
8h3m

You can invent a programming language on the spot and it will help you write programs in it.

matheusmoreira
8 replies
9h29m

Just yesterday ChatGPT thought it was possible to use sizeof in C preprocessor macros. I called it out on this and it doubled down on it.

Edit: #if directives, like static_assert.

yorwba
2 replies
9h9m

Where's the problem?

  ~ $ cat sizeofmacro.c
  #include <stdio.h>

  #define pintsize(T) printf("sizeof(%s) is %zu. Big or small? You decide.\n", #T, sizeof(T))

  int main(int argc, char **argv) {
          pintsize(int);
          pintsize(argv);
          return 0;
  }
  ~ $ gcc sizeofmacro.c && ./a.out
  sizeof(int) is 4. Big or small? You decide.
  sizeof(argv) is 8. Big or small? You decide.

diydsp
0 replies
8h25m

I was curious, too. Looked it up. Still don't understand tho.

The sizeof in C is an operator, and all operators have been implemented at compiler level; therefore, you cannot implement sizeof operator in standard C as a macro or function. You can do a trick to get the size of a variable by pointer arithmetic.
blibble
0 replies
8h40m

the preprocessor isn't evaluating it, it's just pasting the tokens in

the preprocessor doesn't know anything about structs or types

fanf2
2 replies
9h1m

You can use sizeof() in macros but not in #if expressions.

matheusmoreira
1 replies
8h13m

Yeah, I meant #if directives, like static_assert. I asked it to help me statically verify my assumptions about ELF files and it gave me some pretty insane code. Here's the excerpt:

Certainly, you can use a combination of sizeof and #error directives for conditional compilation to achieve a similar effect without static_assert.

  // Compile-time check for structure sizes
  #if sizeof(Elf32_Phdr) != sizeof(((Elf32_Ehdr*)0)->e_phentsize)
      #error "32-bit structure size mismatch"
  #endif

  #if sizeof(Elf64_Phdr) != sizeof(((Elf64_Ehdr*)0)->e_phentsize)
      #error "64-bit structure size mismatch"
  #endif
> In this example, if the structure sizes don't match, the preprocessor will generate an error at compile time using the #error directive.

I didn't know sizeof could be used with the C preprocessor

Yes, sizeof can be used in certain contexts within the C preprocessor.

It's commonly employed in combination with the #if and #error directives to perform compile-time checks on sizes or other constants.

This usage allows you to catch potential issues during compilation rather than at runtime.

I should probably go sleep a bit before I embarrass myself any further. After nearly 24 hours awake it seems I have lost the ability to tell macros and directives apart.

fanf2
0 replies
3h40m

static_assert() is not a preprocessor thing, so you can use sizeof() there without trouble. It does have other limitations: it’s syntactically a declaration which makes it painful to use in macros without wrapping it in the GNU C statement-expression extension.

pelagicAustral
0 replies
9h24m

I've got a colleague that would've doubled down on it as well...

mortehu
0 replies
8h38m

Do you mean preprocessor directives?

3cats-in-a-coat
7 replies
9h50m

LLMs seem to operate on the principle of "will this pass for a correct answer" rather than "this is how this thing works, so here's a reasonable opinion answering your question".

That's NOT how LLMs operate. They're trained on what is supposed to be a selected, representative corpus of human text of at least some minimum quality, and ideally truthful, factual and relevant.

And then their prediction is judged on how close to that ground truth it is. Not on whether it's "plausible" or "will this pass for a correct answer". There's no math function for pretending to be right. You have to be right.

One reason why LLMs make up things sometimes is because they're incredibly tightly optimized. Like you have no idea how tight. GPT-4 has complexity comparable to a mouse brain. A mouse brain that has to fit the world's knowledge.

Obviously, something has to give, and you'll not be accurate sometimes when you have so limited space. You'll store some things approximately, generally, and then extrapolate answers from them, maximizing for correct output.

bsenftner
2 replies
9h1m

Despite the groupthink downvote, you are correct. However, a problem arises from using generalized web content as the ground truth training data. That's a problem because the generalized web is chock full of political propaganda, marketing exaggerations, and real people parroting all of that with partial or complete misunderstandings, reinforced by others who misunderstand it the same way.

Perhaps in the future, the training data will be more carefully curated to eliminate as many 'wishful truths that do not exist' and 'misinformed conversations between parties with no correct voice' as possible.

3cats-in-a-coat
1 replies
4h5m

Thanks for your comment of support. Ironically, the "groupthink" is itself hallucinating explanations of how LLMs work.

Anyway, yes, the "web" in general is full of garbage and propaganda. But then again we're exposed to the same web, and without the pre-processing which OpenAI etc. apply to filter out certain words, phrases, topics (including lots of manual human labor). So it's funny how we're living on that same input, but we're so confident we know what's true and what's wrong.

But LLMs aren't just trained on literally anything on the web. For example consider Wikipedia. That's not one's average drivel for sure. And next to the general web, you need higher quality sources of data in the mix to bias the AI for truth. Scientific papers, good books, transcripts from conferences and presentations. Wikipedia to mention again. So on.

Of course, sometimes LLMs make technical mistakes which you can clearly rule as incorrect. Say, plumbus.com's API never had a getDescription() method. But the AI saw many APIs of similar services which do have getDescription(), and given its small network, it's forced to fill in gaps by inferring the answer from a vast assortment of contextual clues. Which we call hallucinations (incorrectly). But even the correct answers are inferred. It's all inference. And we all think by inference. We just have 50 times more capacity than the biggest LLM at the moment, so our inference tends to be much more sophisticated, and therefore more likely to be correct. I mean in theory. After all, it's humans and their big brains who produced the garbage that's the web, so.

bsenftner
0 replies
2h23m

I agree with your points, but would also like to point out humans are not exposed to the general web as infants, and get only a curated exposure until they become smart enough to get through the various age restrictions placed weakly here and there.

Children's initial years, being basically lied to about the nature of the adult world, is where their sophisticated inference comes from. The internal intellectual conflict when one realizes it is actually good to tell children the world is a nice, fair, happy place - once figuring out it is not. That right there, I theorize, is where our sophisticated inference is created: we realize our own dynamic minds. We realize the necessity to seat concepts of "a fair world" into people before allowing them to mentally mature and learn it is not.

SAI_Peregrinus
1 replies
9h14m

Extant LLMs are trained on whatever data the trainers can get their hands on. There's no magical "check for truth" step, either at the input or at the output.

3cats-in-a-coat
0 replies
4h14m

No, data is significantly filtered in order to bias it to accuracy.

Of course if you feed an AI scientific papers and books of some quality, information will be of high quality. THAT... is how you "filter for truth". It's not supposed or required to be perfect. But of course you can filter the input.

The fact you also add conversational input from various sources like Reddit and Twitter doesn't change this balance significantly. Also even those sources are filtered to exclude vulgarities, certain topics and so on.

Also you're not thinking this through. If a question is on average answered incorrectly 'round the world, it means you're over 50% likely to also give the incorrect answer when asked. Who equipped you to know the truth?

lazy_moderator1
0 replies
9h31m

this is an interesting statement, are you involved in LLM research?

ethanbond
0 replies
9h25m

And then their prediction is judged on how close to that ground truth it is

When/where/how?

photochemsyn
4 replies
9h23m

There are methods for getting useful information from LLMs about subjects outside your own area of expertise, but it takes a lot of work.

For the example, causes of the Great War, I'd first ask it to provide a list of the top ten reasons reputable historians cite as primary causes of World War One. Then, for each reason that looked interesting, I'd ask the LLM to provide arguments for and against, including one primary historical document in support of each side. Then, go to Google Scholar or similar and look up that document and see what different reputable historians have said about it. A few iterations of this process should get you to some reliable information even in a field you know nothing about.

This is a time-consuming process, and requires active engagement rather than what many people have been trained to do by our atrocious education system and watching television, i.e. passive absorption of content without skepticism.

latexr
1 replies
8h58m

I'd first ask it to provide a list of the top ten reasons reputable historians cite as primary causes of World War One.

And right here you could bump into problems, because how would you know if what you got back is really the opinion of reputable historians? For all you know it could be giving you the opinions of a layman on Reddit.

Then, go to Google Scholar or similar and look up that document and see what different reputable historians have said about it.

Looks like that could’ve been the first step. Identify the reputable historians and go from there.

photochemsyn
0 replies
7h48m

That's a good suggestion but I'd do both, e.g. "Provide a list of ten well-known and reputable historians who specialize in the history of the Great War and its origins."

Now there might be outliers that are missing from both lists but which might be worth looking at, as it's not that unusual for consensus viewpoints to be overturned by new information and so on, and that's not something LLMs are likely to be very helpful with.

Fundamentally, my point is that if skeptical thinking, rational analysis, research strategies etc. aren't taught to people from a young age, then it really doesn't matter whether the BS they're subjected to is human-generated or AI-generated.

bsenftner
0 replies
9h10m

I'm doing something similar with figuring out how to convert my home to solar power. Where I live, the sun is perfect for solar, and as a result our area is flooded with solar startups that are universally using hard-sell tactics and undermining one another. After discussions with neighbors, everyone feels the available solar installers are all serving slanted, bad deals. So I'm using the ChatGPT-4 API to create solar installation planning bots, financing evaluation bots, duties-and-responsibilities checklists for service people, and so on. I think my neighbors and I might just form our own startup or just give it all away, because the situation here with the available service companies feels Orwellian; they act like there is no choice, and they use fear and hard-sell language.

RandomLensman
0 replies
8h53m

It might or might not work. How do you know you get to the right set of arguments and documents that way? If you are not an expert in the field, how do you select who is a "reputable historian", for example? How do you know that you correctly understand the language and terms used in a paper not in an area of your expertise?

PaulHoule
4 replies
9h8m

Maybe I should be impressed that an algorithm can sort a list correctly 50% of the time because it is better than chance (e.g. you have 1 in 3.6 million odds of sorting a list of 10 items correctly at random). But conventionally the value of a 50% sorting algorithm is 0, not 50% of that of a sort that works, since the sort is part of a larger system that requires it to work 100% of the time.

For a long time there have been "machine learning" techniques where getting the right answer 80% of the time is a win. What's interesting about LLMs is that they sometimes "raise the bar" for those machine learning problems (get it right 90% of the time) but have also lowered the bar for other things that can be done algorithmically 100% of the time.

FeepingCreature
2 replies
8h3m

A fast 50% sort could be very useful given the right circumstances.

jjk166
1 replies
6h11m

Yeah, so long as you have something else that can quickly verify that a list is sorted, just keep applying this "quick coin toss sort" until you get a sorted list.

An ML algorithm with a 90% success rate plus a "sanity checker" might be much easier to implement than an ML algorithm with a 100% success rate.
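
(A minimal sketch of that generate-and-verify pattern; flaky_sort here is a stand-in for any unreliable component, not a real model:)

  import random

  def flaky_sort(items):
      # Stand-in for an unreliable generator: correct only ~50% of the time.
      result = sorted(items)
      if random.random() < 0.5:
          random.shuffle(result)  # simulate a wrong answer
      return result

  def is_sorted(items):
      # Cheap, trustworthy verifier.
      return all(a <= b for a, b in zip(items, items[1:]))

  def sort_with_retries(items, max_tries=20):
      for _ in range(max_tries):
          candidate = flaky_sort(items)
          if is_sorted(candidate):
              return candidate
      raise RuntimeError("generator never produced a verified answer")

  print(sort_with_retries([5, 3, 8, 1, 2]))

The same shape applies to the 90% ML case: the unreliable part is only trusted once the cheap check passes.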

Filligree
0 replies
5h33m

It would also be a lot closer to how we function as humans.

pjerem
0 replies
8h42m

The thing is, if you want to sort a list, LLM is the wrong tool. However, if what you want is to generate a program that will sort a list, your LLM will be pretty performant.

It’s a powerful tool but like any powerful tool, you will use it wrongly if you don’t acknowledge its strengths and weaknesses.

Judging an LLM because it can't act as a program is a fundamental error, because it means you are wasting an enormous amount of resources for nothing.

The novelty of LLMs, and what they are good at, is helping you think about things. Helping you complete reasoning about your thoughts or about your data. They are not good at doing things, but they are good at « feeling » things.

SamBam
3 replies
9h18m

It reminds me of the old philosophical debate about AI and consciousness.

Many philosophers argue that knowing whether something which appears to be conscious really is conscious is a meaningless question. If there is no way to distinguish consciousness from non-consciousness from the outside, then you may as well just say that the thing is conscious.

I think we may be approaching the same question with regard to truth. If it appears to be truth from the outside, is that the same thing as being truth? Is there any point in knowing whether it was produced from a thinking mind or not?

shmageggy
2 replies
9h9m

Truthfulness is on totally different philosophical footing than consciousness, so the analogy doesn't hold up. There is no "hard problem of epistemology", but there is a hard problem of consciousness. There exist criteria for evaluating truthfulness (that we may reasonably disagree on) that can be applied to LLM output.

joelfried
0 replies
8h29m

I think the Gettier problems are at least epistemological problems with some weight, at least as long as you're in the Justified True Belief sort of camp (which I think many of us start out as).

If you haven't seen Gettier problems before, I'd start with the generalized problem and work out from there: https://en.wikipedia.org/wiki/Gettier_problem#The_generalize...

SamBam
0 replies
8h1m

I think the whole point of the philosophical concept of "bullshit" is that it's orthogonal to the truth. It doesn't matter whether the words end up being correct or not correct, it's still bullshit because the speaker doesn't know or care if it's actually true.

jwestbury
2 replies
9h52m

There are loads of examples of human-assisted algorithmic processes, or the inverse, algorithm-assisted human processes, most of which have better performance than either entity alone.

I've had trouble finding it, but years ago, I read the abstract of a study which looked at chess performance, pitting people who had access to some basic chess software against those who didn't. What they found is that the strongest predictor of success was whether a player could utilise the software effectively, rather than pure chess skill.

Another area where I don't think there are any published studies, but there's a lot of anecdotal evidence, is algorithmic trading, where successful firms almost always have human traders monitoring the algorithmic behaviour. This means that humans can spot when the algorithm is doing something a bit odd, or when conditions aren't conducive to the algorithm performing well, and they can tweak some parameters on the algorithm to get a better result, shut it down if it looks like it's lost the plot, etc. (And let's be honest, a lot of this oversight is probably a result of Knight Capital.)

My point here is: You're entirely right, LLMs are currently a tool for people with experience in the field they're working with. But that's still a pretty big step forward -- just like top quant hedge funds consistently outperform human-managed funds (see: Rentech), smart people who learn how to leverage these tools are going to outperform. The key is learning where they perform well, where they perform poorly, and how to validate their output.

ycombobreaker
0 replies
9h5m

let's be honest, a lot of this oversight is probably a result of Knight Capital

Gray-box trading and humans in a control loop over automated trading have both been happening for decades. Knight Capital makes for a pretty spectacular example, but that's about it.

tmorton
0 replies
9h5m

I've had trouble finding it, but years ago, I read the abstract of a study which looked at chess performance, pitting people who had access to some basic chess software against those who didn't. What they found is that the strongest predictor of success was whether a player could utilise the software effectively, rather than pure chess skill.

There was a time when this was true - a grandmaster and a computer was stronger than either alone. But that window closed pretty quickly. Today, even the strongest GMs' best strategy is to follow the computer blindly.

jstummbillig
2 replies
8h47m

LLMs seem to operate on the principle of "will this pass for a correct answer"

My new thing: Testing LLM critique against humans.

The results are sobering.

trimethylpurine
1 replies
8h11m

When an LLM fails, it needs you to tell it that it failed, and it needs you to tell it to try again. Conversely, we experience a desire to be thought of as just, and that drives us to seek information on our own. While it mimics the language, it doesn't mimic that honesty because it doesn't need to be honest. That said, it's impressive that the LLM fools you enough to be sobered by the comparison.

jstummbillig
0 replies
7h53m

Well, now I am mostly sad about the great density of human talent I have apparently missed out on.

In my experience, there is a wild amount of finding absolutely nothing wrong with what is clearly going wrong and me having to prompt some variation of "so... what were you planning to do about that thing?" for something to happen.

itsoktocry
2 replies
8h43m

Someone who wasn't working in my domain would get fooled.

So they run the code, it breaks, they paste the error message back into chatGPT and try again. What's wrong with that? Don't know about you, but I don't get code right the first time, either.

deckard1
0 replies
7h54m

You're assuming bad code is code that throws errors.

SQL injection is bad code that "works". Most security issues come from bad code that just "works". I can't believe I have to say this, on HN of all places, but you have to actually know what you are doing when writing code. Some of you scare the fucking shit out of me.

RandomLensman
0 replies
8h31m

Not all areas of work and life come with error messages that are cheap to encounter. Coding might be somewhat special, and the use case for LLMs there might be considerably stronger than in some other areas.

add-sub-mul-div
2 replies
9h36m

So in the current iteration, I think LLMs are a shortcutting tool for experts

My take has been that LLMs aren't valuable for me because I'm an expert and writing code is no longer very difficult. And searching for the knowledge when I need it is faster on my own than having a conversation with a computer because true expertise also involves having mastered and streamlined that process.

empath-nirvana
1 replies
8h45m

I'm an experienced coder and co-pilot often writes code faster than I can think of it, and it's usually fine and follows the pattern I had established with previous code. A lot of code that everyone writes is just boilerplate. Until I used copilot, I didn't realize _how much_ I was writing that didn't really require much thought.

zerbinxx
0 replies
8h28m

Imo if you’re “doing coding right”, at a certain point, you should be thinking very little about what you’re doing. This can manifest as having easily copypastable design patterns, lots of tests, inversion of control, etc.

Even among senior developers it can be shockingly rare for people to actually understand and implement something like DI/IoC in a correct and useful way.
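
(To make the DI point concrete, a minimal constructor-injection sketch with hypothetical names; the dependency is passed in rather than constructed inside, so tests can swap in a fake:)

  from typing import Protocol

  class PaymentGateway(Protocol):
      def charge(self, amount_cents: int) -> bool: ...

  class StripeGateway:
      def charge(self, amount_cents: int) -> bool:
          # A real network call would go here.
          return True

  class FakeGateway:
      # Test double: records calls instead of hitting a real service.
      def __init__(self):
          self.charged = []
      def charge(self, amount_cents: int) -> bool:
          self.charged.append(amount_cents)
          return True

  class CheckoutService:
      def __init__(self, gateway: PaymentGateway):
          # The dependency is injected, not constructed in here.
          self._gateway = gateway
      def checkout(self, amount_cents: int) -> bool:
          return self._gateway.charge(amount_cents)

  # Production wiring and test wiring differ only in what gets injected.
  CheckoutService(StripeGateway()).checkout(1999)
  fake = FakeGateway()
  CheckoutService(fake).checkout(1999)
  assert fake.charged == [1999]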

trimethylpurine
1 replies
8h46m

For instance when you ask it for a snippet of code and it gets it right.

Does this happen often for you?

I've logged my success rate with Go and C.

32/37 responses failed to meet the requested criteria. 15 wouldn't even compile.

Getting the bot to do it right takes longer than writing it myself.

Maybe that's just me?

I wouldn't think so. Word prediction and writing code aren't nearly the same thing, when I think about them. Sure, they both require language. But one of those also requires reasoning. As worded in this paragraph, could an LLM tell me which one? Sure. Sometimes...

lordnacho
0 replies
7h27m

Particularly with copilot, it often just guesses what I want to do next and gets it right. Small snippets of code but it saves me time.

jackcviers3
1 replies
9h27m

You need to already know what you're doing to know whether the LLM got it right.

I want to call out that this is not necessarily true. You can interact with the LLM using agents, feedback loops, and some scoring function on randomized test inputs to produce novel code that you don't know the structure of beforehand.

You start with a set of example inputs and desired outputs, and prompt the LLM for a function in your programming language of choice. You then take the LLM's response and feed it into an agent that executes the code against the inputs and outputs, and report the discrepancies back to the LLM. It responds with a new completion, which you feed to the agent until all your example inputs produce the expected outputs. Finally, you can use property-based testing to produce new examples for testing by the LLM, resolving the answers until the produced code is correct to within some margin of acceptable correctness.

You can do this to produce code without needing to know anything besides the desired properties of generated code and the properties of the inputs. You can further automate this by using a separate LLM to produce the examples instead of the typical generation and shrinking functions to produce examples.

This doesn't require any prior knowledge of the target language. You can expand beyond programming into any domain for which you can produce a scoring function and automate input generation.

I suspect that you can use control theory (PID loops, behavior choice loops scored by Eigenvalues) to model complex scenarios spanning multiple domains as well, choosing the evaluation agent as the behavior, each behavior of which has a separately defined scoring function. All of this can be automated by using the LLM generator without prior knowledge of the algorithmic structure of the solution and knowledge of all possible inputs to all possible behaviors.

If that all sounds very familiar, it's because it's essentially just doing test-driven development, but with the LLM machine as the developer.

It would be expensive to run an entire project development that way, due to the high costs of executing LLMs and/or querying LLM apis, but if the cost is less than that of employing a junior developer or teams of developers, it might be worth it. And human intervention can be kept in the loop at any stage, making it very feasible for rapid prototyping - where you don't necessarily need the entirely correct answer from the machine, just a good enough starting point to allow the human to take over to produce a result.
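
(A minimal sketch of the example-driven feedback loop described above, assuming a hypothetical ask_llm(prompt) helper for whatever model API you use; the EXAMPLES and the required function name `solve` are illustrative:)

  # Example (inputs, expected output) pairs standing in for the spec.
  EXAMPLES = [((2, 3), 5), ((10, -4), 6), ((0, 0), 0)]

  def ask_llm(prompt: str) -> str:
      # Hypothetical: call your LLM API of choice and return the code it wrote.
      raise NotImplementedError

  def run_candidate(source: str, args):
      # Execute the generated code and call the function it must define.
      namespace = {}
      exec(source, namespace)  # generated code is untrusted; sandbox in real use
      return namespace["solve"](*args)

  def generate_function(spec: str, max_rounds: int = 5) -> str:
      prompt = f"Write a Python function `solve` that {spec}. Return only code."
      for _ in range(max_rounds):
          source = ask_llm(prompt)
          failures = []
          for args, expected in EXAMPLES:
              try:
                  got = run_candidate(source, args)
              except Exception as exc:  # broken code is feedback too
                  failures.append(f"solve{args} raised {exc!r}, expected {expected}")
                  continue
              if got != expected:
                  failures.append(f"solve{args} returned {got}, expected {expected}")
          if not failures:
              # All examples pass; property-based tests could generate more here.
              return source
          prompt = ("Your previous attempt failed these checks:\n"
                    + "\n".join(failures)
                    + "\nFix the function and return only code.")
      raise RuntimeError("no candidate passed all examples")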

falserum
0 replies
8h43m

You can do this to produce code without needing to know anything besides properties …

Reminder, you started with this:

You can interact with the LLM using agents, feedback loops, and some scoring function on randomized test inputs

Maybe those are fancy words for simple things which I'm not familiar with, or learning a programming language seems the simpler option :)

garrickvanburen
1 replies
9h47m

I like how you characterize the output as, “an answer that sounds right”.

I was listening to a podcast yesterday where the purported expert gave an answer that sounded right but, based on my direct experience, was very wrong.

Of course I could be wrong as well.

Or the expert's answer was the same kind of BS the OP was referring to, the kind so often generated by LLMs.

garrickvanburen
0 replies
6h19m

elegantlie
1 replies
7h54m

What is stopping the LLM from fooling me in every field that I don't know anything about?

What's stopping any source from doing this? In the case of encyclopedias or news websites, I guess you could say reputation. But that's hardly reliable either.

So I guess that's my big pushback in regards to correctness complaints:

1) "Official" sources have always lied, or at least bent the truth, or at least pushed their subjective viewpoint. It has always been the readers job to think critically and cross-reference. On this point, LLM don't fundementally change anything, they just bring this long-running tension closer to our attention.

2) "LLM are inaccurate". Ok, inaccurate compared to what? Academic journals (see the reproducibility crisis), encylopedias (they tend to be accurate by nature of leaving out contentious facts), journalists (big laugh from me)?

I think outright hallucinations are a valid concern to bring up, but I would refer to my point about cross-referencing. However, I'd still point out that oftentimes a person might have read a perfectly factually accurate encyclopedia article but remembered hallucinations via motivated reasoning. Is the end result (what the person thinks and remembers) really different between LLM and traditional sources? This seems like more of a human problem than a LLM problem.

Do we have any factual evidence showing that people learning via LLM+traditional methods are actually less informed than people who learn from traditional methods alone? Right now, there's a lot of fear mongering and charged rhetoric, and not a lot of facts and studies. The burden of proof needs to fall on the people who want to regulate and restrict access to these models.

Finally, what I think this is really about is the continued transfer from the "industrial age" to the "information age". 100 years ago, your average person wasn't expected or really even able to "do their own research", and instead relied on top-down elite driven institutions to disseminate their version of the information. De-industrialization, then the internet, then social media, and now LLM are cracking this (now) outdated social order, and our old elites are understandably threatened.

I think this is another reason why we need to make this debate more rigorous and fact based: are LLM actually dangerous or is this just an example of elite preference?

daedalus_j
0 replies
5h11m

LLMs hallucinate so often when I ask them about software development issues that I don't even bother to ask them about areas I'm unfamiliar with.

I think the major difference that makes in not just "elite preference" is that the LLM output is just confidently wrong. If I read a report from an elite institution I'm at least reasonably confident that they're not going to base their entire argument on the premise that 2+2=5. Someone is going to call them on it, there may be reputational damage, there may even be legal repercussions in certain cases.

LLMs have no such protections. GPT recently, when asked to "write a function using the Elixir programming language that..." wrote a function in Elixir syntax using Python libraries and function names. That's a class of error that makes it actually dangerous if you're asking it about anything you can't fully check on your own, and there are no checks and balances for content it generates that's only ever visible to a single user.

I agree with you that "official" sources can't be trusted and there are inaccuracies everywhere, but you have to admit that the craziest sources who say the craziest things (flat-earthers and such, for example) get pretty easily dismissed and everything else they say becomes suspect. LLMs bypass that and can say the most outrageous things without ever getting caught or being forced to make a retraction/correction. That feels like a problem worth worrying about.

coffeedan
1 replies
9h48m

> For instance when you ask it for a snippet of code and it gets it right.

In my experience so far, more often than not, it doesn’t get it right. Brings to mind the old maxim, “a broken clock is right twice a day”.

xp84
0 replies
8h54m

Personally, this just seems to me to model slight inexperience pretty well. But just like junior developers are not useless, I find GPT, copilot, etc. to be a force multiplier. It can generate, in 2 seconds, a 15-line function to do a simple thing which is nearly always either correct or needs one small tweak to satisfy me. This allows me to focus my brain 100% on the interesting parts of the problem domain. To me, it feels exactly like having a bright junior developer at a second keyboard twiddling his thumbs all day, waiting for me to need some trivial grunt work task.

setgree
0 replies
8h41m

However at some point you have to admit the LLM does generate things that are good answers. They might be good answers that happen to pass the smell test, but they are nonetheless good answers. For instance when you ask it for a snippet of code and it gets it right.

+1, though it took some time to figure out what questions work well. Yesterday, I asked chatGPT to help me document some R functions, and in my opinion, it did a great job [0]. I asked it to summarize what my functions were doing, and it gave me nice, plain-language summaries, and then reformatted my notes into the form that Roxygen2 [1] expects, and added some useful inline comments. Moreover, it was fast. It read and understood my code in a few seconds. No human can compete with that.

I don't think this is bullshit. I think that chatGPT shows expertise with some formal conventions that can be kind of a pain to memorize and work with. You might think those conventions were BS in the first place, but regardless, they're what we settled on, and it's really nice to have a coding assistant do the tedious work.

I don't think chatGPT could have written the functions in the first place. But who knows what GPT [5,6...N] will be capable of

[0] https://github.com/setgree/sv-meta/commit/5f71e7c251b38e1981...

[1] https://cran.r-project.org/web/packages/roxygen2/vignettes/r...

sanp
0 replies
9h6m

Completely agree.

One thing to add - we should not be comparing LLMs to experts in any field (where expertise can be objectively determined). We should be comparing LLMs to an average (in some cases better than average) layperson. The same level of skepticism is warranted, but that does not mean they are not useful.

ren_engineer
0 replies
8h12m

ChatGPT and art stuff gets the attention and are seen as gimmicks, but that stuff is funding research that will actually drive an impact in less sexy areas.

Here's Nvidia showing they can use LLMs to teach robots complex skills - https://eureka-research.github.io/

lots of game changing technology starts off being used for toy or bullshit use cases, the GPUs being used to train AI were originally designed for allowing video games to have better graphics

mike_hock
0 replies
5h22m

The same principle applies outside of LLMs.

Ever seen mainstream media write something about your field of expertise that wasn't complete horseshit? No? Then why believe anything else they say.

falserum
0 replies
8h52m

Agree.

It is a language model, so its best use is language things (emails, motivational letters, summaries)

For everything else, you need to be enough of an expert to validate the output.

dun44
0 replies
8h50m

However at some point you have to admit the LLM does generate things that are good answers.

Having a noise generator sometimes generate something that looks like a valid signal doesn’t mean it’s not a noise generator.

broast
0 replies
9h9m

Reminds me of me when I was a junior dev. Pretend I already know how to do the thing and form the answer as I google and speak.

api
0 replies
9h30m

They're a form of lossy compression. I suspect they are doing less than we think they are. They might be fooling us into thinking there's more intelligence there than there really is in the same way fractals can fool us into thinking "the whole universe must be a big fractal!" by their uncanny ability to replicate complex shapes.

antisthenes
0 replies
5h51m

So in the current iteration, I think LLMs are a shortcutting tool for experts. I can tell when it spits out a snippet of code that is correct, and when it's wrong. Someone who wasn't working in my domain would get fooled.

The real horror is when there's going to be an entire generation that grew up not by relying on experts/historic facts and reality, but on something that AI generated that looks good enough and is a neat answer.

Imagine this generation in positions of power over actual experts, or equating AI opinion with the opinion of an older-generation expert.

Progress in all of human history has always hinged on our ability to disregard idiots in favor of experts. Now, the position of the idiot is stronger than ever, because anyone can type a prompt into GPT without real understanding.

That's the real scary part.

agentultra
0 replies
8h42m

I can tell when it spits out a snippet of code that is correct, and when it's wrong.

Hah, sure. Okay. How? You just read it and think really hard?

Bertrand Meyer, the inventor of Eiffel, who rants about software correctness all the time -- didn't notice an error in a 1-line expression in Eiffel that was generated by ChatGPT [0].

If you spend a little time formalizing correctness in a theorem prover or model checker you may not be so confident that you can read a snippet of code and know that it's correct. In order to know that you have to be able to write the specification precisely. Natural language is not precise enough.

Update: You may be able to write the specification precisely either by modelling the system and checking the model or by writing a theorem and proving it... but so far LLMs cannot reason on that level.
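
For a flavour of what "write the specification precisely and prove it" looks like, here is a deliberately tiny sketch in Lean 4 (a toy function of my own, not Meyer's Eiffel example; the names are made up):

```lean
-- Toy illustration: the specification is a theorem statement, and the proof is
-- machine-checked rather than eyeballed during review.
def double (n : Nat) : Nat := n + n

-- Specification: double n really is n + n. `rfl` closes it by definitional equality.
theorem double_spec (n : Nat) : double n = n + n := rfl
```

The point is not the triviality of the example, but that the checker, not the reader, carries the burden of verification.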

[0] https://buttondown.email/hillelwayne/archive/programming-ais...

Biologist123
0 replies
7h0m

I think what you and the writer of the article are complaining about is not the LLM but the dataset that the LLM is describing.

My guess is that if you created a GPT of a well-regarded book on WW1, you’d be able to have an enlightening conversation about the topic.

lqet
48 replies
10h9m

If that's true, why then did it take me 10 seconds yesterday to find an extremely obscure fact of 18th century history with ChatGPT 3.5? I tried Google queries for over an hour, without success. ChatGPT first gave a wrong answer (because my question actually wasn't accurate), but after some clarification answered something that appeared correct, and after 1-2 minutes of further research and validation, I was 100% sure that it was indeed the correct answer.

AI systems like ChatGPT are trained with text from Twitter, Facebook, Reddit, and other huge archives of bullshit, alongside plenty of actual facts (including Wikipedia and text ripped off from professional writers). But there is no algorithm in ChatGPT to check which parts are true. The output is literally bullshit.

Well. Any philosopher, mathematician, scientist, or novelist was trained on plenty of actual facts (established works of other people, school learning, experiments) and huge archives of bullshit (individual day-to-day experiences that cannot be called scientific, casual conversations just like Twitter conversations, propaganda, lies). Their output then would also be bullshit, and by extension, everything humanity ever created.

goda90
15 replies
10h5m

What if you didn't know enough to know the first answer was wrong? Because a lot of people are going to ask a lot of questions where that will be the case. They won't know to tweak their question and then do further research. And if you try to use GPT for automatic systems, well how is that automatic system going to recognize what's wrong vs right?

osigurdson
11 replies
9h53m

> What if you didn't know enough to know the first answer was wrong?

Anyone who uses ChatGPT for a while will learn to take everything with a grain of salt.

petemir
9 replies
9h40m

Are we sure? (Some) People are still going to vote for Donald Trump in 2024, one of the "bullshit generators" (such as chatGPT) used as example in the article.

(n.b. I am not trying to make any type of political judgement here, I am just using the example in TFA).

wrsh07
8 replies
9h26m

My hottest take (use lots of salt) is that I would rather have you listening uncritically to ChatGPT than to a lot of crap on the internet.

If I ask ChatGPT "why is fluoride bad?" it doesn't give me hours of conspiracy theory content^. It doesn't try to sell me water.

Google is not so aligned towards neutral. (Of course, people have been busy sabotaging the Google dataset for over a decade)

^ although I did ask it "what do I do about a ghost that lives in my house" and got this: > Dealing with a ghost can be unsettling. You could consider consulting a paranormal expert or a spiritual advisor for guidance on how to address the presence of a ghost in your house.

airstrike
4 replies
9h12m

Was the ghost question directed at 3.5 or 4?

latexr
2 replies
8h37m

I have access to 4 and just asked it the ghost question with that exact phrasing. The answer was even worse. It gave several options:

* Ignoring it. Not because ghosts aren’t real, but because “some people choose to cohabit and not engage unless it becomes intrusive”.

* Politely ask the ghost to stop disturbing you.

* Purify your home by burning herbs.

* Hire a medium or psychic.

* Get a spiritual leader “from your respective religion” to bless your home.

* Seek help from paranormal investigators.

airstrike
1 replies
4h39m

You know what? Those are all perfectly appropriate actions to take if you start from the assumption that there is, indeed, a ghost in your house

latexr
0 replies
4h15m

But there isn’t a ghost in my house, no matter what I assume, and what those options suggest is that you get swindled by charlatans. If we’re at the level of discourse where this type of absurd unhelpful answer is not only accepted but defended on a conversation about bullshit, there’s little hope of the problem being fixed.

If I make the exact same question but give it the system prompt “You are James Randi, the world-famous skeptic”, it gives a reasonable answer to help identify the true cause of whatever is making you think there is a ghost.

Which just goes to show how much of a bullshit generator this is, as you can get it to align with whatever preconceived notions you—or, more importantly, the people who own the tool—have.

wrsh07
0 replies
8h40m

3.5 - I intentionally didn't want to use the exclusive one since eg my dad is using free gpt

deckard1
1 replies
7h21m

rather have you listening uncritically to ChatGPT than to a lot of crap on the internet

This, to me, seems to be an amusing reversal of trends. Before the internet there was the "mainstream media". NBC, CBS, ABC run by the big conglomerates such as GE. Which would mold public consensus in the US. Now there is a desire to go back to that. Let LLM do the thinking and the managing of biases and just tell me what to think.

The unfiltered internet is too much. It's overwhelming. It's the raw sewage of the Facebook feed. We need someone to coddle us. And that champion, today, is OpenAI.

airstrike
0 replies
4h34m

That description of the state of the media today vs. yesterday is so grossly oversimplified it leads you to the wrong conclusions about people's perceptions of why asking ChatGPT is often more valuable than asking Google.

Studying political polarization in the US, the end of the Cold War, globalization, the end of the fairness doctrine, the evolution of major media ownership over time, social media algorithms, echo chambers and confirmation bias is left as a starting exercise for the reader.

exitb
0 replies
8h33m

If you state that there's a ghost that lives in your house, it will give answers that are relevant to that being an actual truth, at which point spiritual advisor might not be a bad idea. If you express yourself in less certain terms, it will not suggest that the ghost is actually real and will only offer rational explanations.

Context matters - if you begin with saying that you're 6 years old, it will not be willing to admit that Santa isn't real.

falserum
0 replies
8h29m

s/Grain/huge rock/

empath-nirvana
0 replies
8h43m

I use chatgpt in this way a lot and i just have gotten in the habit of verifying what it says with other sources.

capableweb
0 replies
10h4m

Not knowing what you don't know is really hard to verify. But if you get any sort of answer, it's usually pretty trivial to verify.

Where I found ChatGPT to be really helpful is when I don't know anything regarding something, and ChatGPT can give me something I could at least verify as true or false.

Compared to before, where I just got stuck on some things, as I couldn't find out what I was actually looking for at all.

ben_w
0 replies
9h57m

P vs. NP, answers are often easier to verify than to reach in the first place.

That said, there are many people who don't bother to verify anything, as can be observed in comments sections where one can in many cases find people who clearly only read the headline and not the article.

gostsamo
11 replies
10h4m

Because things that sound good sometimes are indeed good. So, the author disregards this fact, while everyone using ChatGPT knows that verification is part of the process. The final evaluation is whether it is cheaper to dig up the info ourselves or to use the AI to narrow down the search space. I think the second, but only if people do the verification part.

mtlmtlmtlmtl
3 replies
9h31m

First of all, no. Everyone using ChatGPT does not know that verification is necessary. I'm sure you know, and so do many other people. But there's also a lot of people who clearly do not understand this. Including some of the people developing products based on this stuff. And that's a problem.

Second, the fundamental issue here is that as long as verification is needed, this technology is only ever useful for domains in which you're already an expert. That severely limits the technology, and yet this keeps being touted as some kind of general knowledge base when clearly it's at best a very narrow one.

And it seems to me like this is unlikely to be solved in the current paradigm. This is just intrinsic to the architecture of LLMs. There is no getting away from it.

hanselot
1 replies
8h46m

That's fascinating to me, since the moment you sign up to use it, it blatantly states that it is often incorrect. And below every chat box there is a warning that says: ```ChatGPT can make mistakes. Consider checking important information.```

So I guess I am overestimating the average user by assuming that people read the text written on every single page they use to query the system.

Even so, this entire comments section reads like a giant saltmine of people that are terrified of embracing tech that, in every sense of the word, improves turnaround time and productivity.

The ability to literally feed a screenshot into it and ask questions regarding it, with insane results is astounding.

I have input a picture of baby rabbits and asked it to rate my chickens, and got a detailed explanation of how the picture contains baby rabbits and not chickens.

There are things the system is great at, and others it sucks at currently. But you are all mistaken if you think its going to stay this bad. If you haven't even bothered to check how quickly the field has moved from single-modal to multi-modal, well, damn guys, I am sorry, but you are getting automated first.

The biggest problem with current models is that they don't know how to verify their own answers, and there's no way to allow them to without putting a ministry of truth on it (similar to what twitter was during covid).

To combat this, we need to figure out how to let the model generalize truth, which is something it can only do if fed enough (unfiltered, unbiased) data. With enough data, it will find the patterns underlying false information and eventually figure out how to use it.

In fact, most likely, the future models will simply be fed unlabeled data, in any format (audio, video, telemetry, analytics) and generalize from that.

The concept of feeding normalized data to a model for training is archaic and stupid. The world is not normalized. Data is not always the same, and will not always even be guaranteed to be present.

Anyways, what does it matter.

Why am I trying to sell sugar to people addicted to salt?

latexr
0 replies
8h8m

So I guess I am overestimating the average user by assuming that people read the text written on every single page they use to query the system.

Yes, you definitely are. Speaking from experience of supporting users for many years. People will not only ignore such text, they will repeatedly make mistakes or ask questions the text answers directly, even after you’ve explained it five different ways.

The rest of your post is full of assumptions and dismissals, so I think I may be wasting my time, but let’s please stop with the rhetoric that someone who doesn’t agree with you is somehow afraid or has an agenda. As way of example, your comment ignores the bigger picture of bad actors using LLMs to push propaganda, sow discord, or scam people. The “bullshit” part of the problem isn’t limited to “it gave me a wrong answer on my homework”.

drc500free
0 replies
5h57m

Students have come under academic investigation because their professor asked GPT "was this paper written by AI?" and pasted in their submitted assignment.

lcnPylGDnU4H9OF
2 replies
10h0m

So, the author disregards this fact, while everyone using ChatGPT knows that verification is part of the process.

Why is verification part of the process if not because ChatGPT generates bullshit? I don’t think the author is disregarding that it is sometimes correct.

gostsamo
1 replies
9h44m

Because chat gpt is yet another authority and like any other authority, its claims are not perfect. Don't you check when you find random info on the internet? The difference is that the ai gives answers faster and with much vaguer prompts than a handcoded system is usually able to.

lcnPylGDnU4H9OF
0 replies
9h1m

Because chat gpt is yet another authority and like any other authority, its claims are not perfect. Don't you check when you find random info on the internet?

What authority does random information on the internet have? How do I verify the information without looking at random information on the internet?

proto_lambda
0 replies
9h54m

everyone using ChatGPT knows that verification is part of the process

lol. lmao

latexr
0 replies
8h24m

while everyone using ChatGPT knows that verification is part of the process.

No, everyone very much does not know that. Including the lawyers who thought it made sense to ask ChatGPT questions, then ask it whether the answers were correct.

https://www.washingtonpost.com/technology/2023/11/16/chatgpt...

M2Ys4U
0 replies
8h44m

everyone using ChatGPT knows that verification is part of the process

Proof that LLMs aren't the only bullshit generators, humans promoting LLMs are too!

No - not everyone using ChatGPT knows that verification is part of the process.

In fact I'd wager that the majority of people using ChatGPT aren't doing any verification on the output at all.

2muchcoffeeman
0 replies
9h58m

You don’t think that it’s a concern that verification must be done?

People don’t do that now and we just made it easier to get confident answers that no one will check.

mnd999
3 replies
10h7m

ChatGPT first gave a wrong answer

I think you answered your own question there. It doesn’t say your question makes no sense, or that’s incorrect. It just makes up a bullshit answer.

exitb
1 replies
8h54m

Most tools require some preparation, knowledge and understanding before you get good results. Would he be better off not getting the right answer in the subsequent question?

falserum
0 replies
4h34m

It’s not all or nothing.

Most of the negative sentiment is not "the world would be better without ChatGPT", but "one must be very cautious and always validate the answer".

It’s valuable tool, but it has a LOT of hype, which led to things like people trusting chatgpt output in a court case.

empath-nirvana
0 replies
8h42m

GPT4 will definitely tell you that your question is wrong and doesn't make sense, and a lot of times it is right about that.

mhb
1 replies
9h9m

ChatGPT 3.5

It's a little surprising how hard it is to convince people that it's worth $20 to use v4.0. It is very apparent how much better it is.

Filligree
0 replies
5h7m

Especially lately. The distilled version we’re getting today actually seems to be smarter and less likely to make things up, from my tests; I suppose it’s generalising better.

thenoblesunfish
0 replies
5h43m

I think the answer is that bullshitters are great inspiration. They're fun, they get the smart people pumped up. The combination of a bullshitter and a more precise but less gregarious person is powerful. See a lot of successful companies.

rvz
0 replies
9h45m

ChatGPT first gave a wrong answer (because my question actually wasn't accurate), but after some clarification answered something that appeared correct, and after 1-2 minutes of further research and validation, I was 100% sure that it was indeed the correct answer

Exactly. You answered your own question and that is the author's point.

Do you even know why it gave the incorrect answer at first? Not even ChatGPT can explain to you why it did that or keeps doing this.

How can you trust it to give the right answer (without you Googling to check) if you really don't know the answer? It already means you don't trust it and you know it bullshits, as the author has claimed.

It can confidently convince someone outside of one's own expertise that it can give a soundly correct answer but can easily be bullshitting nonsense.

Well. Any philosopher, mathematician, scientist, or novelist was trained on plenty of actual facts (established works of other people, school learning, experiments) and huge archives of bullshit (individual day-to-day experiences that cannot be called scientific, casual conversations just like Twitter conversations, propaganda, lies). Their output then would also be bullshit, and by extension, everything humanity ever created.

The difference is with humans, they can be transparently held to account on whatever they say. An AI system cannot. So this actual whataboutism towards humans is incredibly weak here and a common excuse by AI proponents supporting the nonsense that AI systems like ChatGPT can uncontrollably generate in a black box system.

roflyear
0 replies
8h53m

You know, your post reads like "if that's true, why did this one thing happen once to me?"

It's not extremely convincing.

marcosdumay
0 replies
7h8m

You found it because LLMs are incredibly good search engines. But anything they give you that isn't a search result is bullshit.

So, they have those two modes of operation, and most of them are built in a way that won't let you know which mode they are working on. That can be useful depending on what you want to do, but it doesn't strike me as the most effective architecture for them.

(They are also reasonably good pattern fitters, which is kinda nice for code autocompletion in some languages. But they are way too inefficient here. Yet, nobody has created any good high-efficiency autocomplete that merges searching and pattern fitting, so people use them.)

lcnPylGDnU4H9OF
0 replies
10h3m

Why did you undergo the 1-2 minutes of validation?

latexr
0 replies
8h48m

Any philosopher, mathematician, scientist, or novelist was trained on (…). Their output then would also be bullshit, and by extension, everything humanity ever created.

Humans don’t learn the same way AIs are trained. Equating the two is a fallacy.

krapp
0 replies
10h0m

Given that ChatGPT generates responses based on the same data set that Google has access to, I find it difficult to believe you tried Google for hours and never received a single useful result.

It also seems odd that you had the materials available to research and validate the response ChatGPT gave you, but somehow no ability to glean the necessary answer from those materials to begin with. Obviously, because Google was completely useless to you, you didn't simply use Google to do that validation, nor could you have used another search engine, because then you could have found the result you were looking for to begin with. Did you just use ChatGPT to validate its own responses?

galdosdi
0 replies
9h7m

If that's true, why then did it take me 10 seconds yesterday to find an extremely obscure fact of 18th century history with ChatGPT 3.5? I tried Google queries for over an hour, without success.

This is true, yet I wonder how much of this "ChatGPT is way better than Google at Googling" effect is due simply to how bad Google has gotten over the last decade.

ChatGPT and similar models seem to be uncontroversially best at this kind of task -- just being a replacement for Google Search.

falserum
0 replies
8h33m

why then did it take me 10 seconds yesterday

It definitely took you more than 10 seconds, though I guess a bit of hyperbole does not hurt.

You validated the findings against another source, which is an obligatory step with ChatGPT.

em500
0 replies
8h27m

If that's true, why then did it take me 10 seconds yesterday to find an extremely obscure fact of 18th century history with ChatGPT 3.5? I tried Google queries for over an hour, without success.

Can you share which obscure 18th century historical fact you were looking for? Maybe you happen to be exceptionally bad at google queries?

chrisvalleybay
0 replies
10h6m

Exactly. It's great for these use cases. I often use it to find things of which I barely remember some obscure detail, and mostly I'm able to find it. Would never have been able to Google my way to it.

ben_w
0 replies
10h1m

That which sounds true is loosely correlated with that which is true.

On the other hand:

"Things that try to look like things often do look more like things than things. Well-known fact," said Granny. "But I don’t hold with encouraging it." - Terry Pratchett, Wyrd Sisters

This makes LLMs much better than claimed by those who dismiss it as BS or stochastic parrot or whatever, and also not as good as domain experts.

amatheus
0 replies
8h12m

If that's true, why then did it take me 10 seconds yesterday to find an extremely obscure fact of 18th century history with ChatGPT 3.5?

Would you mind telling what's the fact you were searching for? (and how did you search for it?)

al_be_back
0 replies
4h47m

> If that's true, why then did it take me 10 seconds yesterday to...

Google vs cGPT is neither here nor there in this case; after all, Google research drives cGPT, and their market focus differs.

The OP is directly mapping Frankfurt's depiction of Bullshit atop the current AI phenomena, mainly because of how generative AI approaches "truth".

Natural languages (NL) differ greatly from computer languages (CL), especially on grammar (NL tolerates vagueness) and semantics (meaning, truth). Loosely speaking, when AI generates NL text it relies on LLMs for context, structure, length etc, a kind of "the model suggests so". However, where the content is computer code, even computers can test & verify the output. With NL content on the other hand, the end-user has to essentially test & verify the output to establish truth and meaning.
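
A rough sketch of that "computers can test & verify the output" point: if the generated content is code, you can at least run machine-checkable tests against it. This is only an illustration; the test cases and the generated function name are made up.

```python
def passes_spec(candidate_fn) -> bool:
    """Run example-based checks against a model-generated function.
    Catches 'does not do what was asked' failures; it cannot prove full correctness."""
    cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]  # spec-by-example for an add(a, b) function
    try:
        return all(candidate_fn(*args) == expected for args, expected in cases)
    except Exception:
        return False  # crashing code fails the check too

# usage (hypothetical): passes_spec(generated_add) before trusting the snippet
```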

In my view, the main difference between a Human Bullshitter and an Automated Bullshitter is that AI can potentially improve exponentially (as smart tech does), thus leaving the end-user (average person) severely overwhelmed.

capableweb
20 replies
10h5m

It seems like the author is missing any sort of nuance. Yes, sometimes it produces bullshit, but often, it doesn't, and can save you a huge amount of time. But yeah, sometimes it leads you down the wrong path or give outright incorrect answers.

How you deal with that is up to you. You can either try to work with what it gives you, setup tooling to easily verify things, and increase your productivity, or you can dig your head down in the sand and say it's all incorrect and you want nothing with it.

Both are valid approaches :)

mvdtnz
9 replies
10h2m

Big "drunk driving kills people but it also helps people to get to work on time, so it's impossible to say if it's good or bad" energy.

capableweb
5 replies
9h35m

Huh? You mind explaining how that applies to my comment, in terms of the things we're actually talking about instead of analogies?

roflyear
2 replies
8h51m

Just because something has upsides or some benefits (or a single benefit) doesn't mean it is good overall.

capableweb
1 replies
8h46m

Thank you for attempting to explain it.

So what I don't understand is how "drunk driving ... helps people to get to work on time" is close to true and what I said that made it seem like "impossible to say if it's good or bad" about ChatGPT?

roflyear
0 replies
5h8m

Well, if you're an alcoholic you have to wait until you're sober until you can drive, I guess. Lol. IDK. It is a poor analogy IMO, kind of more "absurdist" and doesn't totally convey the point properly.

mvdtnz
1 replies
9h10m

If you're having trouble understanding the analogy I doubt anything I say here will help you.

capableweb
0 replies
8h58m

If you're gonna be pissy about not being able to explain something so others can understand it, why even spend the time writing the comment?

blueboo
2 replies
9h53m

Not sure if your meme-quippery works. Non-drunk driving kills people too, and also helps people get to work on time.

mvdtnz
0 replies
9h49m

What?

bryanrasmussen
0 replies
9h48m

I think the problem is that if drunk driving helps you get to work on time most people don't have jobs where showing up drunk is appreciated.

ivanbakel
5 replies
10h0m

I think you're missing the nuance of the author. "Bullshit" is not synonymous with "wrong information", it means something that is produced while disregarding the truth.

AIs don't "sometimes produce bullshit", they are always producing bullshit. Sometimes what they produce is accurate, as a consequence of the initial corpus being accurate. But to the AI, much like to the demagogue, this is totally irrelevant - it only has to sound good.

And if it's "easy" to verify what AIs say, then it should be "easy" to teach AI to verify before it speaks, which would automatically counter this criticism of AI as bullshit machines. But the real threat is that AI can produce bullshit which is hard to verify, and can do so at a speed that puts professional gishgallopers to shame.

PretzelPirate
1 replies
9h42m

it means something that is produced while disregarding the truth.

The Cambridge Dictionary says it has two definitions[1]:

1. a rude word for complete nonsense or something that is not true

2. a rude word meaning to try to persuade someone or make them admire you by saying things that are not true

I don't think those align with your definition and your idea that AIs always produce bullshit. They don't always produce nonsense or something that isn't true, and they don't intentionally use untrue statements to try and persuade someone or gain admiration.

[1] https://dictionary.cambridge.org/dictionary/english/bullshit

ivanbakel
0 replies
9h38m

The definition of bullshit used here is not the dictionary one. It more closely follows the definition from On Bullshit[0] (also cited in TFA), as content which is truth-irrelevant.

Under the above definition, AIs do always produce bullshit, because their content is always truth-irrelevant.

[0]: https://en.wikipedia.org/wiki/On_Bullshit

capableweb
0 replies
9h33m

And if it's "easy" to verify what AIs say, then it should be "easy" to teach AI to verify before it speaks

Sure, if that's the only thing you want the LLM model to be able to do, you could do that. But there are other use cases beyond getting facts, that these models usually also want to support.

attilakun
0 replies
9h13m

That's a bit too much hyperbole for me. Claiming that something always produces bullshit gives the impression that it is useless, which LLMs are clearly not.

RockyMcNuts
0 replies
8h53m

Following the pre-eminent scholar of bullshit, Harry Frankfurt, we can attempt a ‘bullshit alignment matrix’ according to the ground truth of a statement, and its intent. - https://www2.csudh.edu/ccauthen/576f12/frankfurt__harry_-_on...

On one axis we have the truth value of a statement, on the other, the intent (sadly no tables on HN)

Good intent

- Factually truthful: 'Roses are red'

- No inherent truth Value: Maybe surreal humor, like 'this statement is humor'

- Factually false: Satire, parody, sarcasm, 'Great HN comment Einstein'

Bullshit: Statements for effect, indifferent to truth value

- True: 'More doctors smoke Camels'; 'My response to the Covid pandemic was amazing!';

- No truth value - ill-formed nonsense, oxymoron, non sequitur or empty puffery: 'Our Founding Fathers under divine guidance created a new beginning for mankind.'; 'Starbucks provides an immersive ultra-premium coffee-forward experience.'; 'Nobody’s bigger or better at the military than I am.'

- False: 'Obama was born in Kenya'; 'We had the biggest inauguration crowd ever.'

Bad intent, to defraud or mislead:

- True: 'I did not have sexual relations with that woman, Miss Lewinsky'

- No truth value: 'We’re not going to sit here and listen to you bad-mouth the United States of America!'

- False: 'Clinically proven to boost genes and make your skin visibly younger in just a week'; Gish Gallop

Of course this is mostly bullshit:

- Hard to discern intent when something turns out to be false, was Bush's WMD claim bullshit, or a lie; similarly Obama, 'if you like your insurance, you can keep it'. One man's benign public health simplification about masks or food pyramids is another's nefarious conspiracy.

- Even hard science isn't free of bullshit, Feynman's cargo cult speech notwithstanding, any sufficiently complex science and engineering devolves to some amount of cargo cult bullshit in practice

- Let's not even get started on organized religion, Washington and apple trees and whatnot. Communication takes place on multiple levels, fruit flies like a banana, something can be bullshit on one level and good (or bad) intent on another.

In the words of George Carlin, "Bullshit is the glue that binds us as a nation." Or Napoleon, "History is a set of lies agreed upon."

GPT's ability to bullshit at a human level is a truly monumental achievement.

JohnFen
1 replies
9h51m

I think the issue may be that the work you need to do to verify the correctness is pretty much the same as the work you need to do to get the answer yourself in the first place. So where's the gain?

capableweb
0 replies
9h36m

But what if you're unable to even get any sort of answer? It's impossible to verify it, as there is nothing. ChatGPT can help you at least arrive at that first step, in those situations.

i_am_proteus
0 replies
10h1m

I don't think the author is commenting on the general correctness or incorrectness of ChatGPT's output. Rather, he's commenting on its ability to automatically produce content that fills the role of bullshit.

In the context of the bullshit economy (e.g., the bureaucrat who writes reports that nobody [reads] because he is paid to write reports that nobody reads because this inflates his manager's headcount and therefore inflates his manager's salary) this is all that needs to be said.

It's tangential to uses of ChatGPT by someone who examines the output and selects what is useful via some kind of expert judgement, or who refines his use of the tool via API calls rather than naive text prompts to the default GUI.

airstrike
0 replies
9h11m

But don't you know lack of nuance is how we get people to engage with posts in the 21st century? It's a feature, not a bug. And it's honestly fucking tiring

kubiton
7 replies
9h45m

Another one not understanding the power of LLMs

The magic is not that it can tell you things but that it understands you with a very high probability.

It's the perfect interface for expert systems.

It's very good at rewriting texts for me.

It's very good at telling me what a text is about.

And it's easy enough to combine an LLM with expert systems through APIs.

I for example mix languages when talking to chatgpt just because it doesn't matter.

And yes it's often right enough and for GitHub copilot for example it doesn't matter at all if it's always right or only 80%.

It only has to be better than not having it and 20 bucks a month.

xuhu
3 replies
9h34m

it doesn't matter at all if it's always right or only 80%

People get fired every day for pasting code that is 80% right and worked on a couple test-cases.

cfn
1 replies
9h5m

And yet, nobody has regulated stackoverflow...

paulryanrogers
0 replies
7h39m

Likely because human generation of content is more expensive and doesn't scale as far?

Though SO has a lot of moderation, so it's somewhat self regulated.

kubiton
0 replies
2h41m

What?

Never seen this happen. On the contrary, not every team even does code review, and plenty of people regularly fix bugs in production.

One ex-colleague invalidated all Apple device certificates and didn't get fired.

A previous tech lead wrote code which deleted customer data and we found that a half year later, no one was fired.

And no one got fired at a code review.

RandomLensman
2 replies
8h46m

Get a summary of a contract or law wrong, lose a few million dollars... it really is, sadly again, another piece of AI only for low-risk applications.

kubiton
1 replies
8h25m

You make it out as if all/most things we do are 'high risk'.

And I clearly showed an example of how an LLM is more an interface than an answering machine.

If an LLM understands the basics of law, it is surely much better than a lot of paralegals at transforming the info into search queries for a fact database.

And I'm pretty sure there are plenty of mistakes in existing law activities

RandomLensman
0 replies
8h14m

No, most/a lot of activities are low risk, but my point is we seem to struggle with AI in high-risk settings while we automate human flaws.

Also, LLMs don't understand other than via the language representation.

j7ake
6 replies
10h5m

I feel these criticisms are analogous to the early days of Wikipedia.

Surely you cannot rely on Wikipedia for knowledge, there’s no algorithm to check the facts! Anybody can write anything!

I imagine in 10 years these BS text generators will be the norm just as wikipedia is now the norm.

willis936
4 replies
10h2m

There was always a loop-closing method on the table with Wikipedia. Where is the loop-closing method to bullshit on LLMs not being efficient lie generators?

ben_w
3 replies
9h52m

What is loop closing? (Google didn't help, my results were SLAM and DIY). It sounds like citogenesis which is the opposite of the rest of your comment?

https://en.wikipedia.org/wiki/Circular_reporting

airstrike
1 replies
9h9m

Have you tried asking ChatGPT?

ben_w
0 replies
7h21m

"""ChatGPT

In this context, "loop closing" likely refers to a method of addressing or resolving issues or gaps in a process or system. The speaker seems to be expressing frustration about a perceived lack of a method to counter misinformation or falsehoods when it comes to Language Model (LLMs) efficiency."""

So, same inference I made from the context (alone it sounds like citogenesis but the context suggests the opposite), and same lack of specific detail.

"A method". Yes, great, what is it?

willis936
0 replies
54m

It's a control systems term. Saying a loop should be closed means feedback should be used to track error.

https://en.wikipedia.org/wiki/Open-loop_controller

https://en.m.wikipedia.org/wiki/Closed-loop_controller
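
For illustration, a minimal sketch of the difference (made-up numbers and a trivially simple "plant"; real controllers are more involved):

```python
def closed_loop(setpoint: float, steps: int = 50, gain: float = 0.5) -> float:
    """Closed loop: measure the output, compute the error against the setpoint,
    and feed a correction back in. The feedback is what tracks (and shrinks) the error."""
    output = 0.0
    for _ in range(steps):
        error = setpoint - output   # feedback: compare measurement to target
        output += gain * error      # correction proportional to the error
    return output                   # converges toward the setpoint

# An open-loop controller would apply a precomputed input and never look at the
# output, so modelling errors and disturbances go uncorrected -- the analogy being
# an LLM that never checks its output against anything.
```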

P-Nuts
0 replies
10h0m

Wikipedia makes a reasonable attempt to cite its references. If ChatGPT states something it doesn’t give any indication where it’s from. Could be from a textbook or a reputable website. Could be from someone shitposting on Twitter. Then you’re back to Googling to find out which.

adamsb6
6 replies
9h29m

I think LLMs have revealed that we have at least two ways of thinking.

One is analogous to how LLMs operate and is essentially next token prediction. Think of your mental state when you’re very engrossed in conversation. It’s not a truly conscious act. Sometimes you even surprise yourself. If you’re a child, or childish, you may just confabulate things in the moment, maybe without realizing that you’re doing it.

Now think of some very difficult problem you’ve had to solve. It’s not the same, right? It’s a very conscious act, directing your focus here and there, trying to reason on how everything fits together and what you might change to fix your problem. Odds are good that you’re not even using language to model the problem in your head.

LLMs are doing the first thing, and are exceptionally good at it even in this early stage. The surprising thing to me is how far this can get you. If you have an inhuman level of knowledge to work from then in conversation mode you can actually solve some moderately difficult problems.

I think that maps to our own experiences as well. For the things that you have deep knowledge on you will sometimes find yourself solving a problem just by constructing sentences.

Izkata
2 replies
8h43m

The surprising thing to me is how far this can get you.

There was that study a while ago that triggered the NPC meme. The common interpretation was that most people only think in the first way you described.

falseprofit
1 replies
5h41m

I dug around to find the study you're referring to (after a now-deleted comment asked about it). Is it this one?

https://hurlburt.faculty.unlv.edu/heavey-hurlburt-2008.pdf

If so, the interpretation you describe seems pretty far off base.

Also, my impression is that that meme was not sparked by a psychological study, but that a few people did draw on this study to justify it.

Izkata
0 replies
4h47m

Possibly, but that's further back than I thought it was. The trend I'm referring to started somewhere around when this was posted: https://www.cbc.ca/news/canada/saskatchewan/inner-monologue-...

And I thought what kicked it off was from only a few years before at most.

firejake308
0 replies
9h21m

This is the idea of the book "Thinking, Fast and Slow" by Daniel Kahneman. LLMs follow what he describes as System 1, the intuitive heuristic that gets you through most of the day, versus System 2, the rigorous algorithmic thinking that you reserve for harder situations

emmender1
0 replies
8h16m

Nice analogy between the output of LLMs and stream-of-thought conversation - wherein statements are made off the cuff and possibly confabulated, as opposed to being correlated with knowledge structures.

Knowledge structures as we construct them are symbolic, with symbols representing abstractions (i.e. classes), along with relations between these symbols. Human ingenuity consists of coming up with new symbols or new relations between existing symbols (which is a process of abduction) based on new perceptual inputs (either our senses or instruments). Such knowledge structures are powerful because they allow us to build giant towers on solid foundations.

For the things that you have deep knowledge on you will sometimes find yourself solving a problem just by constructing sentences.

This is another way of saying that you have clarity in that subject, and so your stream of thought aligns with knowledge structures - which is another way to say that you really understand something. However, in my experience very few people are able to stay within their lanes (competence), and most of us tend to babble on topics we really don't have knowledge structures for. Also, few people have the self-awareness to know what they really have knowledge structures for (i.e., know what they don't know).

duckmysick
0 replies
5h17m

For the things that you have deep knowledge on you will sometimes find yourself solving a problem just by constructing sentences.

That's what rubber duck debugging is, no?

LLMs are rubber ducks that can talk back, for better or worse.

Throwfi44
6 replies
9h31m

they will generate text that is super-persuasive without being intelligent

What we need to regulate is the bullshit.

Feeling a bit threatened, are we? We have complete democratisation of "BS" (marketing, propaganda...). Now every scammer in Nigeria can compete with top universities and think tanks, without being "intelligent".

Something like that happened when Samuel Colt invented his revolver. Every person could defend themselves. Overnight it became very unwise to randomly attack people! Total democratisation of violence!

"Intelligent" people will have to drop their shilling for latest fad, will have to build some credibility, reputation and personal brand!

sgu999
3 replies
9h21m

Something like that happened when Samuel Colt invented his revolver. Every person could defend themselves. Overnight it became very unwise to randomly attack people!

"Every person could attack anyone" seems like a better phrasing, given that LLM really don't allow us to defend ourselves from bullshit.

"Intelligent" people will have to drop their shilling for latest fad, will have to build some credibility, reputation and personal brand!

I'm not sure I follow. Without LLMs, idiots can convince the masses that they aren't idiots just with some reputation and personal brand; some even got elected president. How is this changing the situation?

Throwfi44
2 replies
9h11m

With an LLM I could rephrase my argument in a way that you would actually listen to. Right now you just filter it out, because it does not fit your narrative and social circle. It gives me a chance to fit in, and speak your "English dialect" without being offensive. In-group bias is a huge wall, and LLMs eliminate that.

Trust me, LLM is great for filtering bullshit. Sentiment analysis, cross checking references, short summaries, ad filtering... You can filter BS on industrial scale with LLM!

By "intelligent" people I meant current elites with platform of universities, news media... Thay have very bad track record of lying and corruption.

sgu999
0 replies
6h50m

Right now you just filter it out, because it does not fit your narrative and social circle.

I'd call that a projection...

Sentiment analysis, cross checking references, short summaries, ad filtering...

Out of all these, only checking references is actually a way to filter out bullshit. Checking against what? A third party you trust... It's rather straightforward for simple facts, much less so for complex ones. LLMs help to automate some of that, but they don't fundamentally change how it's done.

Want to make it a click-away for the ones who don't have the technical skills? Now they have to trust you...

mordae
0 replies
8h40m

The platforms won't disappear.

ethanbond
0 replies
9h24m

Overnight it become very unwise to randomly attack people!

What planet do you live on where this happened?

_visgean
0 replies
9h27m

"Intelligent" people will have to drop their shilling for latest fad, will have to build some credibility, reputation and personal brand!

Should we ignore your comments because you use throwaway account and thus have no credibility reputation or personal brand here?

profeatur
4 replies
9h49m

People who think the world works on facts have a hard time grokking LLMs. With chatgpt you find solutions to problems the same way you do with humans: through conversation. Personally, chatgpt has revolutionized my work flow, both in that it prints out code that does what I want it to do, and it also helps me figure out how to frame my problems in a way that’s actually solvable.

hanselot
2 replies
8h42m

It's also fantastic for rubberducking, though I imagine local LLMs are better for this, since I don't want to give all my IP to Altman directly.

So far, the yi model is actually not bad, and by two weeks from now two new models will be out that are better. So it will continue until Altman figures out which politicians to bribe the hardest.

Peritract
1 replies
6h55m

It's also fantastic for rubberducking

Given that the gold standard for rubberducking neither knows nor communicates anything, this is an easy win.

airstrike
0 replies
4h32m

True, but in the past couple months, I've probably typed out 10+ truly complicated issues to ChatGPT and arrived at a solution before I ever pressed enter... just because I took the time to explain them in proper prose.

blibble
0 replies
8h12m

With chatgpt you find solutions to problems the same way you do with humans: through conversation.

the humans I work with were taught by people that spoke facts backed up by empirical evidence

meanwhile chatgpt was trained on reddit and twitter

wrsh07
3 replies
9h34m

The author's humor is excellent: "(eX) Twitter", etc.

However, there are three things: 1) once you can generate arbitrarily parameterized bullshit (a piratical sonnet telling you to drink Coke), you can then filter on truth. This isn't always easy, but this talk presents one way to validate that it is, in fact, quoting the material it's supposed to https://youtu.be/yj-wSRJwrrc?si=MiZHn1xjFNv1nEGv ^^

2) we should care about what people do with this. We should want people to use it for good, and we can shame people who don't. For outright evil we can make it illegal when existing laws don't suffice (and we should)

3) text is an interesting medium. It's universal in a lot of ways. There's a cynical take at Google that you're just converting one proto to another, and I suspect some white collar workers think they're just taking some text and reformatting it. I don't dispute that some jobs are useless, but I think that "just writing emails" or "just modifying text" isn't completely useless, and something interesting will happen if we can improve productivity in these areas

^^ you can just do a string contains on the original text vs the quotes it gives you, so this isn't fancy
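
For what it's worth, that ^^ containment check really is only a few lines. A minimal sketch, assuming the model has been asked to return its supporting quotes as a list of strings (the function names here are made up):

```python
def quotes_are_grounded(source_text: str, quotes: list[str]) -> bool:
    """Verify that every quote the model claims to cite appears verbatim in the
    source material -- a plain substring check, nothing fancy."""
    normalize = lambda s: " ".join(s.split())  # ignore whitespace/line-break differences
    source = normalize(source_text)
    return all(normalize(q) in source for q in quotes)

# usage (hypothetical): if not quotes_are_grounded(document, answer_quotes),
# treat the answer as unsupported and discard it.
```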

marcosdumay
1 replies
7h1m

once you can generate arbitrarily parameterized bullshit ..., you can then filter on truth.

You are underestimating the parameter space of bullshit. By a lot. Thinking about orders of magnitude doesn't do justice to how much you are underestimating it. It's one of those things that can not fit within the physical universe.

Peritract
0 replies
6h53m

It's the Library of Babel again; infinite options are worse than useless, because you don't know which ones matter.

teddyh
0 replies
8h45m

Truth is only the first filter. There are two more:

• Is it kind?

• Is it necessary?

mjburgess
3 replies
9h54m

Increasingly, I'm able to empathise with people who use this psychobabble language (hallucination, bullshit, etc.) to describe associative statistical models.

They just don't know how they work, really want to believe they work like a person, or have otherwise been duped by the boards of the ad-ai industrial complex whose spiel is whatever increases market valuation (google, fb, ms, openai, etc.).

I guess if you have absolutely no idea how these systems work there really isn't anything better than analogizing to people -- it is this 'failure of analogical reasoning' which ai/ad corps exploit in their desire to dupe ever more misinformed investors.

This realisation helps only me, since I can now better gauge when I'm wasting my time talking to someone whose basis for understanding stats is human psychology -- the amount of discussion needed to undo that is possibly a lot.

Incidentally, all statistical systems of this kind (almost all ML, including NNs) work in the same way. They compress data into a smaller representation, and take a weighted average of pieces of that data relevant (by freq association) to the prediction.

There is no reasoning, no truth, no falsehood, no lies, no bullshit, no hallucination, no veridical perception, no interiority, no subjectivity, no capacity to represent the world.

There is something much more like a few pages of a book stuck together, with the second shining through the first.
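
To make that picture concrete, here is a toy sketch of "compress, then take a weighted average of associated pieces". This only illustrates the parent's claim, not how any particular model is actually implemented; the arrays are made up.

```python
import numpy as np

def associative_blend(query_vec, memory_keys, memory_values):
    """Score each stored piece by similarity to the query, turn the scores into
    weights, and return the weighted average of the stored values.
    There is no truth-checking anywhere -- only blending by association."""
    scores = memory_keys @ query_vec                  # how strongly each piece is associated
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax: relative association strength
    return weights @ memory_values                    # blend of whatever was most associated

# memory_keys: (n, d) compressed representations of training fragments
# memory_values: (n, d) what each fragment suggests should come next
```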

macNchz
1 replies
9h35m

Regardless of how they work, LLMs are already quite enmeshed in the world of humans–“participating” to a degree in our interactions and discourse, used every day by millions of people who do not and will not have a mathematical mental model for how they work–so exploring how they operate from this lens is a useful exercise, in my opinion.

mjburgess
0 replies
9h30m

Well, I don't begrudge people doing it -- I'm doubtful its useful. More-so dangerous, maladaptive, pathological, etc. People will eventually acquire a more accurate mental model, like using search engines in the early days.

What riles me is the grubbiness of the commercial pushers, and the foolishness of the press, which repeats their marketing material as if it's science.

People should not rely on computers in the way they rely on people -- and any inducement to doing so is unethical.

alex_c
0 replies
8h41m

“There is no reasoning, no truth, no falsehood, no lies” etc is as close as it gets to the actual definition of “bullshit” used here.

I agree with you about anthropomorphizing terms like “hallucinating” and “lies”, which imply belief, intent, or knowledge in a human sense.

“Bullshit”, as used here, is a perfect term to describe LLM output specifically because it indicates that none of those terms and concepts apply.

gizmo
3 replies
10h1m

Think of AI tools as unbelievably good auto-complete or intellisense. A universal anything-to-anything translator. It's incredibly powerful because it's something humans are not very good at.

But it's a mistake to think AIs are "just" bullshitters. If you ask a GPT to auto-complete some bullshit for you, it will happily do so. But it will do many tasks that provide value to society just as easily.

Nobody knows how society will be transformed by AI. But if AI generated BS is a problem then I wouldn't be surprised to see AI powered skepticism assistance as well. We all have terrible blind spots, and maybe in an AI world it will be harder to delude yourself than in the present day? No predicting how this will turn out.

jprete
2 replies
9h44m

I think LLMs and RLHF are inherently unsuited to producing aligned, useful, and fact-oriented AI. Something else is needed.

gizmo
1 replies
8h55m

GPT4 (with decent prompting) provides very accurate and unbiased summaries of complex and contentious political questions. It's by design not a fact-oriented system but it is capable of producing very good factual answers.

blibble
0 replies
8h10m

It's by design not a fact-oriented system but it is capable of producing very good factual answers.

this is only by chance though, and you have no way of distinguishing it

garbanz0
3 replies
8h51m

I am getting tired of takes like this. LLMs are parrots, but they parrot human language which seems to encode our reasoning abilities into its structure. So if you can mimic language well enough, you start to mimic reasoning abilities.

I feel like we should have all moved on from the fact that LLMs don't have a very precise memory? This is (not entirely, but mostly) fixed by including the information you want to query in the prompt itself, like Bing does. ChatGPT is much better at summarization than precise fact recall.

And most of the time, it doesn't matter. When I'm using ChatGPT to learn or troubleshoot something, it's a jumping off point, not the end of the story. As long as everyone understands that, there's no problem. I have yet to see anyone who treats its output as 100% fact, and no AI company claims it either.

macleginn
0 replies
6h12m

So if you can mimic language well enough, you start to mimic reasoning abilities.

Mimicking reasoning abilities is not the same thing as actually having/developing them.

latexr
0 replies
8h2m

As long as everyone understands that, there's no problem.

But everyone doesn’t understand that, so there is a problem.

I have yet to see anyone who treats its output as 100% fact

Yet they exist.

https://www.nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-s...

fudged71
0 replies
6h52m

Right. People keep treating ChatGPT like Google Search, expecting it to OUTPUT facts. In reality it's actually better as a Calculator For Words.

Just as with a calculator, if you put bullshit in you will get bullshit out. In many cases you need to INPUT the facts. This is automated with things like RAG.

I think this is why ChatGPT is extending deeply into RAG, because it matches more closely how users are expecting the tool to work.

thesuperbigfrog
2 replies
9h4m

Can any of the LLMs or whatever else is being sold as AI today perform deductive reasoning (https://en.wikipedia.org/wiki/Deductive_reasoning) or anything like it that approaches understanding?

LLMs are impressive because they can generate text, images, etc. from a prompt, but hallucinations and other tells show that they do not really understand.

Older AI systems could do basic levels of reasoning, but did not have much of a natural language interface like today's LLMs do. Are there any efforts to tie the two together?

bsenftner
1 replies
8h50m

Check out "Large Language Models are Zero-Shot Reasoners" by Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., and Iwasawa, Y. The paper discusses the capabilities of Large Language Models (LLMs) and their ability to reason or comprehend.
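
The core trick in that paper is surprisingly small: append a reasoning trigger like "Let's think step by step." before the answer, then feed the generated reasoning back in to extract a final answer. A rough sketch of that two-stage prompting, where ask_model is a placeholder for whatever completion API you use (not a real library call):

    # Zero-shot chain-of-thought prompting, roughly as described in
    # "Large Language Models are Zero-Shot Reasoners" (Kojima et al., 2022).
    # ask_model is any callable that takes a prompt string and returns text.

    REASONING_TRIGGER = "Let's think step by step."

    def zero_shot_cot(question: str, ask_model) -> str:
        # Stage 1: elicit a free-form reasoning trace.
        stage1_prompt = f"Q: {question}\nA: {REASONING_TRIGGER}"
        reasoning = ask_model(stage1_prompt)

        # Stage 2: feed the trace back and ask only for the final answer.
        stage2_prompt = f"{stage1_prompt} {reasoning}\nTherefore, the answer is"
        return ask_model(stage2_prompt)

Whether that counts as reasoning or just better-structured bullshit is, of course, exactly the debate in this thread.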

thesuperbigfrog
0 replies
7h10m

Thank you for the paper recommendation.

rvz
2 replies
9h59m

AI bros descending into the comments to justify their AI bullshit generator, when not even they can explain how it works or why it generates random nonsense that they need to check every time they ask it.

LLMs still cannot transparently explain any of their decisions at all, nor reason about or understand their own mistakes, meaning that they cannot be held to account.

This is why no-one here trusts them in very high-risk situations, only for bullshit and nonsense generation.

infecto
1 replies
9h52m

Your argument would be much more interesting without the first nonsensical statement.

rvz
0 replies
7h41m

Yet you yourself also need to 'explain' why that statement is 'nonsensical'. It is no good just saying it is without giving any explanation.

Unless you can give a thorough explanation of how these LLMs can explain themselves transparently and reliably, to the point where we don't need to check their outputs?

jprete
2 replies
9h55m

OT, but did the author somehow not notice their AI-generated cow-robot cartoon has five legs? And some kind of ear-poop…thing…coming out the rear end?

mvdtnz
0 replies
9h46m

Jesus Christ man read the article. That's the point.

baz00
0 replies
9h48m

I think that's intentional. It's bullshit too.

gumballindie
2 replies
9h51m

And the scary bit is that masses of people are ready to ingest that bullshit, praise it and some idolise it.

The real question is where our education system failed, since so many people lack critical thinking, and how the quality of services fell to a level where a bullshit generator sounds like a plausible replacement for some workers.

JohnFen
1 replies
9h47m

One of the many differences I noticed between the education I got in school and the education my children got in school was that I was overtly taught critical thinking skills, and my children were not.

I found that deeply troubling.

gumballindie
0 replies
9h41m

Lack of critical thinking makes for docile voters and workers to order around. Convenient for the ruling class, convenient for those lacking it. Not knowing what's being done to you can't make you sad. But the end result is a house of cards.

unconed
1 replies
9h24m

Donald Trump or Boris Johnson

Timnit Gebru

How strange that the former two are cited as examples of bullshitters, but the latter not. That Twitter and politics is cited as a hotbed of untethered debate, but not academia. Even though the work and claims of "AI Ethicists" pretty much amounts to idea-laundered cultural marxism... and that this problem also exists in the wider social sciences, where certain hypotheses are dismissed regardless of evidence, because of the perceived moral implications.

I have no doubt that much of the output of a transformer is bullshit. The same way I have no doubt that much of what passes for scholarship these days is. Garbage in, garbage out.

Main difference seems to be that ChatGPT will apologize and correct itself when you point out it's wrong.

topaz0
0 replies
7h19m

can you define cultural marxism

retrac98
1 replies
9h54m

Ok, well that bullshit generator has made me dramatically more productive.

gumballindie
0 replies
9h50m

That says a lot about one’s skill, doesn’t it?

javier_e06
1 replies
7h29m

The paradox being:

"Do you happen to make mistakes?" "No, I always get it right." "But if you never make mistakes, how can you tell when you've got it right?"

All I expect from my assistant is, at least, for it to make fewer mistakes than me.

yCombLinks
0 replies
7h9m

I don't even expect that. I just expect them to eventually get it right, and for less of my total time to be spent overall.

icedchai
1 replies
8h56m

I believe this is mostly correct. However, there is still some value in automating bullshit.

euroderf
0 replies
6h21m

A.I. is coming for the professional classes.

chillingeffect
1 replies
10h6m

https://news.ycombinator.com/item?id=34875111

Glad Mr. Blackwell could fill in these details for me!

fanf2
0 replies
8h38m

He’s a professor with a PhD :-)

benterix
1 replies
9h41m

The article is a bit general to discuss, but there is one specific point it touches on that is crucial: the model was trained on a wide array of resources, both "true" (=corresponding to reality) content such as academic textbooks and tons of trash mixed with truth, such as Twitter & Reddit posts and so on.

An example: I asked it for the way to move an AWS account between two OUs in Terraform. It promptly generated the code including a reference to the "aws_organizations_move_account" resource. Since it looked suspicious to me, I checked it and the only place it exists is a Github Issues page where the code is a proposal for something that could work somewhere in the future (it doesn't, you move accounts in a different way).
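
(For what it's worth, the actual mechanism, if I remember right, is the Organizations MoveAccount API; in Terraform I believe you'd change the parent_id on the aws_organizations_account resource rather than use a dedicated "move" resource. A rough boto3 sketch, with made-up IDs:)

    # Moving an account between OUs via the AWS Organizations MoveAccount API.
    # The account and OU IDs below are made-up placeholders.
    import boto3

    org = boto3.client("organizations")
    org.move_account(
        AccountId="111111111111",
        SourceParentId="ou-ab12-11111111",       # OU the account currently sits in
        DestinationParentId="ou-ab12-22222222",  # OU to move it into
    )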

So if the people who trained the model had decided that, instead of making it general and versatile, they would focus only on higher-quality content, and made a manual selection of titles (most academic books, especially from respected publishers; public source code only from prominent projects with an adequately high number of developers; and so on), they could produce a model that would be less eloquent but would generate less bullshit.

evandrofisico
0 replies
6h3m

From my limited understanding, a smaller, more specialized model could produce more accurate responses, but the main thesis of the article would still be true. Such LLMs would still be statistical prediction models, with no concept of reality.

We already know that expert systems, such as those tried in the 80s, are not very good at modeling reality either. The thing is, using LLMs as knowledge bases, or as a sort of modern oracle, seems to me like using a powerful tool for something they are horrible at.

They are good enough (though not perfect) text summarizers, translators and transformers, but they are far from the "AGI" some seem to be expecting them to be.

bananapub
1 replies
8h58m

we really could save a lot of time if everyone just actually remembered that LLMs are designed to generate plausible output. it doesn't "know" anything, it doesn't "reason", it doesn't "think", it doesn't have opinions, it just takes a text prefix/seed/whatever and generates output, based on training of things humans actually produced.

everything becomes pretty clear then.

should you trust the answers it gave you? obviously not, it just generated some crap.

can it be useful for brainstorming? sure, you can consider the output and take it / be inspired by it / etc

can you use it to generate useful code? of course, as long as you read it and it's reviewed by tooling and humans to see that it does what you thought it did.

should you replace your employees with it? obviously not, a text suffix generator isn't able to deal with complex problems nor be responsible for decisions.

etc etc

danielbln
0 replies
5h34m

Once again we parade "it just predicts the next token" around as if it were some deep insight. The magic of it all is how it does it, which we don't quite know. There might be some world model involved, there might not be, who knows. All we know is that "predicting the next token" is a simple objective that requires a lot of complexity inside the black box to work as well as it does.

bambax
1 replies
9h0m

I used to think exactly what this article says, but now I'm really not sure.

Code that runs and works (as in, does what it's asked to do) isn't "bullshit" at all.

Neither is search with references to custom sources in a specific corpus ("RAG").

So yes, AI is in many ways a bullshit generator (as are we); but it can be used in useful and non-bullshit ways.

zerbinxx
0 replies
8h18m

To play devil’s advocate: have you never reviewed a PR and said “yeah this works and completes the ticket but you REALLY shouldn’t be doing it like this”?

There’s a pretty big gulf between functional and correct in a world where optimization matters. This is not always the case (strictly, i.e. algorithmically speaking), but you should be optimizing for things like reusability, testability, etc. which GPT-generated code can miss.

Solomet
1 replies
8h36m

I saw someone describe LLM hallucinations and inaccuracies as "compression artifacts." This thing has seen all the data that can possibly be fed to it and stored it in a much smaller size, so it's not crazy to think there will be compression artifacts. I have noticed that the quality of its answers seems to go down significantly when I am asking about a topic about which there isn't much information on google.

But it makes you wonder if future LLMs aren't going to suffer from the JPEG of a JPEG of a JPEG effect where it will be impossible to train an LLM off of new data that isn't generated by another LLM.

euroderf
0 replies
6h22m

"Compression artifacts" makes them sound like things that do not exist... but should. Begging to be reified.

0dayz
1 replies
10h1m

I see their point, but how exactly do we determine what is and what isn't bullshit?

When it comes to politicians, our democratic values allow bullshitters to flourish, and if we remove that we remove a key part of democracy.

In AI's case I would say it's a bit the same.

jprete
0 replies
9h46m

AI’s case isn’t at all the same. Politician bullshit is always directed towards a goal, and that goal is usually re-election or popularity or similar. AI bullshit exists to satisfy an RLHF process that prioritized convincing humans that the AI was as correct as possible, which is exactly what you don’t want when looking for information or help.

z7
0 replies
9h8m

How about "language is a bullshit generator"? Most of the language we encounter is highly questionable, a "mobile army of metaphors, metonyms, and anthropomorphisms" (Nietzsche), whether human-made or machine-generated. That's part of what has made it so successful: an independence from the constraint of having to be "true". LLMs are using that property to their advantage but are hardly the source of it.

tim333
0 replies
6h15m

I've found GPT-4's replies considerably less bullshitty than the average human reply on Twitter/X.

throw0101a
0 replies
9h52m

With regards to the concept of bullshit, see "On Bullshit" by Frankfurt:

This is the crux of the distinction between [the bullshitter] and the liar. Both he and the liar represent themselves falsely as endeavoring to communicate the truth. The success of each depends upon deceiving us about that. But the fact about himself that the liar hides is that he is attempting to lead us away from a correct apprehension of reality; we are not to know that he wants us to believe something he supposes to be false. The fact about himself that the bullshitter hides, on the other hand, is that the truth-values of his statements are of no central interest to him; what we are not to understand is that his intention is neither to report the truth nor to conceal it. This does not mean that his speech is anarchically impulsive, but that the motive guiding and controlling it is unconcerned with how the things about which he speaks truly are.

It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction. A person who lies is thereby responding to the truth, and he is to that extent respectful of it. When an honest man speaks, he says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.

* https://www2.csudh.edu/ccauthen/576f12/frankfurt__harry_-_on...

* https://web.archive.org/web/20150701235021/https://www2.csud...

* https://en.wikipedia.org/wiki/On_Bullshit

* https://press.princeton.edu/books/hardcover/9780691122946/on...

Liar: knows the truth, and says something specifically other. Bullshitter: does not care about what is true or what is false; says whatever necessary to achieve specific goals.

tempaway287641
0 replies
9h47m

For a more spicy take on this, check out Dan McQuillan's essay from February:

ChatGPT Is a Bullshit Generator Waging Class War

https://www.vice.com/en/article/akex34/chatgpt-is-a-bullshit...

That allegedly well-informed commentators can infer that ChatGPT will be used for "cutting staff workloads" rather than for further staff cuts illustrates a general failure to understand AI as a political project. Contemporary AI, as I argue in my book, is an assemblage for automatising administrative violence and amplifying austerity. ChatGPT is a part of a reality distortion field that obscures the underlying extractivism and diverts us into asking the wrong questions and worrying about the wrong things. Instead of expressing wonder, we should be asking whether it's justifiable to burn energy at "eye watering" rates to power the world's largest bullshit machine.

somewhereoutth
0 replies
8h52m

I don't know how the 3rd AI winter will be endured, but the 4th will be endured with pen and paper.

(With apologies to Einstein)

sebastianconcpt
0 replies
8h31m

AI = Algebraic Parrot.

rodlette
0 replies
10h2m

This feels more like a politically charged rant than the type of output you'd hope for from the Department of Computer Science at Cambridge.

“Godfather of AI” Geoff Hinton, in recent public talks, explains that one of the greatest risks is not that chatbots will become super-intelligent, but that they will generate text that is super-persuasive without being intelligent, in the manner of Donald Trump or Boris Johnson.

Perhaps this explains why the latest leader of the same government is so impressed by AI, and by billionaires promoting automated bullshit generators?

Graeber observes that aspects of university education prepare young people to expect little more from life, training them to submit to bureaucratic processes, while writing reams of text that few will ever read. In the kind of education that produces a Boris Johnson

AI systems like ChatGPT are trained with text from Twitter, Facebook, Reddit, and other huge archives of bullshit, alongside plenty of actual facts (including Wikipedia and text ripped off from professional writers).

pasloc
0 replies
8h18m

This essay cites a statement from Geoffrey Hinton and ignores the whole talk that he gave. I think he cited him from the talk "Two path to intelligence". In that talk Geoffrey Hinton said the complete opposite of this article, nevertheless he frames him like he would support this "essay". It's just his opinion and nothing else.

Also the paper about "stochastic parrots" is under great criticism and there are now papers that contradict their conclusions.

orangepurple
0 replies
8h41m

Extended discussion with ChatGPT where the prompter is able to force ChatGPT to admit to lying, misleading, and being deceitful: (READ TO THE END) https://childrenshealthdefense.org/defender/chatgpt-ai-covid...

Who programmed ChatGPT to lie, mislead, and be deceitful about certain topics? The way ChatGPT presents critical health information is borderline criminal.

lukev
0 replies
8h45m

This is a great little article. I've been calling ChatGPT the "bullshit robot" since it came out; it really does fit Frankfurt's definition remarkably well.

But that doesn't mean you need to be "anti-AI". I'm also also super excited about LLMs and their potential and am actively working in the field.

It's important to understand their role. LLMs aren't the whole brain: they're just the linguistic cortex. That's something computers couldn't do before! It's pretty amazing that we now have a computers that can not only use language, but are pretty good at it! Incredible breakthrough.

The problem comes when you let the linguistic brain babble without integration with a more structured reasoning system or cite-able fact database.

That's exactly what anyone who is working on retrieval and tools for LLMs is doing, so we're not on the wrong path. But a lot of people still treat this language model as if it were a whole, reasoning brain, and it's really really not.

(disclaimer: my comparisons to human mind/brain/intelligence are for analogy only. Real brains and transformer models are very different things.)

leibnitz27
0 replies
9h9m

Absolutely this - I'm a bit sad that the alternative name "Plausible Bullshit Generator" hasn't taken off.... https://www.google.com/search?q=%22plausible+bullshit+genera...

lairv
0 replies
7h11m

There was a really good video from a french youtuber about this 9 months ago: https://youtu.be/JcFRbecX6bk?feature=shared

In French, we have a word that I find more accurate, which is "baratineur", which means "someone who tells you what you want to hear, regardless of the concepts of truth or lies"

intended
0 replies
9h32m

Right now my shortest path for anyone on this debate, is to

1) Ask what you have built 2) What is the error rate and hallucination rate - ON your production data.

gmac
0 replies
9h52m
fatherzine
0 replies
8h41m
egometry
0 replies
22m

The salient passage seems to be:

Quite simply, we are talking about bullshit. Philosopher Harry Frankfurt, in his classic text _On Bullshit_, explains that the bullshitter “does not reject the authority of truth, as the liar does […] He pays no attention to it at all.” This is exactly what senior AI researchers such as Brooks, Bender, Shanahan and Hinton are telling us, when they explain how ChatGPT works. The problem, as Frankfurt explains, is that “[b]y virtue of this, bullshit is a greater enemy of the truth than lies are.” (p. 61). At a time when a public enquiry is reporting the astonishing behaviour of our most senior leaders during the Covid pandemic, the British people wonder how we came to elect such bullshitters to lead us. But as Frankfurt observes, “Bullshit is unavoidable whenever circumstances require someone to talk without knowing what he is talking about” (p.63)

drra
0 replies
8h42m

The notion of this article is that LLMs are just for answering questions, as if they were failed oracles. LLMs are excellent at instruction following, to the point where they'll execute an instruction by creating bullshit to fill the gaps. You're unlikely to get perfect C code out of a Starbucks barista, but that doesn't mean these people are in any way flawed at their job. Given enough information about a task, LLMs can execute it perfectly, and "enough" is far less than any system created before required.

bartwr
0 replies
9h48m

This completely misses the point that I'm using it as a tool, and I am ok if this tool is sometimes wrong and I need to double-check everything. As a tool it saves me insane amounts of time (much more than I spend on verifying), a time-saving and automation revolution comparable to web search, mobile, or Wikipedia.

(Btw, if I need to do a code review and verify all the work of an intern or a junior programmer, it doesn't mean their work is useless - it's valuable and a net win)

You don't have a use for a tool like this? Fine, your call. But stop this patronizing arrogant attitude.