
Markov chains are funnier than LLMs

tkgally
30 replies
15h30m

Prompt to Claude 3.5 Sonnet: Below is a recently published essay followed by some of the discussion about it on Hacker News. Write 10 short jokes on this theme, making sure that the jokes are funny in the way described by the essay's author. [essay and discussion omitted]

Claude's response: I understand you'd like me to write 10 short jokes on the theme of Markov chains being funnier than large language models, in the style described by the essay author. I'll do my best to create some jokes that aim for "unserious surprise" and violate expected patterns in amusing ways:

1. Why did the Markov chain cross the road? To get to the other side of unpredictability.

2. An LLM and a Markov chain walk into a bar. The LLM orders a statistically probable drink. The Markov chain orders a lampshade made of cheese.

3. What do you call a Markov chain that tells dad jokes? A Mark-ov Twain.

4. How many LLMs does it take to screw in a lightbulb? One - but it'll spend 20 minutes explaining the optimal lightbulb-screwing process first.

5. Knock knock. Who's there? Markov. Markov who? Markov chain reaction of nonsensical hilarity.

6. An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a bicycle made of spaghetti.

7. What's an LLM's favorite movie? Predictable and Furious 17: The Safest Driving Yet.

8. How does a Markov chain tell a story? "Once upon a time, there was a princess who lived in a castle made of bananas and ruled over a kingdom of sentient toasters."

9. What did the Markov chain say to the LLM? "Your mother was a abacus and your father smelled of silicon!"

10. Why don't Markov chains make good therapists? Because their advice is always "And then you should banana the feelings until they become a hat."

floren
7 replies
15h15m

So was it that LLMs used to be capable of making actual jokes, or were they always this bad and I was just more impressed by the talking computer back then?

delusional
2 replies
13h26m

It's the "impressed by the spectacle" one. I tried jokes with LLMs many times, and they're always this. Riffing on a couple of themes loosely related to what was asked. Always unfunny and uncreative.

tkgally
0 replies
12h38m

I wonder, though, whether jokes like these could be useful to professional humorists who have to come up with gags on a deadline. From what I’ve read about monologue writing teams for late-night talk shows and the like, the writers first propose many ideas, most of which are shot down quickly and the remainder of which get tweaked and polished before being used. Some of the above jokes by Claude look to me as though they might serve as good starting points for such brainstorming. At least, they’re better than anything I could create in a short amount of time.

jack_pp
0 replies
11h58m

I found some of those jokes good, definitely better than anything I would've ever written. If you watch shows about comedy, like, say, Hacks, you'll see human comedians riff on stuff, and a lot of the off-the-top jokes get discarded or improved. So Claude did fine in my book.

yfontana
0 replies
5h20m

LLMs were never very good at directly generating original jokes, for a simple reason: writing a good joke generally starts with finding a good punchline, and then setting it up. An LLM generating token after token will first write a set-up, and then try to shoehorn a punchline into it. Prompt engineering can fairly easily work around this, but just straight-up asking an LLM for a joke never really produced good results on average.

throwthrowuknow
0 replies
7h35m

Uncensored LLMs are funnier but most comedy just falls flat in text format. Once the uncensored multimodal models start rolling out we’ll get some real laughs.

Moshi is actually pretty funny just for having a 72 IQ

https://www.moshi.chat/

red75prime
0 replies
10h9m

I chuckled a bit. They are OK, if you don't get exposed to them too often. And with an LLM you can get as much exposure as you want (and all of the jokes are naturally from roughly the same probability distribution).

I don't expect too much until AI self-play learning is made possible, so I don't get disappointed by the expected shortcomings.

minimaxir
0 replies
15h8m

It's a different style of comedy. Absurdism vs. joke setups (and not quite nailing it)

anvuong
7 replies
15h15m

"An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a bicycle made of spaghetti."

This is actually gold.

muglug
4 replies
15h10m

It’s... not?

Even for the low bar of a geek joke it makes no sense since GPT-4 is an LLM.

tourmalinetaco
0 replies
14h56m

That’s what makes it gold.

scambier
0 replies
9h9m

It's implied that GPT-4 has so many restrictions that it will not argue and will just do what is asked. In the context of the joke, an unfiltered LLM will just debate you.

bryanrasmussen
0 replies
12h42m

In normal English usage this would imply that the LLM was not the GPT-4 LLM but some stereotypical anonymous LLM.

In business terms GPT-4 can be said to be superior because it understood the instruction and left; in AI terms the anonymous LLM might be superior because it may have understood the instruction but responded in an "intelligent" manner by arguing about the morality of the instructions.

At a meta-level the joke thus argues that GPT in achieving business ends has had its intelligence hampered. As have we all.

At the same meta-level as the joke was constructed by Claude it can be argued that Claude is commenting on both the intellectual limitations of the Markov chain (insane babblings), and GPT-4 (unimaginative, inhibited business type) and that the best version is some LLM that is not GPT-4 with its limitations - an LLM like Claude. Sneaky Claude.

Kuinox
0 replies
10h9m

Would the Markov chain write something that makes more sense?

nevertoolate
1 replies
10h41m

You're watching a stage play - a banquet is in progress. The guests are enjoying an appetizer of raw oysters. The entree consists of boiled dog.

zaik
0 replies
7h42m

Is this to be an empathy test?

wodenokoto
1 replies
13h14m

What do you call a Markov chain that tells dad jokes? A Mark-ov Twain.

I honestly thought that one was pretty good.

bryanrasmussen
0 replies
12h41m

was it instructed to insult Mark Twain? Because otherwise, I take exception.

minimaxir
1 replies
14h19m

Claude 3.5 Sonnet in general is the first modern LLM I've tried that's actually good at jokes that are inventive. The GPT-based LLMs are all too RLHFed to be wacky.

sunaookami
0 replies
5h16m

GPT is too... robotic? Claude is much better at everything without overexplaining everything.

dan_mctree
1 replies
9h55m

Why are bananas the funniest food? Even Claude seems to have caught on

PawgerZ
0 replies
4h44m

Probably all of the Despicable Me minions memes fed into the training material.

Applejinx
1 replies
7h31m

All of the half decent ones could be made funnier by replacing the lolrandom part of the punchline with an actual Markov-chain style 'you're a right sentence but you just walked into the wrong association, buddy' twist. It's not just about lolrandom. Markov chaining is more likely to make a kind of sense, but the wrong kind of sense.

An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a coup.

c22
0 replies
3h29m

The knock knock joke (no. 5) was a decent attempt.

tonyarkles
0 replies
14h46m

2. An LLM and a Markov chain walk into a bar. The LLM orders a statistically probable drink. The Markov chain orders a lampshade made of cheese.

That’s pretty decent!

temp_account_32
0 replies
8h26m

I'm sorry but these all sound like a Redditor's terrible attempt at humor, predictable formulae with 'le quirkiness'

raverbashing
0 replies
10h5m

These are ok but they got nothing on the absurdist Markov Chain jokes (but that being said, the MC misses a lot of times as well)

layer8
0 replies
7m

And what is the conclusion you draw?

IMO these are mid to meh or fall completely flat.

kqr
0 replies
9h8m

I didn't like any of these jokes specifically (too on-the-nose), but I definitely think you invented a funny category of jokes I could like a lot!

antman
0 replies
11h53m

"How many LLMs does it take to screw in a lightbulb? One - but it'll spend 20 minutes explaining the optimal lightbulb-screwing process first." that was not funny that is accurately painful!

fishtoaster
26 replies
14h7m

I came to this same conclusion some years ago while working on a side project.

Before anything LLM existed, I built a site[0] to generate fake "AWS Blog Posts." I trained a markov chain generator on all AWS announcement posts up to that point, copied the html + css of aws's standard blog posts, then glued them all together with some python + JS. It turned out, IMO, pretty funny! People familiar with AWS's blog posts would often get several sentences in before they realized they were looking at word-soup.

When GPT was new, I looked into using that to "upgrade" it. I spent a weekend messing around with Minimaxir's gpt-2-simple generating blog posts based on AWS content. What I found was, ultimately, it was way less fun. The posts were far too realistic to be interesting. They read like totally-real blog posts that just happened to not be true.

I realized then that the humor of those early markov generations was the ridiculousness. The point where, a few words or sentences in, you realized it was all nonsense. LLM's these days are too good for that - the text they generate is sometimes wrong, but rarely nonsense in a humorous way.

Markov chain content was wrong in a "kids say the darndest things" way, while modern LLMs are wrong in a "My uncle doesn't know basic geography" way.

[0] https://totes-not-amazon.com/ - click any link to get a new one.
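
For readers who haven't built one, here is a minimal sketch of the kind of word-level Markov chain generator being described (illustrative only; the linked site's implementation may differ):

    import random
    from collections import defaultdict

    # Learn which word follows each pair of words in the corpus, then
    # random-walk through that table to produce new "blog post" text.
    def build_chain(text, order=2):
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, length=50):
        key = random.choice(list(chain))
        out = list(key)
        for _ in range(length):
            out.append(random.choice(chain.get(tuple(out[-len(key):]), ["."])))
        return " ".join(out)

Feed it a pile of AWS announcement posts and generate() will happily produce text that reads plausibly for a few sentences before dissolving into word-soup.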

daemin
11 replies
8h56m

I think the "uncle that doesn't know anything but claims to be an expert and will tell you his opinion" is the best description of LLMs that I can think of. They will say anything confidently without knowing anything.

boredemployee
4 replies
6h27m

> They will say anything confidently without knowing anything.

Understanding how these algorithms work, why would we expect anything different?

boesboes
1 replies
5h44m

Most ppl don't understand shit about these algorithms and will expect it to be truthful and knowledgeable. That is how it's marketed, despite the disclaimers.

K0balt
0 replies
5h17m

It should be marketed more like “uncle Jim” than a trustworthy, all-domain competent oracle. I think if you just gave it a twangy accent and changed some output nouns to thingy, whatchamacallit and doohickey, people would actually be able to use it more competently.

lordnacho
0 replies
5h29m

Because it's marketed as AI.

For me, my expectations are adjusted. I kinda know what to expect it to do, and it does quite well for what I use it for, essentially smart intellisense for coding.

beeboobaa3
0 replies
3h3m

I would have expected the people who trained these systems not to format their Q/A samples to sound authoritative while they knew the system was going to be spouting nonsense.

HeatrayEnjoyer
3 replies
8h52m

Perfectly humanlike

hoseja
1 replies
8h37m

We gave up the pursuit of pure crystalline Intelligence and instead just remade ourselves. Hopefully the AIs will be able to do better.

Bluestein
0 replies
7h14m

Yeah. We don't need Artificial Intelligence. We need to discover Actual Intelligence :)

HPsquared
0 replies
6h48m

Indeed, the more I use LLMs and see their confabulated output, the more I notice the same thing in how actual people talk.

joloooo
1 replies
3h31m

LLMs are really disrupting the market for McKinsey junior analysts meeting with clients.

mapt
0 replies
3h27m

Can an LLM provide the sort of psychological buffer that McKinsey Junior Analysts (some random person picked up off the street in a college town and handed a clipboard, a frequent flier card, and a six figure salary) provide to senior managers against the scrutiny of their board and chief executive?

If I want to close the paper division, and my boss wants to close the paper division, does the LLM whisper sweet nothings in our ear and show soothing Powerpoint decks about how in the eyes of an objective observer, the paper division is unprofitable and there would be no career risk in closing it?

It's not a sexual thing I swear, it's a sort of comforting intimacy that needs to be injected into the business process in order to cope with the pressures executives feel. I don't know that anybody has trained an LLM to do that yet.

mdp2021
4 replies
10h17m

> modern LLMs are wrong in a "My uncle doesn't know basic geography" way

Oh, certainly not. I have been using LLM chatbots extensively this week. If your uncle were not just confident while stating falsities, but incapable of self-assessment, he would have been murdered.

TeMPOraL
3 replies
10h8m

Or made a stellar career as a salesman or politician.

mdp2021
2 replies
10h1m

Until you are caught (in good societies).

TeMPOraL
1 replies
9h57m

Or fresh ones. Established democracies end up with political structures that self-select for being able to sound right and cooperate right, entirely independent of truth and merit.

DeadlockISDota3
1 replies
8h35m

I figured out this same thing in a similar manner. I used to make Markov chains of game patches and send them to the community. Fake Dota patches were a huge hit, especially since those are usually huge. Among mostly nonsense or unfunny exaggeration ("this hero now has 500 armor"), there would usually be at least 5 or 6 extremely funny lines in there, if not stuff that was outright prophetic (Fiend's Grip creates additional 1/2/3 illusions).

LLMs did a number on that, unfortunately. Major subreddits banned all AI content, mostly due to the sheer infestation of really boring Midjourney content sent out by naive users and bots. Without reach, I lost interest, so no more Markov chains.

unshavedyak
0 replies
44m

Hah, i felt compelled to mention Deadlock because of your Dota comments. Then i noticed your name.

xnx
0 replies
3h10m

> What I found was, ultimately, it was way less fun

I wonder if there is a combination of temperature and prompt engineering that could make it interesting.

passion__desire
0 replies
3h20m

https://cemulate.github.io/the-mlab/#y3Bt-co-extensional+lim...

https://github.com/cemulate/the-mlab

This is a parody of the nLab, a wiki for collaborative work on category theory and higher category theory. As anyone who's visited is probably aware, the jargon can be absolutely impenetrable for the uninitiated -- thus, the idea for this project was born!

This project uses my package nearley-generator, a module that can turn a Nearley grammar into an efficient and controllable fake text generator. The grammar file for this site can be found at /src/grammar/nlab.ne

nom
0 replies
5h59m

First thing I read on that page:

There is no charge associated with your streamed applications, simply click Enable Microphone from the ElastiCache documentation.

It's way funnier when our brains try to make sense of the nonsense.

maicro
0 replies
5h18m

I played around with a Markov generator many years ago, trained on the corpus of everything I had ever said in a specific IRC channel. The resulting messages were pretty interesting - I specifically remember the realization that I apparently (used to, though honestly still do, too) talk about LEDs a lot...

m3047
0 replies
1h58m

I did the same thing and used it as the chum for a 'bot motel. (A 404 handler that always returns something and... success!). I trained it on a mixed NASA space shuttle and veterinary science corpus just to make it really bizarre.

kqr
0 replies
9h11m

(a) That site is very funny. Thanks for sharing!

(b) The kids-vs-uncle metaphor is spot on in my experience too, and also illustrates how far these language models have come.

fouronnes3
0 replies
12h48m

Reminds me of git man pages generators. Extremely fun as well when you can't quite tell if it's real.

AdieuToLogic
19 replies
17h27m

> Asking an LLM for an "original thought" is almost oxymoronic, if not just moronic. It was built with the express purpose of not doing that.

What a wonderful insight.

make3
15 replies
16h49m

Also, not really true, right, even though it sounds intellectual and strong to say. These algorithms are trained to generalize as best as they can to unseen text, and most often don't ever see any data point twice, except for data that has accidentally not been filtered. It's totally possible that they acquire reasoning abilities that generalize well.

lossolo
14 replies
16h40m

Generalize over their training data—they cannot generalize out of distribution. If they could, they would have already solved most human problems. So no, they do not generalize on unseen text. They will produce what is most statistically probable based on their training data. Things that are still unknown and statistically improbable based on our current knowledge are out of reach for LLMs based on transformers.

vidarh
9 replies
13h5m

You can get them to solve unseen problems just fine. E.g. one example: Specify a grammar in BNF notation and tell it to generate or parse sentences for you. You can produce a more than random enough grammar that it can't have derived the parsing of it from past text, but it necessarily reasons about BNF notation sufficiently well to be able to use it to deduce the grammar, and uses that to parse subsequent sentences. You can have it analyse them and tag them according to the grammar too. And generate sentences.

My impression, from seeing quite a few people trying to demonstrate they can't handle out-of-distribution problems, is that people are very predictable about how they go about this, and tend to pick well-known problems that are likely to be overrepresented in the training set, and then tweak them a bit.

At least in one instance the other day, what I got from GPT when I tried to replicate it suggests to me it did the same thing that humans who have seen these problems before do, and carelessly failed to "pay attention" because the question fit a well-known template it's been exposed to a lot in training. After it answered wrong, it was sufficient to ask it to "review the question and answer again" for it to spot the mistake and correct itself.

I'm sure that won't work for every problem of this sort, but the quality of tests people do on LLMs is really awful, not least because people tend to do very narrow tests like that and make broad pronouncements about what LLMs "can't" do based on them.
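
For concreteness, a hypothetical illustration of the kind of throwaway grammar being described (the vocabulary is invented): write it out in BNF in the prompt, then ask the model to parse or generate conforming sentences.

    import random

    # A toy grammar no model is likely to have memorized; the BNF version
    # of this is what would go into the prompt.
    grammar = {
        "<sentence>": [["<subject>", "<verb>", "<object>"]],
        "<subject>": [["the zorp"], ["a flimble"]],
        "<verb>": [["grellates"], ["unspindles"]],
        "<object>": [["seven quandles"], ["the moon"]],
    }

    def expand(symbol="<sentence>"):
        if symbol not in grammar:
            return symbol
        return " ".join(expand(s) for s in random.choice(grammar[symbol]))

    print(expand())  # e.g. "a flimble grellates the moon"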

lossolo
8 replies
6h5m

> You can get them to solve unseen problems just fine

Prove that the problem wasn't seen by them in another form.

> Specify a grammar in BNF notation and tell it to generate or parse sentences for you. You can produce a more than random enough grammar that it can't have derived the parsing of it from past text, but it necessarily reasons about BNF notation sufficiently well to be able to use it to deduce the grammar, and uses that to parse subsequent sentences. You can have it analyse them and tag them according to the grammar too. And generate sentences.

Oh, come on. It's like rewriting the same program in another programming language with different variables. What it can't do is create the concept of a programming language. I'm not talking about a new programming language, I'm talking about the concepts.

> I'm sure that won't work for every problem of this sort, but the quality of tests people do on LLMs is really awful, not least because people tend to do very narrow tests like that and make broad pronouncements about what LLMs "can't" do based on them.

Here are a few papers that show they can't reason:

https://arxiv.org/abs/2311.00871

https://arxiv.org/abs/2309.13638

https://arxiv.org/abs/2311.09247

https://arxiv.org/abs/2305.18654

https://arxiv.org/abs/2309.01809

og_kalu
7 replies
5h35m

> It's like rewriting the same program in another programming language with different variables.

Since when has that not required reasoning? It's really funny seeing people bend over backwards to exclude LLMs from some imaginary "real reasoning" they imagine they are solely privy to. It's really obvious this is happening when they leave well-defined criteria and branch into vague, ill-defined statements. What exactly do you mean by concepts? Can you engineer some test to demonstrate what you're talking about?

Also, none of those papers show LLMs can't reason.

lossolo
6 replies
5h12m

You clearly didn't read any of these papers. Quote from one of them:

"Our results support the hypothesis that GPT-4, perhaps the most capable “general” LLM currently available, is still not able to robustly form abstractions and reason about basic core concepts in contexts not previously seen in its training data"

Another recent, good one: https://arxiv.org/abs/2407.03321

EDIT: For people who don't want to read the papers, here is a blog post that explains what I'm arguing in more accessible terms: https://cacm.acm.org/blogcacm/can-llms-really-reason-and-pla...

og_kalu
5 replies
4h40m

Oh, I've read them. The claim doesn't match up to reality. It's as simple as that. You can claim anything you want to.

https://arxiv.org/abs/2305.18354

All these papers you keep linking do, at best, is point out the shortcomings of current state-of-the-art LLMs. They do not in any way disprove their ability to reason. I don't know when the word reason started having different standards for humans and machines, but I don't care for it. Either your definition of reasoning also allows for the faulty kind humans display, or humans don't reason either. You can't have your cake and eat it.

lossolo
4 replies
4h1m

> Oh, I've read them.

It's hard to believe that after reading all the papers and the blog I linked, along with the references there, any reasonable person would come to such strong conclusions as you did. This makes it hard for me to believe that you actually read all of them, especially given your previous questions and comments, which are addressed in those papers; someone who actually read them wouldn't make such comments or ask such questions. And the funniest thing, and further proof of this, is that you linked a paper that is addressed in one of the papers I shared. It seems like LLMs aren't the only ones that can fake things.

> All these papers you keep linking do, at best, is point out the shortcomings of current state-of-the-art LLMs

They clearly show that they fake reasoning, and that what they do is an advanced version of retrieval. Their claims are supported by evidence. What you call "shortcomings" are actually proof that they do not reason as humans do. It seems like your version of "reality" doesn't match reality.

og_kalu
3 replies
3h31m

The paper I linked is not addressed by the paper you linked. The paper you linked attempts to give LLMs the same benchmarks in a format they aren't best suited for. I don't know how you can call that "addressed".

> They clearly show that they fake reasoning

Sure, and planes are fake flying. The elusive "fake reasoning" that is so apparently obvious and yet does not seem to have a testable definition that excludes humans.

You've still not explained how writing the same program in different languages doesn't require reasoning, or how we can test your "correct" version of reasoning which requires "concepts".

lossolo
2 replies
2h17m

> The paper I linked is not addressed by the paper you linked. The paper you linked attempts to give LLMs the same benchmarks in a format they aren't best suited for. I don't know how you can call that "addressed".

What you're writing now is nonsense in the context of what I wrote. Once again, you're showing that you didn't read the papers. Which paper are you even referring to now, the one you think addresses the paper you linked?

> You've still not explained how writing the same program in different languages doesn't require reasoning, or how we can test your "correct" version of reasoning which requires "concepts".

"Concepts" are explained in one of the papers I linked, which you would know if you had actually read them. As to programming languages, they learn to identify common structures and idioms across languages. This allows them to map patterns (latent space representations, duh!) from one language to another without reasoning about the underlying logic. When translating code, the model doesn't reason about the program's logic but predicts the most likely equivalent constructs in the target language based on the surrounding context.

LLMs don't truly "understand" the semantics or purpose of the code they're translating. They operate on a superficial level, matching patterns and structures without grasping the underlying computational logic. The translation process for an LLM is a series of token-level transformations guided by learned probabilities, not a reasoned reinterpretation of the program's logic. They don't have an internal execution model or the ability to "run" the code mentally. They perform translations based on learned patterns, not by simulating the program's behavior. The training objective of LLMs is to predict the next token, not to understand or reason about program semantics. This approach doesn't require or develop reasoning capabilities.

og_kalu
1 replies
1h46m

You are making a lot of assumptions that are mostly wrong.

Case in point:

https://arxiv.org/abs/2305.11169

I'm asking for something testable, not some post-hoc rationalization you believe to be true.

I'm not asking you to tell me how you think LLMs work. I'm asking you to define "real reasoning" such that I can test people and LLMs for it and distinguish "real reasoning" from "fake reasoning".

This definition should include all humans while excluding all LLMs. If it cannot, then it's just an arbitrary distinction.

HAL3000
0 replies
1h14m

It appears that you are the only person in this discussion making many incorrect assumptions. Based on your comments, I would assume you are actually googling those papers based on their abstracts. Your last linked paper has flawed methodology for what it attempts to demonstrate, as shown in this paper: https://arxiv.org/pdf/2307.02477 The tests you're requesting are provided within the previously linked papers. I'm not sure what you want. Do you expect people to copy and paste entire papers here that show methodology and describe experiments? You wrote, "I'm asking you to define 'real reasoning'," which is actually defined in the blog post linked earlier in this discussion. In fact, the entire blog post is about this topic. It appears that you are not thoroughly reading the material. Your replies resemble those of a human stochastic parrot.

totetsu
1 replies
16h34m

Couldn't they turn up new, as-yet-unknown things, if those things are statistically probable given the training data?

lossolo
0 replies
6h11m

No, none of the Millennium Problems or other math problems (unsolved by humans for decades or centuries) have been solved solely by LLMs, even though they possess all the knowledge in the world.

make3
1 replies
16h37m

"generalize to its dataset" is a contradiction, especially as these models are trained in the one epoch regimen on datasets of the scale of all of the internet. if you think being able to generalize in ways similar to the whole of the internet does not give your meaningful abilities to reason, I'm not sure what I can tell you

lossolo
0 replies
6h14m

"generalize to its dataset" is a contradiction

Not "to" but over, example the same code written in one language over the other language.

if you think being able to generalize in ways similar to the whole of the internet does not give your meaningful abilities to reason, I'm not sure what I can tell you

If after reading papers below that show empirically that they can't reason, you will still think they can reason, then I don't know what I can tell you.

https://arxiv.org/abs/2311.00871

https://arxiv.org/abs/2309.13638

https://arxiv.org/abs/2311.09247

https://arxiv.org/abs/2305.18654

https://arxiv.org/abs/2309.01809

eru
2 replies
16h18m

I can't tell whether you are sarcastic?

I assume you are, because that makes more sense.

It's really easy to get lots and lots of originality. Just crank up the randomness. What's harder is to get something that's good and original.

vidarh
1 replies
13h8m

That's hard for humans too. Perhaps if one tries to intersperse "internal monologue" with the temperature cranked up into the main output, prompting the LLM to consider and weave in aspects from the inner monologue when continuing, while taking care to keep it coherent... (I'm sure it'd be harder than that - but basically inject random internal "thoughts" without letting them control the output.)

throwthrowuknow
0 replies
7h22m

That’s actually a pretty good idea

bigyikes
14 replies
16h50m

Turn up the temperature?

Markov chains have a cruder understanding of language.

Turn up the temperature (the “randomness”) of an LLM and you can achieve a similarly crude approximation.

Further, the author uses ChatGPT-3.5. ChatGPT has been rlhf'd to sound as generic as possible, and 3.5 has a worse understanding of humor compared to 4.

I don’t buy the thesis of this article.

dartos
9 replies
16h45m

I wish there was a locked chatgpt version for researchers

eru
6 replies
16h19m

Some of the open-weights models come close to what you might want?

dartos
5 replies
15h48m

Sort of. They fill the "locked" requirement, but open models aren't SOTA like GPT or Claude.

Reubend
2 replies
15h20m

LLAMA 3.1 405b-Instruct is state of the art at the time of writing. It averages slightly worse than ChatGPT 4o, but to a mostly negligible extent for most research.

toxik
1 replies
8h10m

Aside but -- it is also aligned to not generate profane text, right?

throwthrowuknow
0 replies
7h25m

Abliterate it

coder543
1 replies
15h23m

Llama3.1-405B benchmarks at a level that I would certainly consider comparable to other SOTA models, as long as multimodality isn’t needed.

Regardless, OpenAI provides access to quite a few of their older models through the API, since the API lets you pass in a specific model version. I’m sure the older models won’t be available forever, but that is a much more stable target for researchers than just opening the ChatGPT website and typing in things.

dartos
0 replies
5h44m

ChatGPT specifically is more than just a chat interface on top of GPT-4.

Their system prompt includes the current date and time among other information, making it very very hard to run reproducible experiments against it.

But it’s the tool most people are using.

jijji
0 replies
14h1m

There sort of is: if you install ollama (https://ollama.com) and then execute "ollama run llama2-uncensored", it will install and run the local chat interface for an uncensored version of llama2, which gives a little bit better results with fewer guardrails. Same with wizardlm-uncensored and wizard-vicuna-uncensored. For reference, the SOTA version with guardrails: "ollama run llama3.1:405b".

ziddoap
2 replies
16h35m

rlhf'd

For those of us not in the know about all the various machine learning acronyms:

RLHF = Reinforcement learning from human feedback

DaiPlusPlus
1 replies
13h45m

(Somewhat off-topic)

When GPT went public along with OpenAI’s articles and papers back in late-2022 through 2023, my impression was OpenAI wanted us all to see/read about RLHF. It felt odd because surely the whole LLM-thing (e.g. how does it even work?!?[1]) was the far bigger research story than the constant reassurances it won’t end up like MSFT’s Tay bot; my understanding is that, as research or secret sauce, RLHF is an ugly-hack afterthought compared to the core meat-and-potatoes of LLMs.

By-way of a bad analogy: it’s as if they created a fantastical new 3D world game engine, like Unreal or Unity, which has a fundamentally different architecture to anything before, but has a bug that occasionally replaces ground terrain with ocean - and their solution to this is to write a pixel-shader that detects this and color-shifts blue into green so people don’t notice - and they then put-out press-releases about how great their pixel-shader is - rather than about the rest of the engine - and no-one seems to be talking about the underlying bug, let alone fixing it.

————-

[1] I still haven’t heard a decent explanation of how feeding the world’s corpus of English text (and computer program code) into a statistical-modeller results in something that can perform almost any information-processing task via instructions input as natural-language.

treyd
0 replies
13h2m

Yeah that analogy is fairly poor. You have to think about it in terms of some probability distribution that each step of the model is sampling from. Out of the distribution of all text, find the top n values for the next token that maximize `P(next | prefix)`, which is done efficiently through making a vector embedding to encode the tokens inside the statistical model.

Things that look like Q-and-A transcripts do exist in the training set (think interviews, books, stage plays, etc.), and at a different layer of abstraction the rules of English text in general are very well represented. What RLHF is doing is slightly shifting the shape of the probability distribution to make it look more like the Q-and-A formats that are desired. They build a large dataset with human tagging to collect samples of good and bad outputs, and use reinforcement learning techniques to generate outputs that look more like the good examples and less like the bad ones.

This probably involves creating a (much smaller, not-LLM) model that is trained to discriminate good outputs from bad outputs, learning to mimic the human tagging. There are some papers that have been published.

Here's one article from Huggingface: https://huggingface.co/blog/rlhf
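
A minimal sketch of the preference objective typically used for that smaller reward model (the standard Bradley-Terry style loss; the function name is illustrative):

    import math

    def preference_loss(score_preferred, score_rejected):
        # Minimized when the reward model scores the human-preferred output
        # above the rejected one: -log sigmoid(r_preferred - r_rejected).
        return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))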

dankwizard
0 replies
15h29m

you have no sense of humor

superkuh
7 replies
17h32m

I've been running IRC markov chain bots for ~20 years. In the last couple years I've been running local LLM alongside them. While there are people that still prefer the markov chain bots the majority invoke the LLMs. But maybe that's because I picked the most hallucinate'y, least refusing, most funny models I can (like mistral-7B fine tunes) instead of something smart, locked-down, and low temperature like chatgpt's LLM as a service.

If anything seeing the LLM and markov bots side by side has really reinforced how much of the markov bot "humor" is human perception imposed on chance outputs. The markov's "learning" ability is still far superior though.

Loughla
3 replies
16h53m

Why are you doing that? For fun or something else I'm missing?

aidenn0
2 replies
15h3m

I'm not GP, but it's probably not for fun; there's a lot of money in IRC Markov chain bots, but it's too cutthroat of an industry for a hobbyist to last.

Loughla
1 replies
7h28m

How is there money in that?

superkuh
0 replies
4h20m

Pretty sure he's joking around. And yes, I run the bots purely for recreation in some long running IRC communities.

kdbg
1 replies
17h5m

Curious what type of prompting you do on the LLM?

I run a Markov chain bot in a Twitch chat; it has some great moments. I tried using an LLM for a while and would include recent chat in the prompting, but never really got results that came across as terribly humorous. I could prompt-engineer a bit to tell it some specifics about the types of jokes to build, but the LLM just tended to always follow the same format.

superkuh
0 replies
15h45m

I'm actually not following the model's fine-tuned/desired prompt at all. I am operating in purely pattern-completion mode. The first text the LLM sees is alternating lines of input and response examples that look like what it will be getting from the IRC client front end, written in the tone I want it to respond in and giving some information about itself. Then I just tack the IRC chat history + input onto those example chat pre-prompt lines. Nothing but single lines and newlines, with newline as a stop token. No instructions, nothing meta or system or the like.

But that's also configurable by users. They can invoke any pre-prompt they want by a command passing a URL with a .txt file.
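
A hypothetical sketch of that pattern-completion setup (the usernames and example lines here are invented, not the actual pre-prompt):

    # Alternating example lines in the desired tone; the live IRC history is
    # appended and the model simply completes the next line.
    pre_prompt = (
        "<alice> bot, how was your day?\n"
        "<bot> mostly spent arguing with a toaster about free will\n"
        "<alice> do you ever sleep?\n"
        "<bot> only during kernel panics\n"
    )
    recent_history = "<alice> what should we name the new server?\n"
    prompt = pre_prompt + recent_history + "<bot>"
    # Generation stops at the first newline, so the model only ever produces
    # one chat line at a time.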

ghewgill
0 replies
16h59m

> I've been running IRC markov chain bots for ~20 years.

You are my hero. Mine have never lasted that long. One fun thing I did once was scrape users' LiveJournals and generate random text from them (https://hewgill.com/journal/entries/68-new-lj-toy.html).

wegfawefgawefg
5 replies
8h48m

This is like saying it's fundamentally impossible to make a fast car, but your only experience with cars is on roads with speed limits.

People keep forgetting that the "safety", rlhf, and corpo political correctness post training is intentionally used to remove the funny from all the large models.

The truth is we don't know if llms are funny or not. GPT2 was funny. GPT3 was funny before it was clockwork oranged. Everything after that is gimped. Even the open source models these days get rlhf'd in some way.

PaulRobinson
4 replies
7h35m

That's simply not true. GPT-3 was terrible at jokes. Many LLMs are, because it's hard to model puns in a stochastic parrot. As another thread points out here, some LLMs produce marginally better jokes.

As to your little rant on "political correctness": that phrase just means "being polite". It does not mean "remove humor". It means "remove responses offensive to marginalized groups in society". Good humor "punches up", not down, so it would not have any impact on good humor.

Jensson
1 replies
7h24m

Political correctness doesn't mean polite, there were/are a lot of polite racists.

PaulRobinson
0 replies
5h5m

It is impossible to be polite to a group of people while also insisting that they are lesser humans and deserve a lesser place in society.

wegfawefgawefg
0 replies
7h1m

I had GPT-3 running on my Discord for months, and the insane responses it would spew out were like the responses of a deranged artistic genius. It was funny 90% of the time. Operating on associative reasoning beyond our comprehension. It would tell us how its day went, and about fights it had with other people, give us nicknames, etc. It had a genuine personality. It was more than a chatbot, it was a digital brain.

A joke that punches down can be extremely funny. Hell, I am sure historically pilferers, pirates, barbarians and conquerors all had jokes, and the ability to laugh.

Political correctness does not just mean polite. It is probably best defined as the business-casualification of all things humans love and hold dear. The destruction of the potential for meaning and fulfilment in exchange for minification of liability.

gwern
0 replies
3h42m

> GPT-3 was terrible at jokes. Many LLMs are, because it's hard to model puns in a stochastic parrot

GPT-3 was great at jokes. The Navy Seals were hilarious (https://gwern.net/gpt-3#navy-seals).

And the difficulty of modeling puns has nothing to do with 'stochastic parrots' and has everything to do with tokenization (https://gwern.net/gpt-3#bpes), in the same way that all those hyperbolic takes about how image-generation models were 'fundamentally flawed' because they couldn't do good text in images turned out to be BS and solely a matter of tokenization - drop in a character-tokenized LLM instead, even an obsolete & dumb one, and it instantly works (https://arxiv.org/abs/2105.13626#google).

trescenzi
4 replies
16h46m

Reminds me of how the fake Magic cards before ChatGPT were always hilarious and sometimes playable; now they are usually playable and sometimes hilarious. Definitely a downgrade if you ask me.

JoshTriplett
3 replies
16h20m

> always hilarious and sometimes playable

I saw plenty of those back then, and as far as I could tell, examples were always cherry-picked from a larger set.

stochaztic
1 replies
15h3m

Yes, I ran the RoboRosewater account, which is where most people saw the concept I believe. I had a few documents with thousands of cards of output, went through them for any notable ones, and formatted and posted them. Probably 5-10% of the output made my personal cut for notability.

Applejinx
0 replies
7h13m

Then I salute you. Some of the funniest things I've EVER seen in my whole life were watching the LoadingReadyRun comedy troupe, doing their 'Desert Bus' fundraiser and punchy from lack of sleep, encounter RoboRosewater for the first time. It became a real tradition :)

kibibu
0 replies
15h18m

Maybe the right solution is using a Markov chain or gpt1 to generate ideas and an LLM to cherry pick.

jcalvinowens
4 replies
16h1m

Unfortunately the account no longer exists... but about a decade ago while I was in school, I created a Markov twitter bot trained on:

1. All of Linus Torvalds' mail to LKML for the prior year.

2. All of Jesus' direct quotes from the King James Bible.

It was absolutely hilarious. The two training sets had very little overlap, so it was necessary to add a heuristic that weighted options from each set more heavily the longer the chain had been "stuck" in the other set.

throwawayk7h
2 replies
15h43m

This sounds great. Did you save any examples?

throwawayk7h
0 replies
12h16m

Yes, this is the first thing in the article.

hatthew
4 replies
16h54m

I've posted a few "fake XYZ written by AI" on reddit over the years, and the model that got the best reactions was GPT-2. Markov chains aren't good enough to make something that is interesting for more than a sentence or two, and GPT-3 onwards is too clean and boring. GPT-2 is the perfect middle ground that can get grammar largely correct and maintain a cohesive idea, but doesn't yet know enough of the specifics of various topics to make things that make sense in the context of that topic.

metadat
1 replies
16h52m

Can you share any of the best examples?

Sohcahtoa82
1 replies
1h57m

I used my 15+ years of IRC logs to fine-tune a GPT-2 model to try to imitate me. My plan was to deploy a bot in my regular IRC channel and see how long it took for people to notice it was a bot. Any time someone would send a message, it would send the last 10 messages to the LLM, and if the result started with <Sohcahtoa>, then I would send that message to the channel.

Unfortunately, I quickly found GPT-2 isn't nearly good enough. It would generate slightly-coherent yet on-topic nonsense.

Once I overhaul my system, I'll try fine-tuning a 7B model.
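
A minimal sketch of the filtering step described above (`generate` stands in for whatever model produces the completion):

    def maybe_reply(history, generate):
        # Feed the last 10 chat lines to the model and only post the result
        # if the model chose to continue speaking as <Sohcahtoa>.
        context = "\n".join(history[-10:]) + "\n"
        completion = generate(context).strip()
        return completion if completion.startswith("<Sohcahtoa>") else None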

loa_in_
0 replies
1h24m

I'm pretty sure you could make it work if you expanded the context to a hundred or so messages.

treprinum
3 replies
10h37m

Generative AI often uses Markov chains as its theoretical basis and is modeled in the language of conditional probabilities, albeit conditioned not on a single ancestor but on a chain of ancestors. One can often view LLMs as scaled-out Markov chains (more like "Erlang" chains allowing deeper memory) that encode the probability of the next token based on a past the size of the context window, and then do some magic when picking the next tokens.
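
In the `P(next | prefix)` notation used elsewhere in the thread, the comparison being drawn is roughly:

    order-k Markov chain (n-gram):  P(next | previous k tokens)      (lookup table of counts)
    transformer LLM:                P(next | full context window)    (learned function of the context)

The conditioning structure is the same; what differs is the size of the conditioning context and how the conditional distribution is computed.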

throwthrowuknow
1 replies
7h14m

It’s not an n-gram, go learn about transformers

Jensson
0 replies
6h0m

A context window is an n-gram.

Jensson
0 replies
8h31m

Even old Markov chains have a context window of a few words; single-word-context Markov chains produce horrible results.

People who don't think LLMs are Markov chains are just ignorant, not realizing that a Markov chain isn't an algorithm: you can compute the probabilities in any manner and it is still a Markov chain.

paweladamczuk
3 replies
10h6m

Am I the only one feeling uneasy when the Bible is used for funny experiments like this?

Would you use an image of Christ on the cross to test an AI image modification model?

nly
1 replies
9h54m

I'm sure most people here would, given that something like 50% of the world doesn't believe in an Abrahamic religion, and a % of those that do have a sense of humour.

paweladamczuk
0 replies
9h42m

I wouldn't. And not because I believe it or lack a sense of humour.

FeepingCreature
0 replies
7h37m

As probably the oldest literary work that is also widely known in English speaking territory, the Bible has a highly distinct style that helps its samples maintain coherence in a sentence and also be easily recognizable.

jsemrau
3 replies
16h42m

Can fully agree. Source: me, who ran a Markov-chain-based Twitter autogen tool 8 years ago.

donatj
2 replies
16h36m

Hey, me too! I populated the markov chain with the text of my Twitter feed with a recency bias and generated tweets based on that so it stayed topical.

The account sat unused after Twitter locked down their API, and at some point got hacked without me noticing. It had been taken over by a crypto scammer, and the account got banned.

Trying to get it back was fruitless, Twitter/X's support is entirely useless.

jsemrau
1 replies
15h59m

I got banned within 24 hours ^-^. Rightfully so, I assume. The Markov chain agent was supposed to send everyone who tweets "I am bored" an event recommendation based on their profile and location. I severely underestimated how many people tweet "I am bored"... Lessons learned, I guess.

donatj
0 replies
13h42m

Oh, I didn't @ or DM anybody. I just had a feed with tweet once an hour, and even then I manually approved the Tweets before they went out. My account made very clear it was a bot, it was in its name. Didn't ever get very many followers. I would say I never really got past the experimental phase.

bitwize
3 replies
13h38m

Whenever someone attending the tech meetup I frequent gets a little too jazzed and excited about AI, I pull out my phone, open Emacs (I use termux, btw), bring in a corpus of text (an excerpt from A Christmas Carol is good), and say M-x dissociated-press.

Then I say "What this is doing is predicting the next character based on statistical likelihood of the previous few characters based on thencorpus of text. And fundamentally, that's all ChatGPT does -- predicting the next symbol based on a statistical model. ChatGPT has a much more sophisticated statistical model than this simple Markov chain and a vastly larger corpus, but really it's just doing the same thing." And we have a giggle about the nonsense DP makes of Dickens, but then I say that ChatGPT emits nonsense too, but it's far more insidious nonsense because it is much more plausible sounding.

TeMPOraL
2 replies
11h14m

Obviously a human is doing the same thing too, just with an even more sophisticated statistical model.

card_zero
1 replies
5h42m

This offers me a rare opportunity to turn the phrase "no, because I say so" into a valid point.

TeMPOraL
0 replies
2h40m

An objection so common and strongly predictable as this one kind of makes my point for me :).

_emacsomancer_
3 replies
11h28m

Funnily enough, I was idly playing with a last-updated-12-years-ago Markov chain text generator[0] in Emacs, and quickly had a number of amusing Markov chain outputs from mash-ups of the Elisp manual and Alice in Wonderland, including:

"13.7 Anonymous Functions

Although functions are usually defined with the built-in defmacro macro, but any list that begins with an M--'

`Why with an M?' said Alice.

`Why not?' said the March Hare."

[0]: https://github.com/skeeto/markov-text

FeepingCreature
2 replies
7h41m

Isn't that the sort of joke that you'd get from GPT, not a Markov chain? Unless it's quoting verbatim, a Markov chain should have no idea that Alice is supposed to refer back to the letter from the previous sentence.

StilesCrisis
1 replies
6h50m

This phrase turns out to be a verbatim quote from the story.

drivers99
0 replies
55m

So it is:

“They were learning to draw,” the Dormouse went on, yawning and rubbing its eyes, for it was getting very sleepy; “and they drew all manner of things—everything that begins with an M—”

“Why with an M?” said Alice.

“Why not?” said the March Hare.

Alice was silent.

mcmoor
1 replies
11h49m

Nah, I think it's solely due to the former being much older and more well-known. And I always liked the latter much more anyway.

Miraltar
0 replies
6h11m

I agree. A Markov chain is fun when you don't know what it is, because you try to make sense of it, but once you realize it's just gibberish it's pointless. The other has that "unserious surprise" talked about in the original article.

complaintdept
2 replies
16h56m

My favorite markov experiment was feeding it the works of Nietzsche from Project Gutenberg. Really brought out the late stage syphilis, but was still very distinctly Nietzsche.

techjamie
1 replies
16h47m

I love messing with Markov chains, and before Reddit dunked their API, I would feed my Markov chain comments from NSFW subreddits and political subreddits at different sample sizes. I would spend ages just watching it whiplash about some political topic and sampling thirst comments.

tombert
1 replies
15h18m

I am in a private Discord server that has two bots in it. One is a bot that is a basic Markov chain trained on the entire chat history. The second is a proper LLM trained on some amount of tokens backward. Both will occasionally just randomly chime in during the chat.

The markov chain bot is always considerably funnier.

Sohcahtoa82
0 replies
1h47m

Curious what context window you used. My understanding is that a short window of only 1-2 words creates gibberish, but longer windows tend to repeat previous messages verbatim.

And when deciding to chime in, was it just a simple chance (ie, 25%) after any other message? Or did it run on a timer?

throwthrowuknow
1 replies
8h0m

Stopped reading at the mention of ChatGPT 3.5 for two reasons

1) if you’re not using the better model then you either don’t know enough for me to care about your opinion or you’re deliberately deceiving your audience in which case I’m not going to allow your meme pollution into my mind.

2) you are using the AI equivalent of a call centre support agent, they aren’t allowed to say anything funny. Most of their RLHF training has been specifically about NOT saying the funny things that will instantly go viral and cause a lot of media attention that will annoy or scare away investors.

hrgafiI
0 replies
5h35m

Bro, you have to use ChatGPT-4, or was it ChatGPT-4o, or was it ChatGPT-4p? Or was it Claude-3.5.1.2.3?

I see a new phenomenon of AI "power users" emerging.

renewiltord
0 replies
16h20m

Some portion of this is that the prompt is good but I am entertained by so many of these.

Humor is personal, it's true. But I found it quite funny. E.g. https://pastebin.com/84ByWUJL

And another greentext for you:

Be me

Be a bottomless pit supervisor

Spend months yelling into the void

Echo never comes back

Start to think the pit is ignoring me

Decide to teach it a lesson

Dump truck full of Lego bricks into the pit

Ground starts shaking

Unholy scream erupts from the depths

mfw I'm actually a regular pit supervisor

First day on the job

Realize it's just the sewage treatment plant

Get fired for clogging entire city's plumbing

ku1ik
1 replies
10h35m

This is something people miss when talking about LLMs: "It's the most average thing you could have possibly produced given the context."

They can’t be creative by design. They’re useful when you want to reproduce, but not when you want to create something completely new (that you can maybe do by getting a bunch of average outputs from an LLM and getting inspired yourself).

When GPT-4 came out I was playing with it, and I often tried to get some unique, creative output from it, but very soon I learned it was futile. It was back when it all still felt magical, and I guess many of us tried various things with it.

Kim_Bruning
0 replies
6h31m

If you tell GPT-4 to get snarky, you can sometimes end up laughing (and it seems a tad more intelligent and socially perceptive besides!). But Claude-3.5 can definitely get me laughing, at least.

Now imagine telling Claude-3.5 to try being snarky while sorting out software issues at a customer's office.

There should be a warning label!

dgacmu
1 replies
16h49m

Some friends of mine in college ran a markov chain generator on the "police reports" section of our college newspaper (1). The best 10% of the results, from a three-token generator, were some of the funniest machine-generated text I've ever seen -- it had an absurdity that modern LLMs seem to avoid due to making higher-level semantic sense.

It probably also helped that there was a creep exposing himself in the library during this period, which made for some good base material.

(1) The Daily Utah Chronicle; if memory serves, said friends also tried the markov chain generator on the personals section to good effect as well.

gs17
0 replies
1h55m

> it had an absurdity that modern LLMs seem to avoid due to making higher-level semantic sense.

That's it: LLMs are "trying" to be funny but aren't quite smart enough to actually be funny, and their errors are just boring. Markov chains accidentally hit on absurdist bits because every sentence gets randomly pulled into whatever the homograph equivalent of a malapropism is.

cout
0 replies
14h5m

They had their love, but they also had their typewriters.

yarg
0 replies
10h7m

"Unserious surprise" summarised it nicely for me:

It does not mean that Markov chains are better - something trained to make predictions should ideally not fall too far away from our own internal prediction engines (which have been honed across aeons).

It's that it starts to come close that's the problem (or cause); it's the uncanny valley for text.

wcedmisten
0 replies
14h50m

Highly recommend this performance from the last Strange Loop on a similar topic! I had the pleasure of seeing it live. A great combination of live music and Markov chains.

https://m.youtube.com/watch?v=M2o4f_2L0No

vitus
0 replies
5h34m

Apparently the ChatGPT output is ripped off of a 20-year-old Emacs joke.

In the beginning was the lambda, and the lambda was with Emacs, and Emacs was the lambda.

– OliverScholz on news:alt.religion.emacs, 2003-03-28

https://www.emacswiki.org/emacs/TheBeginning (edited for brevity)

tracerbulletx
0 replies
16h36m

No, they might be funnier than an LLM specifically aligned to not be that funny.

Fine-tune an LLM base model with jokes and align it by ranking how funny each reply is, instead of with helpful questions and answers, and then we'll talk.

tonetegeatinst
0 replies
11h17m

I just read a little bit into the post and thought: hey, let's call Markov chains something better than a tiny LLM.

I present to hacker news the MCLM, the Markov chain language model.

shagie
0 replies
1h46m

I still find https://www.infiniteconversation.com to be one of the odd / funny / "uh... what?!" type things. It's not intended to be funny in the "ha ha" way but sometimes tickles my funny bone with the absurdist philosophy - talking about everything and nothing at the same time.

From the FAQ it is a tuned LLM.

> Mostly using open source tools available to anyone. The generation of the script itself is done using a popular language model that was fine-tuned on interviews and content authored by each of the two speakers.

rc_mob
0 replies
4h27m

Isn't an LLM just a fancier Markov chain anyway?

raister
0 replies
10h48m

For anyone interested, I wrote this book a while ago: "Markov Chains for programmers". Link: https://czekster.github.io/markov/

nwoli
0 replies
7h10m

Base model LLMs are funnier than instruction-finetuned ones.

notjulianjaynes
0 replies
10h53m

In college (around 2012ish) I made some poems by transcribing a recording of myself mumbling gibberish into Dragon NaturallySpeaking, which partially relied on Markov chains for its ASR. It would output some string of text, and then I'd delete all the words it output from its dictionary; repeat. It worked very well for creating a list of (slant) rhymes; I have not found a better way to do this. As a literary tool, I've found LLMs to be seriously lacking in the ability to fuck around like this.

monokai_nl
0 replies
7h1m

23 years ago I created a web app (That Can Be My Next Tweet) that could generate your future tweet by feeding all your past tweets in a Markov chain. When the Twitter API access was shut down, I took it offline. At the time it went super viral for a while, maxing out to 2 tweets per second.

It got some press and just now I went back to a Ted Talk of Adam Ostrow (Mashable), briefly showcasing this web app. He stated: you can imagine what something like this can look like 5, 10 or 20 years from now, and hinted at hyper-personalized communication AIs.

By no means was my web app any foundation of the LLMs of today, but it's interesting nonetheless how relatively simple techniques can trigger ideas of what future scenarios could look like.

meowface
0 replies
15h35m

Many years ago I trained a Markov chain chat bot on all of the conversation history in a group chat with me and my friends. It was regularly incredibly hilarious. I also would train it on subsets of them, like two of the friends but no one else, and name that version of the bot a combination of their names. Something about it was just so funny.

maxglute
0 replies
15h37m

I used to be unreasonably entertained by tapping the suggest-next-word feature on Google keyboard.

macNchz
0 replies
6h53m

Something about the not-quite-rightness of Markov output has always really amused me, like I can’t help but giggle at it.

My all-time favorite in this vein was @erowidrecruiter on Twitter, which generated posts with Markov chains from a corpus of tech recruiter emails and drug experience reports from erowid.org. Still up but no longer posting: https://x.com/erowidrecruiter?lang=en

larodi
0 replies
10h56m

Markov chains are funnier for one very basic reason:

you can teach them to 5th or 9th graders.

LLMs you cannot, or at least it would take an insane amount of allegory to do so. Markov chains are very tightly related to regexes, and one may be surprised to learn that probabilistic regexes exist. Also, the graphical structure of a Markov chain is a lot like an FSM, and an FSM can perhaps be explained to very small children :D
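
To make that concrete, here is a minimal word-level chain in Python: it is nothing more than a probabilistic FSM, small enough to walk a classroom through. This is purely an illustrative sketch, not code from the comment; the corpus and names are made up.

    import random
    from collections import defaultdict

    # States are words; transitions are weighted by how often one word
    # follows another in the training text.
    def build_chain(text):
        chain = defaultdict(list)
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)  # duplicates encode the weights
        return chain

    def generate(chain, start, length=12):
        word, out = start, [start]
        for _ in range(length - 1):
            if word not in chain:
                break
            word = random.choice(chain[word])  # take a random weighted transition
            out.append(word)
        return " ".join(out)

    corpus = "the cat sat on the mat and the cat ate the lampshade made of cheese"
    print(generate(build_chain(corpus), "the"))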

killerstorm
0 replies
5h46m

A small language model can be quite like a Markov chain, e.g. the one from Karpathy's 2015 post was pretty nonsensical: https://karpathy.github.io/2015/05/21/rnn-effectiveness/

So you can just scale down if it still makes sense.

Also, you get a lot more from the base model. GPT-3 was versatile: it could continue any context. Modern LLMs are try-hards. If you want to generate humor with an LLM, it's really worth going for a base model with multiple examples in the prompt.
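
A rough sketch of what that few-shot prompting looks like for a completion-style base model (just an illustration; the example lines are invented, not from the comment):

    # A base (non-instruct) model simply continues text, so the prompt sets the
    # register with a couple of examples and leaves the list open-ended.
    few_shot_prompt = """Absurdist one-liners:
    - The toaster filed for citizenship and was denied on grounds of crumbs.
    - My calendar retired early to pursue a career in interpretive silence.
    -"""
    # Send `few_shot_prompt` to the model's completion endpoint; the trailing
    # "-" invites it to produce the next joke in the same style.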

johntb86
0 replies
14h38m

LLMs can be funny. For example, look at Golden Gate Claude ( https://news.ycombinator.com/item?id=40459543 ). But they're not good at intentionally being funny, so we need to break them to get absurdist humor instead.

jablongo
0 replies
3h41m

This person should've tried adjusting the temperature of their LLM before making this pronouncement.

ineptech
0 replies
13h22m

Relatedly, the author Robert Anton Wilson tried to manually implement a Markov-chain-bot in the 1960s by cutting up passages from various books, selecting them from a hat, and looking for overlapping words and phrase fragments. The best results were included in the Illuminatus! trilogy, some attributed to a character named Markoff Chaney.

hellointernet
0 replies
1h34m

I used to have a Markov-chain Twitter bot that was trained on a few of my friends' Twitter posts. It was funny because that included hashtags, tagged users, and images. There were some funny results, and this was long before LLMs.

fsndz
0 replies
9h48m

the original stochastic parrot

fidotron
0 replies
2h32m

We seem to be encountering more phenomena in tech where the increasing computing demands to improve the output exceed any exponential growth in computing capacity. Graphics was the poster child for this for years, and now the whole ML and AI space has discovered that big machines make huge differences.

Markov chains are almost trivial to implement and run on small devices. A slightly extreme example is a rock, paper, scissors game I did that worked this way: https://luduxia.com/showdown/ The actual browser-side Markov chain implementation of that took something like 2-3 hours.
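
For a sense of how small such a thing can be, here is a rough Python sketch of the general idea (not the code behind the linked game): count the player's move-to-move transitions, predict the most likely next move, and play its counter.

    import random
    from collections import defaultdict

    BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

    class MarkovRPS:
        def __init__(self):
            # Transition counts: last player move -> {next player move: count}
            self.transitions = defaultdict(lambda: defaultdict(int))
            self.last_move = None

        def ai_move(self):
            history = self.transitions.get(self.last_move)
            if not history:
                return random.choice(list(BEATS))  # no data yet: play randomly
            predicted = max(history, key=history.get)  # most likely next player move
            return BEATS[predicted]                    # play whatever beats it

        def observe(self, player_move):
            if self.last_move is not None:
                self.transitions[self.last_move][player_move] += 1
            self.last_move = player_move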

drusepth
0 replies
51m

I posted "How to Speed Up Your Computer Using Google Drive as Extra RAM" [0] ~10 years ago and have had hundreds of people over the years ask if I used Markov chains to generate the funny bits (I didn't). I've had 0 people ask if I used LLMs on any of my comedy pieces. Seems QED to me.

[0] http://drusepth.net/series/how-to-speed-up-your-computer-usi...

dimatura
0 replies
16h8m

The evolution of the AI weirdness blog (https://www.aiweirdness.com/) over the past few years somewhat supports this idea, except that the author got plenty of funny results with LLMs - but mostly the early ones, up to around GPT3 (and smaller variants of GPT3). For example, cereal names generated by the Ada version of GPT were much funnier than the Da Vinci ones: https://www.aiweirdness.com/new-breakfast-cereals-from-ai/

dghf
0 replies
9h38m

I remember the KJV/SICP mash-up. My favourite, both for funniness and emotional truth:

And Satan stood against them in the global environment.

canjobear
0 replies
16h56m

“Talk to Transformer” GPT-2 was very funny, then GPT-3 was not funny at all. We blew through the uncanny valley in a couple of years.

benob
0 replies
10h20m

LLMs are high-order Markov chains with a factored estimator. N-gram LMs are also Markov chains with a different estimator.

atoav
0 replies
12h16m

The cool thing about Markov chains is that you can tune the probability with which they will jump between "genres" of their training content by making the analyzed n-grams shorter or longer.

E.g. it starts out as a news headline and ends with a bible verse.
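
A minimal sketch of that knob in Python, where the `order` parameter is the n-gram length (this is just an illustration, not code from the comment): a shorter order hops between genres constantly, a longer order stays closer to one source text.

    import random
    from collections import defaultdict

    def build_chain(words, order=2):
        # Keys are n-grams of length `order`; values are the words that followed them.
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, order=2, length=40):
        out = list(random.choice(list(chain.keys())))
        for _ in range(length):
            followers = chain.get(tuple(out[-order:]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

    # e.g. words = (headline_corpus + " " + bible_corpus).split()
    # generate(build_chain(words, order=1), order=1)  # jumps genre mid-sentence
    # generate(build_chain(words, order=3), order=3)  # mostly coherent quotations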

at_a_remove
0 replies
14h25m

I used to keep a text file on hand back when people had some irritating Markov chain bots on IRC. Fed enough of the lines, the bots veered off into some new valley of chatter, appearing quite mad. Used to get some "What did you DO to my bot?" indignant messages. I wonder if I still have that file somewhere.

Miraltar
0 replies
6h27m

I wouldn't say that LLMs aren't fun; it's more that jokes out of context aren't fun. If you ask somebody to just tell you a joke, there's a very low chance that you'll actually laugh, especially if it's in text format. But I've had some serious laughs looking at some of DougDoug's videos featuring LLMs.

Alifatisk
0 replies
7h25m

Wasn't Microsoft Tay built using a Markov chain?

6510
0 replies
13h17m

I’ve heard people claim that comedy cannot be generated by an algorithm. If it wasn’t obvious, I disagree with this. I think comedy can be analyzed and measured. I think, given a large enough grant, we could probably generate comedy on demand. I think it’s doable - not that we should do it.

I used to think it came naturally, then someone had a bookcase full of books about humor. (wtf?) Apparently they have it down to a science.

I learned that the difference between someone funny and a professional comedian is that the latter finds additional punch lines for a joke. It then described a step-by-step process going from a silly remark to a birthday joke, comparing various modular developments to a kind of dependency hell complete with race conditions, until the state object is carefully defined and the plot has the punchlines all sorted from the barely funny to the truly hilarious. It was more engineering than CS.

The funniest seeBorg message came 10 minutes after a heated discussion that resulted in tanktop, a moderator, getting banned from a project. The bot wrote: Tanktop is Hitler! At that point it took 2 days for the humans to figure out what the next word was supposed to be.

29athrowaway
0 replies
12h59m

It's all connected, bro. Language models are connected to compression, to n-gram models, to Markov chains, to everything.