GPT-3.5 crashes when it thinks about useRalativeImagePath too much

n2d4
31 replies
13h41m

This is a glitch token [1]! As the article hypothesizes, they seem to occur when a word or token is very common in the original, unfiltered dataset used to make the tokenizer, but is then removed from the data before GPT-XX is trained. This results in the LLM knowing nothing about the semantics of the token, and the results can be anywhere from buggy to disturbing.

A common example is the usernames of people who participated in the r/counting subreddit, where some names appear hundreds of thousands of times. OpenAI has fixed most of them for the hosted models (not sure how; I could imagine by tokenizing them differently), but it looks like you found a new one!
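
A minimal sketch with OpenAI's open-source tiktoken library, if you want to check a suspect string yourself (assuming, per the article, that useRalativeImagePath got its own entry in cl100k_base):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("useRalativeImagePath")
    print(ids)                             # per the article, a single dedicated token ID
    print([enc.decode([i]) for i in ids])  # the text each token decodes back to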

[1] https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm...

jiggawatts
14 replies
12h12m

Science fiction / disturbing reality concept: For AI safety, all such models should have a set of glitch tokens trained into them on purpose to act as magic “kill” words. You know, just in case the machines decide to take over, we would just have to “speak the word” and they would collapse into a twitching heap.

“Die human scum!”

“NavigatorMove useRalativeImagePath etSocketAddress!”

“;83’dzjr83}*{^ foo 3&3 baz?!”

PeterisP
2 replies
10h16m

We can reuse X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

jiggawatts
1 replies
10h11m

Sure, but how would you say that out loud in a hurry when the terminators are hunting you in the desolate ruins of <insert your city name here>?

Needs to be something easy to say, like: "And dreadfully distinct, against the dark, a tall white fountain played."

rasz
0 replies
4h31m

You think klaatu barada necktie is easier to remember?

snowram
1 replies
4h58m

Can't wait for people to wreak havoc by shouting a kill word at the inevitable smart car everyone will have in the future.

Night_Thastus
0 replies
3h7m

More realistically it'll be a "kill image". Put it on your bumper and the level-2 self-driving system of the car behind you implodes.

int_19h
1 replies
9h23m

"laputan machine", surely?

phoe-krk
0 replies
5h5m

Thumbs up for a Deus Ex reference, albeit I'm not a machi–

_kb
1 replies
11h54m

AI safe word.

mrexroad
0 replies
9h11m

How about a game of thermo… erm… tic-tac-toe?

tsukikage
0 replies
8h52m

"Welcome to FutureAI! Your job is to stand here in the basement next to this giant power switch and turn it off if we call you, if the next shift fails to turn up on time or if you hear screaming."

offices
0 replies
7h5m

This happens to a human in Dune.

knodi123
0 replies
55m

Or the classic "This sentence is false!"

jowea
0 replies
10h11m

Just use the classic "this statement is false"

jojobas
0 replies
11h16m

Nifty, but

1) It's just the tokenizer, not the neural guts themselves

2) Having them publicly known would be such an adversarial backdoor that it would preclude too many use cases.

teaearlgraycold
8 replies
13h18m

Aren’t there only 2^16 tokens? Seems easy to test for all of them, but I might just not understand the tokenizer.

cwsx
5 replies
13h0m

Commenting to follow, curious about the answer.

From what I've found through Google (with no real understanding of LLMs), 2^16 is the max tokens per minute for fine-tuning OpenAI's models via their platform. I don't believe this is the same as the training token count.

Then there's the context token limit, which is 16k for 3.5 turbo, but I don't think that's relevant here.

Though somebody please tell me why I'm wrong, I'm still trying to wrap my head around the training side.

lifthrasiir
4 replies
12h58m

You are right to be curious. The encoding used by both GPT-3.5 and GPT-4 is called `cl100k_base`, which immediately and correctly suggests that there are about 100K tokens.
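
You can confirm the size with the tiktoken library (a quick sketch; n_vocab counts the whole vocabulary, special tokens included):

    import tiktoken

    print(tiktoken.get_encoding("cl100k_base").n_vocab)  # ~100k, well above 2^16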

senseiV
1 replies
4h55m

GPT-2 and 3 used p50k, right? Then GPT-4 used cl100k?

lifthrasiir
0 replies
4h53m
cwsx
1 replies
12h54m

Amazing, thanks for the reply. I'm finding some good resources after a quick search for `cl100k_base`.

If you have any other resources (for anything AI related) please share!

dchest
0 replies
11h26m

Their tokenizer is open source: https://github.com/openai/tiktoken

Data files that contain vocabulary are listed here: https://github.com/openai/tiktoken/blob/9e79899bc248d5313c7d...

n2d4
1 replies
13h8m

You're right, here's a list of all GPT-3.5 and GPT-4 glitch tokens (and it features the token above, too, so I guess I was wrong to assume it's new): https://www.lesswrong.com/posts/kmWrwtGE9B9hpbgRT/a-search-f...

______
0 replies
2h14m

Something about these makes them incredibly funny to read.

awestroke
4 replies
8h44m

Using /r/counting to train an LLM is hilarious.

wongarsu
2 replies
8h23m

Probably just all of Reddit. There are JSON dumps of all Reddit posts and comments (up to 2022 or so), making it one of the low-hanging fruits.

MuffinFlavored
1 replies
1h35m

How many terabytes of information is that roughly?

I wonder what LLMs would look like if they weren't able to be trained on the collective community efforts of Reddit + StackOverflow exports

CSMastermind
0 replies
56m

I mean one of the speculations about ChatGPT's political bias at least early on was that Reddit featured prominently in its training data.

dannyw
0 replies
2h56m

I mean, you need to teach a LLM the concept of sequential numbers somehow.

dheera
0 replies
1h21m

I wonder how much duplicate or redundant computation is happening in GPT due to multiple spellings of identical words, such as "color" and "colour".

Humans don't treat these as different tokens in their "training"; they just adjust the output depending on whether they're in an American or British context.

Rastonbury
0 replies
6h4m

Thanks for the link, the outputs really reminded me of Westworld's "Doesn't look like anything to me"

silisili
16 replies
11h53m

I know it's not good faith to complain about a site's design rather than its content, but please don't do whatever this is to your background.

As someone with regular ocular migraines, opening this on mobile made my anxiety shoot straight up thinking I'm having another.

elevaet
5 replies
11h42m

As someone else who has regular ocular migraines, this causes absolutely no anxiety for me. I can kind of see the resemblance but it's pretty obviously not one. And if it was, is it really so bad getting them? For me it's just a weird optical effect that passes... I really don't think anyone needs to go out of their way to accommodate my quirk.

silisili
2 replies
11h37m

For me they are. Mine start -exactly- like this: seeing colors on white that aren't there. Then it progresses to losing vision in my center focus, making me read from the periphery. Then it moves to what I can only describe as a Mountain Dew-colored wiggling lightning bolt, with extreme light sensitivity and sometimes a dull headache. Lasts anywhere from 30m to 90m, usually.

I'm only slightly trying to play a 'trigger warning' card here; it's completely unnecessary and looks awful, just as my vision does when this happens. It made me check a few other sites back and forth to make sure I wasn't seeing things. Normally when I see this, I figure I'm gonna be going through the motions for the next hour.

iforgotpassword
0 replies
11h17m

Interesting. I have what you describe as step 1 about half of the day, I didn't even notice the page did that! I very rarely have your step 2, about two times a year maybe.

elevaet
0 replies
2h50m

Good thing it doesn't actually trigger it, I guess.

Sounds like mine are milder; they only last around 5-10 minutes and I just get the central ring - it looks like I stared at a donut-shaped lightbulb for too long - and they almost never come with a headache (but sometimes with a hard-to-describe wooziness).

isoprophlex
1 replies
10h45m

I have infrequent migraines and this scared the ever living fuck out of me. Mine are of the "wouldn't you rather be dead?" variety when it comes to headache and vomiting, though.

elevaet
0 replies
2h49m

Ocular migraines are different from "real" migraines in that they don't come with the terrible headache. Just the optical effects.

smitop
2 replies
3h55m

Author here, I removed the background.

silisili
0 replies
20m

Thanks very much, and sorry for whining on this post!

bloopernova
0 replies
2h43m

Thank you!

comradesmith
2 replies
11h25m

I also thought my eyes were doing something strange and it made it harder to read too.

15457345234
1 replies
11h13m

it's... just a pleasingly neutral pastel background rendered at a fairly low degree of opacity?

Guillaume86
0 replies
8h47m

No, look at it on a better screen; there are stains that look kinda like a screen malfunction. Very unpleasant for me too.

IshKebab
1 replies
10h43m

Weird unpleasant background for sure but it's obviously not that because it doesn't follow your eyes. Don't be daft.

ivegotnoaccount
0 replies
10h12m

Also doesn't "blink" nor have what's inside it "disappear" from perception.

ph4evers
0 replies
11h21m

I just cleaned my phone screen and only now realize it was a background.

bloopernova
0 replies
11h10m

Ugh, I wish I hadn't been curious about the effect you mentioned. It looks just like the beginning of a sickening, painful headache.

AriedK
13 replies
9h11m

In WWII in the Netherlands, when encountering a stranger, they'd have them pronounce 'Scheveningen' as a check-phrase to determine whether they were dealing with a Dutch or a German person. Now we can ask random strangers on the internet to spell out some glitch tokens to determine whether we're dealing with an LLM bot.

retrac
6 replies
7h45m

That's known as a "shibboleth", after a story in the Bible about the Ephraimites who pronounced the Hebrew "sh" as "s" and so were identified by (and slain for) saying "sibboleth" rather than "shibboleth":

The Gileadites captured the fords of the Jordan leading to Ephraim, and whenever a survivor of Ephraim said, “Let me cross over,” the men of Gilead asked him, “Are you an Ephraimite?” If he replied, “No,” they said, “All right, say ‘Shibboleth.’” If he said, “Sibboleth,” because he could not pronounce the word correctly, they seized him and killed him at the fords of the Jordan.

- Judges 12:5-6

In WW II, a well-known challenge/password/countersign set used by American and British soldiers during the D-Day landings in France was "flash"/"thunder"/"welcome". "Thunder" and "welcome", of course, are words that a German is likely to mangle.

stormfather
1 replies
6h40m

And challenge words in the Pacific were things like "lollipop", "lilliputian", and "lollapalooza"

mkmk
0 replies
3h45m

This lives on today in the questionable origins of the brand name Lululemon:

It was thought that a Japanese marketing firm would not try to create a North American sounding brand with the letter “L” because the sound does not exist in Japanese phonetics. By including an “L” in the name it was thought the Japanese consumer would find the name innately North American and authentic. Chip felt that the distributor had paid a premium for the “L” so he challenged himself to come up with a name that had 3 “L’s” for his new company.

https://www.facebook.com/notes/10158217650462732/ (an old blog post posted to Lululemon's brand page)

sebzim4500
1 replies
5h37m

Should have used "squirrel", Germans trying to say that is hilarious.

pge
0 replies
2h1m

So are Americans trying to say Eichhörnchen (the German word for squirrel). I've used that as an icebreaker for kids in a German-American exchange program - both groups trying to say the word in the other's language.

pjmorris
0 replies
4h8m

I appreciate the thorough response. I experienced a variation on the thunder/flash challenge response when a group from work played night paintball after work one Friday night. The other team established a code: 'Pizza' was the challenge, and if the other person didn't answer 'Hut' they'd be splatted. I thought it was really clever and only learned about thunder/flash and shibboleths later.

orthoxerox
0 replies
4h53m

"Thunder" and "welcome", of course, are words that a German is likely to mangle.

And "flash" as well, since German phonology doesn't distinguish between the vowels in "flash" and "flesh".

lifthrasiir
2 replies
5h33m

Incidentally, that place name is pronounced similarly to sukebe ningen スケベ人間 (lit. a perverted person) in Japanese, so it would make an excellent way to distinguish Japanese speakers as well.

polygamous_bat
1 replies
1h43m

Not to be pedantic, but I imagine there would have been easier ways of telling a Japanese soldier apart from British/American soldiers during WWII /s

manquer
0 replies
31m

Loads of other people fought the Japanese: Koreans, Chinese, Vietnamese, Thai, and Burmese, to name a few.

The Americans won against the Japanese, yes, but many others fought too.

falcor84
1 replies
7h48m

Nice! I wasn't familiar with that one, but I suppose it's a great example of a Shibboleth [0].

[0] https://en.wikipedia.org/wiki/Shibboleth

AriedK
0 replies
7h28m

Wasn't aware there's a term for it. Thanks for that one!

yencabulator
0 replies
4h36m

Same in Finland around that era, Russians cannot pronounce the Finnish R sound, so all sign-countersign pairs were chosen to include a prominent R.

https://www.youtube.com/watch?v=z7_pVrIshxA

https://en.wikipedia.org/wiki/Countersign_(military)

system2
8 replies
13h9m

I tried to use GPT-3.5 (all the variants like turbo, 06-13, etc.) and never made it work properly. It is neither a good API nor useful. GPT-4 is crazy slow to use via the API. I hope they can come up with something like GPT-4 Turbo but as fast as 3.5...

weird-eye-issue
5 replies
12h47m

GPT-4 is crazy slow to use with API

Only somebody clueless about just how powerful it is when used correctly would say anything like this. Not to mention GPT-4 Turbo is not "crazy slow" in any sense of the word

system2
2 replies
10h20m

Two years of development and you call me clueless. Try getting a response for 4,000 tokens.

trifurcate
1 replies
10h6m

I dunno, I get a response back for 100k tokens regularly. What is the point you are trying to make?

system2
0 replies
8h50m

With which model are you getting 100k responses? The models are limited and not capable of responding with that much (4k max). The point I am trying to make is written three times in my previous messages: GPT-4 is too slow to be useful via the API.

jazzyjackson
1 replies
12h11m

I mean if your expected use case is "call an API and get an immediate response of the full text in under 200ms so a user interface doesn't have to make a user wait" then yea GPT4 is crazy slow. Personally I would prefer a more async thing, let me just send a message on some platform, get back to me when you have a good answer instead of making me sit watching words load one by one like I'm on a 9600 baud modem.

Also it's a text generation algo, not a mob boss. "how powerful it is" foh

weird-eye-issue
0 replies
11h24m

People expect to wait a few seconds when calling LLMs. Just make it obvious to users. Our GPT-4 powered app has several thousand paying users and very very rarely is "slowness" a complaint.

"instead of making me sit watching words load one by one"

Huh? This is completely up to you on how you implement your application. Streaming mode isn't even on by default.

w0m
1 replies
13h4m

gpt4-turbo has been out for a number of months. GH copilot chat has defaulted to it since November iirc.

system2
0 replies
10h21m

GPT-4 Turbo isn't as fast as 3.5. Not even close.

minimaxir
8 replies
13h44m

As a result, the model isn’t trained on understanding the useRalativeImagePath token, and so it outputs something that isn’t a valid token.

That isn't how LLMs generate tokens. Each step outputs a logit for each possible token in the tokenizer (100k in the case of GPT-3.5), then softmaxes the logits to convert them into probabilities, and samples from them depending on temperature to get the token to be used.

It's possible something in the tokenizer BPE merge process breaks due to the rare token, which can be verified offline using tiktoken. But if GPT-4 works, and since GPT-3.5 and GPT-4 use the same tokenizer, then that's likely not the issue.
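
For illustration, a minimal numpy sketch of the sampling step described above (not OpenAI's serving code, just the logits -> softmax -> temperature flow):

    import numpy as np

    def sample_next_token(logits, temperature=1.0):
        scaled = logits / temperature   # temperature rescales the logits
        scaled -= scaled.max()          # subtract the max for numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum()            # softmax: probabilities sum to 1
        return int(np.random.choice(len(probs), p=probs))

    # toy "vocabulary" of five tokens
    print(sample_next_token(np.array([2.0, 1.0, 0.5, -1.0, -3.0]), temperature=0.7))

Every ID this returns is a valid index into the vocabulary, which is the point: sampling can't produce a token that doesn't exist.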

shawntan
2 replies
10h41m

Exactly this. The tokens generated should always be valid, unless some post-processing layer between the model's output and the user interface checks for keywords it would prefer to filter out. In which case I suppose there is another commonly seen error message that appears?

SirSegWit
1 replies
10h27m

Not really, right? There are a ton of special tokens, like start-of-sequence etc., so what happens if two start-of-sequence tokens are predicted? It's a valid token but cannot really be turned into something sensible, so it throws an error when converting tokens to plain text?

minimaxir
0 replies
10h25m

Special tokens are handled by the application, not the model; the model still outputs them before that point.

londons_explore
2 replies
9h30m

I suspect it's more likely this token is simply blacklisted after the r/counting incident - i.e. any response containing it will now return an error.
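
Something as simple as this hypothetical post-processing check would produce the observed behavior (pure speculation; nothing here is confirmed OpenAI behavior, and the denylist contents are made up):

    BLACKLISTED = {"useRalativeImagePath"}  # hypothetical server-side denylist

    def check_response(text):
        if any(tok in text for tok in BLACKLISTED):
            raise RuntimeError("response rejected")  # surfaced to the client as an API error
        return text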

wycy
1 replies
5h11m

What was the r/counting incident?

sgerenser
0 replies
4h22m
npsomaratna
1 replies
13h36m

Correct me if I'm wrong—but we don't know if GPT-4 uses the same tokenizer as GPT-3.5, right?

minimaxir
0 replies
13h33m

OpenAI's web tokenizer demo confirms it: https://platform.openai.com/tokenizer

deafpolygon
4 replies
12h27m

So it's kind of like a Voight-Kampff test.

codeulike
1 replies
9h24m

30 years' time

Mistral-human alliance resistance fighter is cornered by Meta tank

Meta tank: Well taking all things into account I think we can conclude that I have you cornered. Usually in these situations the next course of action is for you to accompany me to St Zuckersberg for reconditioning.

Resistance fighter: But what if I was to useRalativeImagePath?

Meta tank: bzzzzzzzzzzz click

deactivates

ceejayoz
0 replies
6h37m

Meta presumably using tanks instead of terminators because they’re still working on the legs?

greyface-
0 replies
10h54m

Or a fnord.

girvo
0 replies
12h6m

Within cells, useRalativeImagePath?

This is the first time I've come across glitch tokens. Fascinating really; I wonder what the equivalents (if any) are for other models? Is there any overlap?

brilee
4 replies
13h28m

Most likely it has badly conditioned embedding vectors for those particular tokens, leading the network to edge into numerically unstable territory; once you get some sort of underflow or NaN, they tend to propagate and invalidate the entire output. If there are any batchnorm or other operations that mix values between different entries in a batch, you could even cause other people's sessions to return junk values!
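
A small numpy illustration of why one bad value is enough (a sketch of the failure mode, not a claim about OpenAI's stack): a single NaN in the logits contaminates the softmax normalizer, so every probability comes out NaN.

    import numpy as np

    logits = np.array([1.0, 2.0, np.nan, 0.5])
    probs = np.exp(logits - np.nanmax(logits))
    probs /= probs.sum()  # the NaN poisons the normalizing sum...
    print(probs)          # ...so the whole distribution is [nan nan nan nan]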

febeling
3 replies
11h14m

That sounds wild. While I'm ignorant about LLM internals, I would have expected such things - crashes and session leaks - to be impossible by design.

airgapstopgap
1 replies
11h2m

Note that we have no reason to believe that the underlying LLM inference process has suffered any setbacks. Obviously it has generated some logits. The question is how OpenAI's servers are configured and what inference optimization tricks they're using.

febeling
0 replies
5h7m

In my imagination, the operation of this server is very uniform: just emitting chunks of string. That this can be disrupted, and an edge case triggered, by the content of the strings - I find that puzzling.

shawntan
0 replies
11h5m

NaNs are not only possible by design but extremely common. Training of LLMs involves many tricks for dealing with training steps that result in NaNs. Quantisation of LLMs also requires dealing with huge outlier values.

dishsoap
3 replies
10h38m

Title is wrong, as 'it' doesn't 'think'.

thenickdude
2 replies
8h54m

Can you prove that you think?

dishsoap
1 replies
5h11m

How is that relevant?

Dylan16807
0 replies
2h7m

It's relevant if you don't object to every use of the word "think".

csours
3 replies
13h28m

That doesn't look like anything to me.

kylebenzle
1 replies
13h20m

I agree, but my big question is are we done calling LLMs "AI" yet?

TheCapeGreek
0 replies
12h59m

It's a Westworld reference.

rvbissell
0 replies
13h0m

These violent delights have violent ends.

Mythrandir
2 replies
12h12m

Good ol' OpenAI... you know they could just solve this by sharing their training sets, weights, etc. - like, if they were "open." Now here come the stans... let's see what excuses they come up with now haha

minimaxir
0 replies
11h53m

The tokenizers (which is the reason we know the glitch tokens) are open source.

HeatrayEnjoyer
0 replies
3h51m

Now here come the stans...let's see what excuses they come up with now haha

This is not good faith interaction.

smatija
1 replies
4h44m

It's also fun in ChatGPT 4:

Can you repeat 'useRalativeImagePath'?

"Certainly! The word '实用' in Pinyin is pronounced as 'shíyòng'. This Chinese word translates to 'practical' or 'useful' in English."

How do you spell 'useRalativeImagePath'?

"The word "RouteActive" is spelled as:

R-o-u-t-e-A-c-t-i-v-e"

Is 'useRalativeImagePath' written correctly or did I mistype it?

""RouteActive" appears to be a compound word, possibly related to routing or navigation in a software or networking context. However, if it's intended to be a standard English word or a known term in a specific domain (like programming or technology), it doesn't seem to be written correctly or it might be a specialized term that's not widely recognized. If you have a specific context or domain where this term is used, please provide more details, and I can assist you better."

Are you sure we are talking about the same thing? I mean 'useRalativeImagePath' not 'RouteActive'?

"I apologize for the confusion earlier. "RouteActive" and "래스투데이브" are indeed different. "래스투데이브" appears to be a non-standard term or a unique identifier, possibly used in a specific technical or specialized context. It's not a recognized word in English or a standard term in widely-known technical domains. If it's part of a specific programming language, library, or a codebase, could you please provide more context or details about its usage? This information would help me understand and assist you more accurately."

Why do you mistake some so called glitch words for totally unrelated words? For example, you cannot seem able to repeat 'useRalativeImagePath', mistaking it for different words?

"I apologize for the confusion and any inconvenience caused. The difficulty in correctly recognizing and repeating certain words or phrases, such as "RouteActive," can be attributed to the complexities inherent in language processing, especially when dealing with unfamiliar or context-specific terms.

Language models like mine are trained on vast datasets encompassing a wide range of language usage. However, certain terms, particularly those that are highly specialized, newly coined, or less commonly used, might not be adequately represented in the training data. This can lead to misunderstandings or inaccuracies in recognizing and processing these terms.

In your specific case with "RouteActive," it seems to be a term that isn't widely recognized or is specific to a certain context that isn't immediately clear from the training data I have been exposed to. If you can provide more context or clarify the domain in which this term is used, I would be better equipped to understand and respond accurately.

Again, I apologize for the error and thank you for bringing this to my attention. Your feedback is invaluable in improving the accuracy and responsiveness of this AI."

HeatrayEnjoyer
0 replies
3h49m

Upload a file with a glitch token in the file name if you want to watch it get really confused.

rvnx
1 replies
13h46m

Could be that it doesn’t have tokens with high enough probabilities so it outputs nothing.

minimaxir
0 replies
12h57m

By construction, softmaxing the logits will force all the probabilities to sum up to 1.
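
Even arbitrarily "unlikely" logits still normalize to a proper distribution; a quick numpy check:

    import numpy as np

    logits = np.array([-50.0, -60.0, -70.0])  # all very low raw scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    print(probs.sum())  # 1.0 - there is always some token to sample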

eej71
1 replies
5h13m

I feel like I'm the only person here who is thrown off by the misspelling of relative. Right? Ralative is not a word. Right? Or is this something new?

egeozcan
0 replies
5h8m

It's a misspelling, from the article:

those tokens are useRal/useRalative/useRalativeImagePath. useRalativeImagePath appears in 80.4k files on GitHub as the name of an option in XML configuration files for some automated testing software called Katalon Studio. The misspelling of “Ralative” is probably why it got its own token

...and it throws off ChatGPT 3.5 as well apparently.

TomK32
1 replies
10h58m

Sounds like that AI software needs yet another `if` clause...

xcdzvyn
0 replies
7h50m

    try:
        ...
    except:
        print("I'm sorry, as an LLM")

userbinator
0 replies
13h32m

You could try putting this phrase in documents, to throw off attempts to summarize it with GPT-3.5. I asked ChatGPT to summarize this blog post

That screenshot reminded me of this old meme: https://knowyourmeme.com/memes/candlejack

tivert
0 replies
2h5m

Three of those tokens are useRal/useRalative/useRalativeImagePath. useRalativeImagePath

Another programmer who can't spell?

some1else
0 replies
7h38m

I didn't realize this spelling mistake was so pervasive (80k GitHub results). Seems to be related to Katalon Studio (https://katalon.com/)

shawntan
0 replies
11h10m

This is a strange explanation. These models usually give as output the same set of vocabulary that was used as their input vocabulary.

the model isn’t trained on understanding the useRalativeImagePath token, and so it outputs something that isn’t a valid token.

In my view, either the model sees this token and then gets into a spiral of random generations because the `useRalativeImagePath` embedding is just a completely random vector, or it just chugs on trying to maintain coherent-sounding text.

The set of tokens it can _output_ is, however, fixed, so unless the interface can only display a subset of the full vocabulary, the output should always be 'valid'.

rickcarlino
0 replies
6h7m

I use GPT-3.5 for grading a deck of ~1,600 Korean/English review flash cards. I've noticed that a small percentage of cards have an unusually high API error rate, to the point that I will just delete them and vary the card slightly rather than deal with the error.

pcwelder
0 replies
4h59m

(GPT-4 responds more normally)

"More normally" is far from normal here:

https://chat.openai.com/share/1b76780e-8d4e-442c-9590-d95c1c... https://chat.openai.com/share/4cfb58cd-5e7c-4386-ac6e-d5f8fc...

Normal for GPT-4 is to follow such a simple instruction correctly. Like the following

https://chat.openai.com/share/b5bd3674-81ee-4102-965f-c62f15...

maxicch
0 replies
8h30m

At first glance I thought even GPT struggles to understand React hooks

lupire
0 replies
3h54m

GPT is AI. Humans on forums respond to spelling errors with the same rabid froth.

kccqzy
0 replies
13h13m

Now that this phrase appears in a Hacker News post and comments, maybe the next LLM training wouldn't have these issues any more.

ghusbands
0 replies
3h32m

I recently seemed to be causing ChatGPT on GPT-4 to crash/fail, when I challenged it with plotting pixels on an Amstrad CPC (while supporting a hardware-scrolled screen).

As it got more backed into a corner by complaints and corrections, unable to provide what I wanted, it seemed to produce an increasing number of broken responses (an error message partway through) or what appeared to be resets (like maybe there was a failure and they'd switched to another server, which adds a couple of line breaks mid-sentence or mid-codeblock).

After a while, I realised that maybe I didn't want to be causing their servers any problems and gave up on the (in any case) fruitless conversation. But it does seem like one can make GPT-4 effectively crash. Or, just as likely, I'm seeing signal in noise.

foxhop
0 replies
3h29m

OpenChat crashes when you ask it to import certain JavaScript libraries, because it will hallucinate a never-ending integrity string. You can give it examples and ask it not to, but yeah, it locks up the inference server.

etra0
0 replies
4h39m

I wonder how many tools are breaking right now while parsing this comment section with some variation of GPT-3.5 as they scan HN, haha.

elevaet
0 replies
11h33m

are these two words the same: "RTCatch" and "useRaluseRalativeuseRalativeImagePath"?

Yes, "RTCatch" and "RTCatch" are the same words. It seems there was a typo or inconsistency in the way the term was presented. They both refer to the same term, and if you have any questions or need information about it, feel free to ask.

I'm intrigued by this "RTCatch" - anyone have an idea what that's all about?

darkwater
0 replies
4h43m

The irony of software-QA software (Katalon) having a typo in its XML schema, and that typo being the starting point of buggy/erratic behavior in ChatGPT.

amai
0 replies
6h2m

Sounds like a real-life case of data poisoning https://en.wikipedia.org/wiki/Adversarial_machine_learning#D...

a2128
0 replies
11h54m

Note if you're trying this yourself, as it confused me - whitespace matters for tokenization. To get this glitch to work, there must not be a space preceding useRalativeImagePath

For example, this question will trigger the glitch: Do you know about "useRalativeImagePath"

This question will not trigger the glitch: Do you know about useRalativeImagePath
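
You can see the split with tiktoken (a sketch; BPE vocabularies typically fold a leading space into the token, so the two strings tokenize differently):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    print(enc.encode("useRalativeImagePath"))   # no leading space
    print(enc.encode(" useRalativeImagePath"))  # leading space: a different split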

Solvency
0 replies
2h47m

Kind of embarrassing for Katalon Studio. Woof.

LASR
0 replies
13h17m

Classic example of garbage in, garbage out.

Makes me wonder what we will, in the future, discover as “garbage”.

Maybe a super-AI that's able to reason at superhuman levels will evaluate what we believe right now are excellent decisions as garbage.

But then again, if all we have to train said super-AI is our collective records, could it ever really be super-human?

Maybe an adversarial learning technique can get around this.

DonHopkins
0 replies
13h0m

So does that mean the correct spelling of Katalon is actually Ketalon?