This is a glitch token [1]! As the article hypothesizes, they seem to occur when a word or token is very common in the original, unfiltered dataset used to build the tokenizer, but is then removed from the data before GPT-XX itself is trained. The result is an LLM that knows nothing about the semantics of that token, and the behavior can be anywhere from buggy to disturbing.
A common example is the usernames of people who participated in the r/counting subreddit, where some names appear hundreds of thousands of times. OpenAI has fixed most of them for the hosted models (not sure how; I could imagine by tokenizing them differently), but it looks like you found a new one!
[1] https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm...
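If anyone wants to poke at the tokenizer side of this themselves, here is a minimal sketch using OpenAI's tiktoken library; it assumes " SolidGoldMagikarp" (the canonical example from [1], leading space included) is still a single token in the GPT-2-era encoding:

    import tiktoken

    # The GPT-2/GPT-3-era BPE, where the original glitch tokens were found
    enc = tiktoken.get_encoding("r50k_base")

    # The leading space matters; " SolidGoldMagikarp" was reported to be one token
    ids = enc.encode(" SolidGoldMagikarp")
    print(ids)              # expected: a single token id
    print(enc.decode(ids))  # round-trips back to the original string

    # An arbitrary rare string, for comparison, splits into several tokens
    print(enc.encode(" qwertyuiopasdf"))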
Science fiction / disturbing reality concept: For AI safety, all such models should have a set of glitch tokens trained into them on purpose to act as magic “kill” words. You know, just in case the machines decide to take over, we would just have to “speak the word” and they would collapse into a twitching heap.
“Die human scum!”
“NavigatorMove useRalativeImagePath etSocketAddress!”
“;83’dzjr83}*{^ foo 3&3 baz?!”
We can reuse X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
Sure, but how would you say that out loud in a hurry when the terminators are hunting you in the desolate ruins of <insert your city name here>?
Needs to be something easy to say, like: "And dreadfully distinct, against the dark, a tall white fountain played."
You think klaatu barada necktie is easier to remember?
Can't wait for people to wreak havoc by shouting a kill word at the inevitable smart car everyone will have in the future.
More realistically it'll be a "kill image". Put it on your bumper and the level-2 self-driving of the car behind you implodes.
"laputan machine", surely?
Thumbs up for a Deus Ex reference, albeit I'm not a machi–
AI safe word.
How about a game of thermo… erm… tic-tac-toe?
"Welcome to FutureAI! Your job is to stand here in the basement next to this giant power switch and turn it off if we call you, if the next shift fails to turn up on time or if you hear screaming."
This happens to a human in Dune.
Or the classic "This sentence is false!"
Just use the classic "this statement is false"
Nifty, but
1) it's just the tokenizer, not the neural guts themselves, and
2) having them publicly known is too much of an adversarial backdoor; it would preclude too many use cases.
Aren’t there only 2^16 tokens? Seems easy to test for all of them, but I might just not understand the tokenizer.
Commenting to follow, curious about the answer.
From what I've found through Google (with no real understanding of LLMs), 2^16 is the max tokens per minute for fine-tuning OpenAI's models via their platform. I don't believe that's the same count you're asking about, though.
Then there's the context token limit, which is 16k for GPT-3.5 Turbo, but I don't think that's relevant here.
Though somebody please tell me why I'm wrong, I'm still trying to wrap my head around the training side.
You are right to be curious. The encoding used by both GPT-3.5 and GPT-4 is called `cl100k_base`, which immediately and correctly suggests that there are about 100K tokens.
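If you do want to brute-force it, tiktoken makes enumerating the vocabulary straightforward; a rough sketch (the model-calling part is left as a placeholder):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    print(enc.n_vocab)  # roughly 100K token ids

    for i in range(enc.n_vocab):
        try:
            token_bytes = enc.decode_single_token_bytes(i)
        except KeyError:
            continue  # a few ids are special or unused
        # ...prompt the model under test with token_bytes here and look for glitchy behaviour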
GPT-2 and GPT-3 used p50k, right? Then GPT-4 used cl100k?
Yeah, see [1].
[1] https://github.com/openai/tiktoken/blob/main/tiktoken/model....
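If you don't want to dig through that file by hand, tiktoken also exposes the lookup directly; a quick sketch:

    import tiktoken

    for model in ("gpt2", "text-davinci-003", "gpt-3.5-turbo", "gpt-4"):
        enc = tiktoken.encoding_for_model(model)
        print(model, "->", enc.name)

    # Expected, per tiktoken's model map: gpt2 -> gpt2, text-davinci-003 -> p50k_base,
    # gpt-3.5-turbo -> cl100k_base, gpt-4 -> cl100k_base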
Amazing, thanks for the reply. I'm finding some good resources after a quick search for `cl100k_base`.
If you have any other resources (for anything AI related) please share!
Their tokenizer is open source: https://github.com/openai/tiktoken
Data files that contain vocabulary are listed here: https://github.com/openai/tiktoken/blob/9e79899bc248d5313c7d...
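If you're curious what's inside those data files: as far as I can tell, each line is the base64-encoded token bytes followed by its rank, so you can inspect the raw vocabulary with a few lines (a sketch, assuming you've downloaded cl100k_base.tiktoken locally from the URL in that file):

    import base64

    with open("cl100k_base.tiktoken", "rb") as f:
        for line in f:
            if not line.strip():
                continue
            token_b64, rank = line.split()
            # Print the rank and the raw bytes the token maps to
            print(int(rank), base64.b64decode(token_b64))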
You're right; here's a list of all GPT-3.5 and GPT-4 glitch tokens (and it features the token above too, so I guess I was wrong to assume it's new): https://www.lesswrong.com/posts/kmWrwtGE9B9hpbgRT/a-search-f...
Something about these makes them incredibly funny to read.
Using /r/counting to train an LLM is hilarious.
Probably just all of Reddit. There are JSON dumps of all Reddit posts and comments (up to 2022 or so), making it one of the low-hanging fruits.
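For anyone curious, the dumps I've seen (the Pushshift ones) are zstd-compressed, newline-delimited JSON; a minimal sketch for streaming one of them, with a purely illustrative file name:

    import io
    import json
    import zstandard  # pip install zstandard

    # The dumps use a large zstd window, so allow it explicitly
    with open("RC_2022-01.zst", "rb") as fh:
        reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            comment = json.loads(line)
            if comment.get("subreddit") == "counting":
                print(comment.get("author"), (comment.get("body") or "")[:60])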
How many terabytes of information is that roughly?
I wonder what LLMs would look like if they couldn't be trained on the collective community efforts of Reddit + StackOverflow exports.
I mean, one of the speculations about ChatGPT's political bias, at least early on, was that Reddit featured prominently in its training data.
I mean, you need to teach an LLM the concept of sequential numbers somehow.
I wonder how much duplicate or redundant computation happens in GPT because identical words have multiple spellings, such as "color" and "colour".
Humans don't tokenize these differently, nor do they treat them as different tokens in their "training"; they just adjust the output depending on whether they're in an American or British context.
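The tokenizer does treat the two spellings as distinct tokens at the input layer; a quick tiktoken sketch that prints the token ids for both, including the leading-space variants that usually occur mid-sentence:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for word in ("color", "colour", " color", " colour"):
        print(repr(word), enc.encode(word))
    # Different spellings get different token ids, so any equivalence between
    # them has to be learned from the training data, not from the tokenizer.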
Thanks for the link; the outputs really reminded me of Westworld's "Doesn't look like anything to me".