
Understanding and avoiding visually ambiguous characters in IDs

vesinisa
29 replies
6d12h

The author makes a point of avoiding letters that are hard to distinguish even when spelled out in handwriting, but the example table includes the number 7. I can not count the number of times I have found it hard to distinguish between someone's 7 and 1.

It helps if you draw a horizontal bar on the 7 but many don't, so you can never really be sure if a 7 is in fact a 1 with the serif or vice versa.

powersnail
13 replies
6d10h

That's interesting. I've never encountered a 1 that looks like 7 in handwriting. Usually it's I and l that mess with 1. In what style of handwriting is 1 similar to 7? I'd imagine the top bar on 7 is a sufficient differentiator.

jasode
3 replies
6d9h

>I've never encountered a 1 that looks like 7 in handwriting. [...] In what style of handwriting is 1 similar to 7? I'd imagine the top bar on 7 is a sufficient differentiator.

Here's a deep link to someone in Germany writing down what visually looks like "77.5 :7:7" but his narration says it's actually "11.5 :1:1"

https://www.youtube.com/watch?v=TT9je5yo7yM&t=30m44s

seszett
2 replies
6d4h

This just looks like obviously 11.5 :1:1 to me, the slant would be totally wrong for 7s. I had to check back your comment to be sure you were really talking about these 1s as looking like 7s :)

But this thread reminds me of when I lived in Canada for a while (coming from France) and I did misread numbers very often, which was totally unexpected to me. Yes, 7s and 1s look very different between Canada (and the US I guess) and France (and probably the rest of Europe).

I haven't had this problem in Belgium, though I wouldn't be surprised if the standard here had been chosen to be the same as in France.

hombre_fatal
1 replies
6d2h

They might be obvious ones in the context of this one person. But they are trivially not obvious next to someone who writes one like "|" and then seven is just "|" with any sort of hat. Your slant heuristic immediately fails.

seszett
0 replies
6d

It's "obvious" because 7 is always slanted here. But I know it's not the case in North America, and I have plenty of experience of how numbers can be misinterpreted, as I said.

I was just saying it was obvious to me and it even takes effort to see how they could be misinterpreted. But I know they can be.

NikolaNovak
2 replies
6d5h

Fascinating!

I was born in Europe so I put a horizontal line midway through 7. But now I'm in Canada and nobody else does. It can be a really tiny angular difference between a 1 and a 7 for a lot of people! :)

nicolaslegland
0 replies
6d3h

Same experience, I wilfully switched my handwriting to American 1 (one) as a single vertical line with the European 7 (seven) having a horizontal line midway for disambiguation in a multicultural work environment.

devilbunny
0 replies
5d23h

Crossed 7's are fairly common among science majors in American universities. I also cross z's. Again, also fairly common among science majors. (Mine was chemistry.)

froh
1 replies
6d10h

in some countries' handwritings the digit one is not a vertical bar but it has a little ascending hook, like a digit seven turned vertical, but with a shorter roof.

so 'muricans mistook my German ones for sevens, all the time, and I had to force myself to write what looks like a pipe symbol vertical bar to me instead of my trusted one.

and to disambiguate, we cross the seven like a lower case eff or tee is crossed.

swores
0 replies
6d9h

The handwriting of numbers and letters being confusing between countries is something that's easy to not think about until you've actually faced the issue multiple times.

I'm English, and I can't honestly remember which country it was that I've lived in (I think France...) where there were a couple of numbers that even after living there for a year I still wasn't confident reading when hand-written on things like café menus. And I don't think I would have thought of that being a systemic issue rather than just blaming an individual's handwriting before I lived there, despite having taken over 100 trips to France before moving to live there for a year.

sneak
0 replies
6d9h

Germans write the number 1 almost like an upside-down capital V. It’s not horizontally symmetrical though, which is why it looks like a 7.

dusted
0 replies
6d9h

7, 1, I, i and l are troublesome because sans serif vs serif fonts and other stylistic choices can make them look like each other.

actionfromafar
0 replies
6d10h

A "1" can have a little squiggly roof on it. A big 1-squiggle easily looks like a 7.

Piskvorrr
0 replies
6d9h

If you don't have any 7s in the text (and 1s only - or vice versa!), it's hard to say what they are. I did encounter this multiple times.

gajus
10 replies
6d12h

I never ran into this situation, but I plan to update the article based on aggregated feedback. A few good suggestions have been made.

vesinisa
4 replies
6d11h

It might be based on the handwriting standards used in your country. Where I live we were taught at school to draw a horizontal bar on 7 and avoid the serif on 1:

https://is.mediadelivery.fi/img/468/a93c32e08dae4768869a4bda...

No chance of confusion. This seems to have prompted some to add the serif to their 1 for stylistic reasons or whatever, since it's still distinguishable from 7 with a bar.

But then again people following older or newer conventions drop the bar from their 7:

https://is.mediadelivery.fi/img/468/46827e3320294f89b12a9338...

This makes a singular 1 with sloppily drawn serif hard to distinguish from a 7 without horizontal bar unless you can also see how the same person draws the other digit in their style.

yongjik
1 replies
5d21h

Where I grew up (Korea), we write 7 with an extra serif at the upper left corner, like this: https://pop.yesform.com/pop/16113

It never gets confused with 1, but in America, people were confusing it with 9 (!!), so I had to stop writing it like that. Can't please everybody...

pests
0 replies
5d18h

I can see it as a native-born American.

My handwriting has always been pretty sloppy. My 9s come out like your 7s when I don't close the loop properly (I start at the bottom).

People confuse my lowercase r's for n's all the time too for a similar reason. Either I loop a little too much or I drag down the overhang so it basically is an n.

gajus
0 replies
6d11h

Updated the article. Thanks for the context

bithaze
3 replies
6d12h

A small typo I noticed - "Case-sensitive: 53^5 = 62,259,690,411,360" should be to the eighth power, not the fifth.

gajus
2 replies
6d12h

Thanks. Fixed

silvestrov
0 replies
6d8h

Suggestion: after "a longer ID with a lower chance of visual ambiguity" show how many characters that will be needed to have the same number of IDs as 53^8 using the 22 encoding.

I.e. for a given number of IDs, how many characters are needed in the 53 versus 22 encoding (people who are not good at math might assume it is more than twice as many).
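
For a rough sense of the numbers, here is a minimal Python sketch. The alphabet sizes (53 and 22) and the ID length (8) come from the discussion above; the helper name `chars_needed` is made up for illustration.

```python
def chars_needed(target_ids: int, alphabet_size: int) -> int:
    """Smallest length n such that alphabet_size ** n >= target_ids."""
    length = 0
    capacity = 1
    # Grow the ID one character at a time until the space is large enough.
    while capacity < target_ids:
        capacity *= alphabet_size
        length += 1
    return length


target = 53 ** 8  # number of IDs in the case-sensitive example

print(chars_needed(target, 53))  # 8
print(chars_needed(target, 22))  # 11
```

So matching the 53^8 ID space takes 11 characters from the 22-character alphabet, roughly 1.4 times as many as 8, not more than twice as many.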

jonp
0 replies
6d7h

Actually, 53^8 = 62,259,690,411,361 (not ..360)

deanishe
0 replies
4d21h

When it comes to handwritten numbers, Brits frequently mistake German ones for sevens, and Germans British sevens for ones.

toss1
2 replies
6d3h

The article also mentioned the difficult-to-distinguish aurally "B" (Bravo) and "P" (Papa).

But it did not mention the most similar-sounding pair "F" (Foxtrot) and "S" (Sierra), which are nearly indistinguishable.

While one could use the NATO/Aviation standard alphabet (Alpha, Bravo, Charlie, Delta...), unless you have a very specifically constrained customer base, it won't help much. Best to also avoid those combinations.

Definitely better to have a slightly longer ID_String and maximal ability to read and speak/hear the characters. It'll save FAR more time and aggravation.

saltcured
0 replies
5d23h

There are many of these ambiguous pairs: B/P, F/S, D/T, M/N, Q/U, ...

The end-to-end transmission can get really bad when you combine several different filter stages, such as a speaker's mouth being injured or obscured, a narrow channel like telephone or radio, noise, and a listener's ear losing parts of the spectrum.

As the sound transmission gets worse, you can get more rhyming ambiguities. Effectively, the consonants are lost in a bad channel and only the vowels come through. In an American English accent, I think these are the groups corresponding to different vowel sounds: A/H/J/K, B/C/D/E/G/P/T/V/Z, I/Y, O, Q/U, F/L/M/N/S/X, R. "W" stands alone with multiple syllables.

Depending on the kind of transmission problem, these groups can start to split apart into smaller subgroups based on which of their sonic differences make it through to the listener.
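
As a rough illustration of the worst case, where only the vowel sound survives, the groups listed above can be put into a lookup table. This is a sketch that assumes those groups are right for the accent in question; `RHYME_GROUPS`, `confusable`, and `one_per_group` are names invented here.

```python
# Letters grouped by the vowel sound of their spoken name, as listed above
# (an American English accent is assumed).
RHYME_GROUPS = ["AHJK", "BCDEGPTVZ", "IY", "O", "QU", "FLMNSX", "R", "W"]

# Map each letter to the index of its group.
GROUP_OF = {
    letter: index
    for index, group in enumerate(RHYME_GROUPS)
    for letter in group
}


def confusable(a: str, b: str) -> bool:
    """True if two different letters share a vowel sound."""
    a, b = a.upper(), b.upper()
    return a != b and GROUP_OF[a] == GROUP_OF[b]


def one_per_group() -> str:
    """Keep only the first letter of each group."""
    return "".join(group[0] for group in RHYME_GROUPS)


print(confusable("B", "P"))  # True
print(confusable("F", "S"))  # True
print(confusable("B", "F"))  # False
print(one_per_group())       # ABIOQFRW
```

Under this pessimistic model only eight letters remain mutually distinguishable, which suggests why a spoken-only ID alphabet ends up either very small or reliant on spelling words.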

TacticalCoder
0 replies
5d21h

> But it did not mention the most similar-sounding pair "F" (Foxtrot) and "S" (Sierra), which are nearly indistinguishable.

My family name begins with a 'F' and, indeed, I can't count the number of times where people write a 'S' instead. I've got invoices with a 'S' instead of a 'F'!

vander_elst
0 replies
6d6h

Missing in the first part, but in the section "Visually ambiguous dictionary" neither 1 nor 7 is present.

Terr_
12 replies
6d17h

> In some cases, you might also want to avoid characters that sound similar when spoken. For example, b and p can sound similar when spoken out loud. This can be especially important in situations where IDs are communicated verbally.

In many cases these kinds of IDs are just an encoding of a ground-truth that is a big integer or a sequence of bytes, and that means we don't have to use ASCII-character granularity, we can also use words.

True, that creates a certain cultural bias for wherever you get the words from, but it opens up new possibilities for error correction and detection, both by the computer and also by the humans transcribing things.

gajus
4 replies
6d17h

Somewhat related, I always liked the concept of https://what3words.com/

simonw
1 replies
6d16h

They have some pretty bad flaws in their design relating to this topic:

https://twitter.com/jonty/status/1570062564523917312

the actual address should be "keen.lifted.fired" instead of "keen.listed.fired" and someone clearly misheard over the phone

Terr_
0 replies
6d16h

Yeah, ideally the dictionary first would undergo rather rigorous pruning based on things like phonetic similarity or how easily a typo might move between two valid words.

That scoring/clustering process makes for interesting problems in their own right, especially if one throws accents into the mix.

TheDong
1 replies
6d16h

what3words has a proprietary implementation and has sent fairly silly legal threats: https://news.ycombinator.com/item?id=27020810

I'll happily boycott that for-profit company which is masquerading as a public utility, but charging money and going after anyone who reverse engineers what words are what locations.

See also the comments in https://news.ycombinator.com/item?id=27058271

This is exactly the sort of thing that shouldn't be a private company, just like Lat/Lon coordinates and street addresses are effectively public domain, any suitable replacement for lat/lon should also be public domain.

gajus
0 replies
6d16h

Yikes. Well, less of a fan now!

shkkmo
2 replies
6d16h

You then have to curate a list of words that don't sound similar, aren't composed of subwords, aren't offensive, and don't hit other gotchas.

I don't think words work well for codes that aren't meant to be memorized. They make it harder to curate an unambiguous list, since that list needs to be several orders of magnitude larger and the ambiguity can be accent-dependent. Of course, if memorization may be needed, then that effort may be worthwhile.

Error detection with codes isn't hard, that's why checksums exist.

shkkmo
0 replies
4d22h

Thanks, that's a neat resource for making hexadecimal numbers more memorizable and easier to transmit phonetically, with some built-in error checking from the odd/even list alternation.

However, for the core purpose of the phonetic transmission, it seems needlessly verbose and cumbersome. The short wordlist combines with some fairly long component words to make the phonetic representation unnecessarily long. Additionally, I'm not super into some of the fairly obscure names and words included on that list. If I don't need memorability and hexadecimal atomicity, it doesn't seem worth using.

10000truths
2 replies
6d16h

The problem with words is that their encoding density is much lower, so it requires more space to store. Suppose you create an alphabet A that consists of the N most common English words. Then, what might be Q characters in base 58 would instead require Q*ln(58)/ln(N)*((avg word length in A)+1)-1 characters. For N=1000 and assuming that the average word length is 5, this gives a factor of ~3.5x increase in storage space required (e.g. a 20 character base-58 ID would map to a ~70 character string of words).
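
The estimate above is easy to reproduce. A minimal sketch, where `word_encoded_length` and `whole_word_length` are hypothetical helpers and the inputs (base 58, N = 1000, average word length 5, Q = 20) are the ones from the comment:

```python
import math


def word_encoded_length(q: int, n_words: int, avg_word_len: float,
                        base: int = 58) -> float:
    """Q * ln(base) / ln(N) * (avg word length + 1) - 1, as in the formula."""
    words = q * math.log(base) / math.log(n_words)
    # Each word costs its letters plus one separator; the last has none.
    return words * (avg_word_len + 1) - 1


def whole_word_length(q: int, n_words: int, avg_word_len: float,
                      base: int = 58) -> float:
    """Same estimate, but rounding up to a whole number of words."""
    words = math.ceil(q * math.log(base) / math.log(n_words))
    return words * (avg_word_len + 1) - 1


print(round(word_encoded_length(20, 1000, 5), 1))  # 69.5
print(whole_word_length(20, 1000, 5))              # 71
```

The first figure matches the ~70 characters quoted above; rounding up to 12 whole words gives 71.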

tornadofart
0 replies
6d12h

That is true. But is it really a storage problem? Could you not store in whatever base-N arithmetic that has high encoding density, and "just" use the words for display/printing and such? Probably it is more a problem of restricting the range of representable numbers because users are unable to handle pages over pages of random words...

Dylan16807
0 replies
6d4h

Who cares about that much space?

If you do, you're not storing your bits as text to begin with.

Dalewyn
0 replies
6d15h

> we can also use words.

And we do, Bravo for B, Papa for P: https://en.wikipedia.org/wiki/NATO_phonetic_alphabet

Always use phonetic code if you're transcribing letters to someone, especially over phone/radio. It saves a lot of hassle on both sides.

If you don't remember the code, no big deal: For everyday situations, use any easily understood word. Like Apple for A.

matthewtse
11 replies
6d10h

So cool to read an article discussing a problem I run into on a regular basis.

Whenever I'm creating a 2FA backup on a piece of paper, anxiety hits me every time I cross over certain characters, o/0, v/u, 5/S, etc. I've come to add some fanciness to how I write these characters for this exact reason.

On "Phonetic similarity", reminds me of how I chose my wifi password. I wanted a common word with multiple consonants that a 3rd grader could spell, so I could share the password with a single phrase and have it be unambiguous. Ended up choosing "vacation".

2024throwaway
9 replies
6d10h

I can’t believe people out there write these things down by hand on paper.

It’s mind bottling.

matthewtse
3 replies
6d10h

I do that out of paranoia/mistrust for my wifi network, printer, printer software, etc.

It's probably fine to just print it out, but for more sensitive items I definitely write it down by hand.

Piskvorrr
2 replies
6d9h

It's not as if the printer keeps a hidden cache of printed pages. Except maybe it does...even if the feature was created for entirely benign reasons.

Piskvorrr
0 replies
6d3h

That's one of the instances of "built with good intentions" I had in mind.

kibwen
3 replies
6d5h

I can't tell if this is sarcasm. Handwriting is deprecated now?

vundercind
2 replies
6d2h

2fa backup codes? Yeah, I’d be surprised at people writing those out by hand. They’re long and gibberish, odds of an unnoticed error are high. I’d also be surprised at people typing them by hand (as a way to record them, not to input them) for similar reasons.

TacticalCoder
1 replies
5d21h

Well be surprised. I write them down, by hand.

> They’re long and gibberish, odds of an unnoticed error are high.

That's why you "whitelist" those you wrote down and re-used with success: a little checkbox, which when checked means "Successfully re-initialized an authenticator with this 2FA?", works wonders.

A "dot" underneath a character means it's a number (so I'm sure not to mistake '5' with 'S', for example).

My "paper 2FAs" then go to the bank, in a safe.

I've never ever lost a 2FA access code.

matthewtse
0 replies
5d20h

> That's why you "whitelist" those you wrote down and re-used with success: a little checkbox, which when checked means "Successfully re-initialized an authenticator with this 2FA?", works wonders.

I just bake the whitelisting into every 2FA code I handwrite. Instead of scanning the QR into the phone and then writing down the backup, I just start by writing down the backup, and then input it manually from the note into my phone. Once successfully used, I know the handwritten 2FA code is valid.

> A "dot" underneath a character means it's a number (so I'm sure not to mistake '5' with 'S', for example).

That one's good, I'll start doing that from now on! I also found writing letters partially in cursive to help too.

> My "paper 2FAs" then go to the bank, in a safe.

Yep same, I got a bank SD box back in 2017 during my first crypto wave. Have found the $100/yr to be incredibly useful. More recently I've created a sort of "defense in depth" for my passwords/codes. Least important things are available a button click away on Bitwarden Chrome extension, more important things are non-cloud-synced google-authenticator on my phone with 2FA backup in bank SD box. Most important things (i.e. crypto private keys) are sharded into pieces and distributed amongst multiple SD boxes.

NetOpWibby
0 replies
6d4h

Damn, being psychic must be cool. I think your mind may be boggled though.

TacticalCoder
0 replies
5d21h

> Whenever I'm creating a 2FA backup on a piece of paper, anxiety hits me every time I cross over certain characters, o/0, v/u, 5/S, etc. I've come to add some fanciness to how I write these characters for this exact reason.

My convention is that I put a dot '.' below every digit (this solves the 5/S, 0/O, 8/B etc. issues [the actually problematic ones will depend on your handwriting]).

If I'm really unsure, I add the NATO/aviation alphabet [1]. There's a 'U', I'll write 'Uniform' (in diagonal, starting from the 'U').

It only requires some discipline. I've done that for more than ten years now and have never lost a single 2FA code.

[1] nitpicking about the actual difference between the NATO and aviation codes can safely be sent to /dev/null

dools
11 replies
6d16h

I wish my parents had access to this when they chose to call me Iain Dooley.

The world has almost unanimously decided my name is now Lain.

dfc
7 replies
6d16h

I think that Iain is the Scottish version of Ian? Is it unacceptable to choose the alternate spelling, Ian?

koolba
4 replies
6d15h

I’d consider it grossly unacceptable to change the first thing gifted to you by your parents.

stevekemp
0 replies
6d10h

Funny story, I was named "Steven" and yet I've been called Steve my whole life, at my preference.

Recently I went through the process of changing my name legally, because I'd fallen into a bad habit of writing "Steve" when asked for my name on some documents, but then remembering my "official" name was "Steven" on others.

Having multiple IDs with different names, especially after moving to a new country, was just too much of a pain - for example my official residence permit name didn't match my passport name, which caused some fun at airports.

quesera
0 replies
6d15h

Eh, there's nothing magical about parental preferences. A loving parent would not want their child to live with a name that they didn't like.

Fortunately with names, there are no returns, but exchanges are accepted (with a low restocking fee) in perpetuity.

kibwen
0 replies
6d15h

Your name is your own first and foremost. You can honor your parents in other ways.

digging
0 replies
6d2h

The first thing gifted was life, and though that was not bestowed with consent, it's one thing I'd argue for retaining as long as possible. Everything else is fair game to discard in service of making that life a good one.

dools
1 replies
6d12h

In an ironic twist, I then get called Lan.

account42
0 replies
6d7h

On the plus side, this might help you with networking.

arp242
1 replies
6d16h

For years I thought that Doug McIlroy had a very odd name, until I watched some presentation on YouTube and first heard his name being pronounced – "ah, so that's an i and not a double L!"

fud101
0 replies
6d16h

Lol I recognise the name from the famous pearls book but always thought his name was your incorrect version.

wccrawford
0 replies
6d7h

It probably doesn't help much that both Lain and Lan are fairly famous fictional characters now (Serial Experiments Lain and al'Lan Mandragoran from Wheel of Time).

donavanm
8 replies
6d14h

Encoding should also depend on the user. Base 32 (Crockford & RFC 4648) has a nice unambiguous alphabet for compact representation and an explanation of why. However if your users are speaking aloud you might want a word list representation, “TIDE ITCH SLOW REIN RULE MOT”, like S/KEY RFC 1751. DO NOT invent your own word lists; there are an infinite number of dragons lying in wait for idioms, homophones, dialects, etc. Don't be like me and unintentionally create a major incident like “wet clam butterfly.”

dmurray
4 replies
6d11h

> However if your users are speaking aloud you might want a word list representation, “TIDE ITCH SLOW REIN RULE MOT”, like s/key rfc 1751. DO NOT invent your own word lists; there are an infinite number of dragons lying in wait for idioms, homophones

An unfortunate example. That's TIED HITCH SLOE REIGN RULE MOW? With only two parity bits, you can't even be sure this decoding is invalid.

RFC 1751 [0], from which this example comes, doesn't envisage the encoding being used in oral communication. Instead, it makes codes easier for the user to "read, remember, and type in".

For oral transmission among professionals, sticking to the 26 upper case letters and relying on the NATO alphabet for encoding is a reasonable choice. Getting codes from untrained users in a lossy oral environment is still an unsolved problem.

[0] https://datatracker.ietf.org/doc/html/rfc1751

Muromec
2 replies
6d3h

It would help if the NATO alphabet was a universally known thing.

Typing something letter by letter in Latin when neither party is a native speaker of English is very painful almost half the time it happens.

yencabulator
1 replies
6d2h

My personal experience says that the most commonly understood phonetic alphabet in the US among laypeople is the 1946 ARRL alphabet using American first and last names, for example A as in Adam, N as in Nancy. NATO phonetic alphabet confuses almost everyone I've tried it on.

https://en.wikipedia.org/wiki/Spelling_alphabet

devilbunny
0 replies
5d23h

Everyone I've run into in the hospitality industry gets NATO phonetic. Hotels and airlines, in my experience, but I assume it generalizes.

My wife thought it was crazy the first time she heard me use it. Then she realized that they all understand it too.

hinkley
0 replies
5d11h

Not to mention accents.

Some people are going to sound like they’re saying Todd for Tide, and have you heard how Baltimore pronounces Iron?

efilife
1 replies
6d6h

What's this type of ID called?

klabb3
0 replies
6d3h

Gotta cut it some slack since it’s from 1994, but still that’s a humorously bad RFC:

These require use of a keyed message-digest algorithm, MD5 [Riv92] […] while sufficiently strong […]

Heh!

[…] is hard for most people to read, remember, and type in.

Ok, go on…

English words are significantly easier for people to both remember and type.

Most people don’t know English.. But that shouldn’t be a problem since the word list can be changed. Right?

Because of the need for interoperability, it is undesirable to have different dictionaries for different languages.

Oh. Well the world already learned the 26 characters of the English alphabet so adding a few words is probably fine..

char Wp[2048][4] = […]

Oh, well at least it’s common words suitable for English beginners?

WAD, BESS, MERT…

Hold on, these words are tricky even for…

ORR? AGEE EGAN HAAS!!

…Are you done?

GAUL FLAM! DRAB!

eviks
7 replies
6d13h

> not only to avoid visually ambiguous characters, but also to avoid spelling words in common languages.

Or you should do the opposite - use real dates/words in IDs and your visual confusion almost disappears (though there is a bunch of ambiguity here as well in similar pronunciation, so also not perfect). Humans aren't robots, so they shouldn't be forced to read a meaningless list of random letters.

(example of geospatial system of coordinates based on that is what3words)

raspyberr
6 replies
6d12h

Imagine having a coordinate system be owned by a private company.

bingbingbing777
4 replies
6d12h

You're free to create your own, or not use theirs.

Akronymus
2 replies
6d11h

Don't they have a patent on it?

sakjur
1 replies
6d11h

Yeah.

https://patents.google.com/patent/US9883333B2/en

I think it’s also a good example of increasing computer dependency by ‘human centric’ design: I can quickly and manually sort through a bunch of packages with coordinates or pluscodes written on them with some sense of locality. What3Words is designed to give a sense of familiarity but require an API lookup for every single address.

Letters and numbers also translate directly in most languages, words don’t (take bow as an example. Is it when someone leans over, an archer’s weapon of choice, or a cutesy headpiece?), so the familiarity aspect is limited to people with a good grasp of English.

Its main feature is that it can be commercialized, unlike regular coordinate systems.

toast0
0 replies
6d1h

> take bow as an example. Is it when someone leans over, an archer’s weapon of choice, or a cutesy headpiece?

Front of a ship, duh.

account42
0 replies
6d8h

Or we could agree that that's ridiculous and not allow companies to own such things.

Free speech is a right. Interoperability should be a right. Any infringement of those rights better have a damned good reason. "It's profitable" isn't a good reason.

BoxOfRain
0 replies
6d8h

A few years ago I had to call an ambulance for someone (in the UK) and began giving coordinates, only to hear 'oh do you have what3words it's easier that way' which I found very surprising! I don't love the idea of a proprietary coordinate system either, companies come and go but normal coordinates are universally understood.

8organicbits
7 replies
6d5h

The Latin/English alphabet is common but not universal. I believe this challenge is why TOTP codes use Arabic numerals. The user's keyboard can type these reasonably. Spoken is always a challenge. Even an English speaking audience will pronounce "0" as zero, oh, or zed.

bloak
6 replies
6d4h

0123456789 are best called "European", I think, as Arabic numerals would be: ٠١٢٣٤٥٦٧٨٩

arp242
4 replies
6d4h

"They are also called [..] European digits"

Dylan16807
2 replies
6d4h

That's the last entry in the list, so it's not very supportive of the idea that they are "best" called that.

arp242
1 replies
6d3h

Well sure, but the previous poster was making a proposal ("I think"), and just doing a link dump implies ignorance, which fairly obviously isn't the case.

I think anyone who has dealt with both Arabic numerals (as used in Europe) and Arabic numerals (as used in parts of the Arabian world) feels the naming is unfortunate. Arguably this is not the best place to bring that up, but I certainly stopped using "Arabic numerals" after working with some i18n code which supported both Arabic and Arabic numerals.

toast0
0 replies
6d1h

Maybe not the best names, but I've taken to calling them Western Arabic, Arabic Arabic, and Persian or Urdu Arabic. I typically only deal with the Unicode representation, so the differences between Persian and Urdu numerals are invisible to me (but very visible if you display them with the wrong language context for the viewer!)

shagie
0 replies
6d4h

The reference chases through to https://www.unicode.org/terminology/digits.html

    Term:    ASCII digits 
    Example: 0123456789 U+0030..U+0039 
    Explanation/Description:
        Commonly used with Latin, Greek, Cyrillic and many other scripts, including some non-European scripts. Used in alternation with native digits in scripts that have them. (Some scripts with native digits make only limited use of ASCII digits.) Infrequently used in many of the remaining scripts. 

    Synonyms: Western digits, Latin digits, European digits

Which then links on to: https://www.unicode.org/glossary/#european_digits

European Digits. Forms of decimal digits first used in Europe and now used worldwide. Historically, these digits were derived from the Arabic digits; they are sometimes called “Arabic numerals,” but this nomenclature leads to confusion with the real Arabic-Indic digits. Also called "Western digits" and "Latin digits." See Terminology for Digits for additional information on terminology related to digits.

bckr
6 replies
6d16h

An approach we are trying is speakable IDs. Three characters for the type of thing, then four random words from a list of clean words with 5 characters:

xxx_flown-moons-deary-flake
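
A minimal sketch of how such an ID could be generated. The word list below is a tiny stand-in (the first four words are from the example above, the rest are made up), and the `usr` prefix and the function name `speakable_id` are hypothetical; a real list would need to be far larger and vetted.

```python
import secrets

# Illustrative only: a real word list would be much larger and curated.
WORDS = ["flown", "moons", "deary", "flake", "brisk", "candy", "delta", "ember"]


def speakable_id(type_prefix: str, n_words: int = 4) -> str:
    """Build '<3-char type>_<word>-<word>-...' from random 5-letter words."""
    if len(type_prefix) != 3:
        raise ValueError("type prefix must be exactly three characters")
    # secrets.choice draws from a cryptographically strong random source.
    words = [secrets.choice(WORDS) for _ in range(n_words)]
    return f"{type_prefix}_{'-'.join(words)}"


print(speakable_id("usr"))  # e.g. usr_flake-ember-moons-brisk
```

With a list of N words and four words per ID, the ID space is N^4, so the list size determines how collision-resistant the scheme is.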

kibwen
3 replies
6d15h

You'll want to be careful to consider homophones while also taking accents into account. E.g. if your dictionary contains "deary", it probably shouldn't also contain "dairy".

froddd
1 replies
6d12h

Or “dreary”, or “dear”, or “deer”. Unfortunate choice for the example!

bckr
0 replies
6d1h

Our approach is to only use 5-character words.

bckr
0 replies
6d15h

Great point!

Izkata
0 replies
6d1h

This introduces a new type of risk - if it can be interpreted as a sentence, "moons" as a verb isn't really a clean word.

atoav
6 replies
6d12h

KeepassXC (open source password manager application) uses colour to make passwords more readable. They use one color for each "class" of character: uppercase, lowercase, numbers, symbols, ...

This is an extremely simple idea, but especially with random passwords this helps a lot even if the font is already hyperlegible.

AndyMcConachie
3 replies
6d10h

As a colorblind person I hate this idea.

emsixteen
0 replies
6d10h

I advocate for accessibility and inclusivity constantly, but not implementing additional measures which are helpful to most due to some not being able to make use of that one aid is not the way to go. Direct your hate elsewhere.

atoav
0 replies
6d9h

Yeah, why? Because the additional information layer benefits some people? Depending on your type of color-blindness and the choice of colors this might even be an improvement that works for color-blind people.

We are not talking about encoding information only in color (= bad idea), we are talking about encoding information that is already present additionally in the color. And if your app has accessibility settings (it should) this would be a thing that you could switch on and off.

Cthulhu_
0 replies
6d7h

It's an additional layer on top of other ones like using a non-ambiguous font, large size display, alternating background shades, character index numbers under each character, etc.

pbhjpbhj
0 replies
6d5h

You can also add a list of exclusions easily in the KeepassXC password generator. I do because when you type in a long password on a TV remote, or similar interface, and then realise the l1|I were confused it's soooo0 infuriating.

digging
0 replies
6d2h

Bitwarden also uses an unambiguous font with 3 colors (default for letters, blue for numbers, red for symbols); I love it. It baffles me when any password-focused software allows itself to render characters in an ambiguous font without any color differentiation.

shkkmo
5 replies
6d16h

This seems slightly flawed in that it completely removes all members of a similar set rather than normalizing to a single element per similar set.

Thus after normalization, '1lI' would become '111'. This allows you to add seven characters back to the author's code generation alphabet without re-introducing any ambiguity.

hananova
1 replies
6d16h

Why not include '1', but make it so '|Il1' all map to the same internal value? That way you have no ambiguity while minimizing alphabet reduction.

shkkmo
0 replies
6d16h

I'm not following your suggestion. It seems like we're saying the same thing?

dools
1 replies
6d16h

It only reduces the ambiguity if everyone does the same and everyone knows that you've done it.

shkkmo
0 replies
6d16h

If you control the system for generating the codes and the system for verifying the codes (which is generally the case for these kinds of codes), then nobody needs to know you've done anything. It's the same as normalizing to upper/lowercase characters when you parse a non-case-sensitive code.
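A minimal sketch of that normalization step, assuming a case-insensitive code and a hypothetical set of look-alike groups (a real system would pick its own). Codes are generated only from canonical characters, and user input is collapsed onto them before lookup.

```python
# Hypothetical look-alike groups, each collapsed to one canonical character.
# Applied after upper-casing, so lowercase "l" and "o" are covered as well.
CANONICAL = str.maketrans({"L": "1", "I": "1", "|": "1", "O": "0"})

def normalize(code: str) -> str:
    """Fold case, then map every look-alike onto its canonical character."""
    return code.upper().translate(CANONICAL)

# Codes are issued using canonical characters only (no L, I, | or O).
issued = {"A1B0"}

# Whatever the user thought they saw, the lookup still succeeds.
found = normalize("alBO") in issued
```

The user never has to know which of the look-alikes was "really" printed, because all of them decode to the same value.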

wccrawford
0 replies
6d7h

If you need more possible values, I agree.

However, if you don't need them, I would remove them so that the user doesn't have to spend any time wondering which character it is. Even though you're processing them all after they type them and fixing them, the user has spent time and effort that they didn't need to, just picking which one it is.

IIRC, I chose to keep them when I did something like this, but I don't think I thought to accept the others and convert them automatically. That project is sunset now, so it's not an issue.

geor9e
5 replies
6d15h

I had this exact situation at work when they shipped millions of devices with serial numbers, and didn't leave out any letter or number. Customers had so much trouble reading them accurately, I had to make a regex script that generated every possible typ0 permutation of what the customer said, and then it would list only matches from the factory database. From there, folks would try to correlate other info like dates to figure out what their real serial number probably was. It was a nightmare. Ironically several of the digits never changed, and some were just 0 1 or 2 to represent which factory made it, so there was no need for the entire character set in the first place. They seem to have been convinced we'd produce 8 quadrillion devices.
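Something along these lines can be sketched with `itertools.product` instead of a regex. This is not the script described above (which isn't shown); the confusion groups and function names are assumptions for illustration only.

```python
from itertools import product

# Hypothetical confusion groups; real ones depend on the font and the label.
GROUPS = ["0OQD", "1IL", "2Z", "5S", "8B"]
LOOKALIKES = {c: group for group in GROUPS for c in group}

def candidates(reported: str):
    """Yield every serial obtainable by swapping characters for look-alikes."""
    options = [LOOKALIKES.get(c, c) for c in reported.upper()]
    for combo in product(*options):
        yield "".join(combo)

def possible_matches(reported: str, known_serials: set) -> list:
    """Return the known serials the customer could have meant."""
    return sorted(s for s in set(candidates(reported)) if s in known_serials)

known = {"A10B5", "AI0B5", "ZZZZZ"}
matches = possible_matches("AlOB5", known)
```

The number of candidates multiplies with every ambiguous position, so for long serials it is cheaper to normalize both the database and the input to canonical characters and compare those.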

swores
3 replies
6d9h

They seem to have been convinced we'd produce 8 quadrillion devices.

While I'm not arguing that their decisions were wise nor that they shouldn't have been able to foresee and prevent the issues they caused you and your colleagues, I would add this one thought in response to the line quoted:

It's often either beneficial or at least considered beneficial to prevent business information leaking through serial numbers, the simplest example being that if you start labelling your products with 1, 2, 3.. and never deviate, then it's fairly easy to take a sample of not many serial numbers and estimate how high they go and therefore how many have been sold. Sometimes it can also be beneficial to make it harder to guess a valid serial number (e.g. it prevents customers from pretending to have a valid one to get a refund, or whatever).

Of course, even if you have these concerns and want to mitigate them, it doesn't prevent you from also taking steps to prevent difficulty reading the correct characters. If anything it should make them more aware of the potential issues you faced since it means someone is already actually thinking specifically about what system to use, as opposed to what likely happened in your case of someone spending 30 seconds going "we need serial numbers, using X digits means we'll never run out, job done".

ryandrake
0 replies
6d4h

if you start labelling your products with 1, 2, 3.. and never deviate, then it's fairly easy to take a sample of not many serial numbers and estimate how high they go and therefore how many have been sold.

Also known as the German Tank Problem[1].

[1] https://en.wikipedia.org/wiki/German_tank_problem
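For a feel of how little data is needed, here is a sketch of the standard frequentist estimator, m + m/k - 1, where m is the largest serial seen and k is the sample size. It assumes serials run 1, 2, 3, ... with no gaps and that the sample is drawn at random without replacement; the function name is made up for this example.

```python
def estimate_total(serials):
    """Estimate how many units exist from a sample of sequential serial numbers."""
    k = len(serials)   # sample size
    m = max(serials)   # largest serial observed
    return m + m / k - 1

# Four observed serials are enough for a usable estimate.
estimate = estimate_total([19, 40, 42, 60])  # 60 + 60/4 - 1 = 74.0
```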

bluenose69
0 replies
6d9h

I bought some software many years back. The serial number had 6 or so digits in it. At one point, I contacted the developer for some other purpose, and pointed out that I had made my purchase as soon as I heard about the product. He told me I was the first customer, and that he had decided to make up long serial numbers to avoid this counting problem.

PhilipRoman
0 replies
6d1h

It also works great as a checksum. See IBAN numbers for a great example - once rearranged and converted to a number they are all multiples of 97 plus 1, which makes accidental typos much easier to catch.
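A sketch of the IBAN check for anyone who hasn't seen it: move the first four characters to the end, replace letters with numbers (A = 10 ... Z = 35), and the resulting integer must leave remainder 1 when divided by 97. Country-specific length and format rules are left out here, and the function name is just for illustration.

```python
def iban_is_valid(iban: str) -> bool:
    """Check only the mod-97 checksum of an IBAN (not its country-specific format)."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]                # country code and check digits go last
    digits = "".join(str(int(c, 36)) for c in rearranged)  # A=10 ... Z=35
    return int(digits) % 97 == 1

ok = iban_is_valid("GB82 WEST 1234 5698 7654 32")   # widely used example IBAN
bad = iban_is_valid("GB82 WEST 1234 5698 7654 33")  # last digit mistyped
```

Because 97 is prime and shares no factor with 10, changing any single digit always changes the remainder, so every one-digit typo is caught.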

gelstudios
0 replies
5d21h

Come to think of it, I wonder if this is why (or a factor in why) Apple serial numbers don't have any vowels in them.

I think only consonants and digits are used in device serial numbers.

jonplackett
4 replies
6d16h

How come neither v nor u are in the final set?

They’re not even mentioned and don’t look like anything else, except maybe each other in some typefaces.

leovander
3 replies
6d16h

vv ~ w

jonplackett
2 replies
6d16h

Aha! Of course!

oh-the-irony
1 replies
5d21h

Works on words and special characters, too. I just skimmed the comments and had to scroll back up to verify that I had NOT just read "Anal Of course!"

jonplackett
0 replies
5d

Haha. You see what you want to see I guess ;)

re
3 replies
6d16h

See also Douglas Crockford's Base 32: https://www.crockford.com/base32.html

This takes the approach of allowing ambiguous characters by decoding them to the same value, and also considers the problem of accidental obscenities.
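A small sketch of that scheme for integers, using the alphabet and decode aliases from Crockford's page (I, L, O and U are left out of the alphabet; O decodes as 0, I and L decode as 1, case is ignored, hyphens are skipped). The check symbol part of the spec is omitted, and the function names are my own.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U

DECODE = {c: i for i, c in enumerate(CROCKFORD)}
DECODE.update({"O": 0, "I": 1, "L": 1})  # ambiguous input maps to the same value

def encode(n: int) -> str:
    """Encode a non-negative integer as Crockford Base32."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 32)
        out.append(CROCKFORD[r])
    return "".join(reversed(out))

def decode(s: str) -> int:
    """Decode Crockford Base32, accepting lowercase, hyphens, and look-alikes."""
    n = 0
    for c in s.upper().replace("-", ""):
        n = n * 32 + DECODE[c]  # raises KeyError on U or other invalid symbols
    return n
```

For example, `encode(1234)` gives `"16J"`, and `decode("l6j")` returns 1234 again even though the user typed a lowercase L instead of the digit 1.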

spintin
0 replies
5d21h

Interesting, I made different choices:

5-bit base-32 oi23456789 abcdefghkl mnpqrstuvw y

o = 0, i = 1; j, x and z removed.

I like that you can fit 6 characters in a 32-bit integer and still have two bits to spare... makes for compact usernames and network bandwidth.

benaubin
3 replies
6d

However, as the number of members in the set increases, the number of possible IDs increases exponentially.

Case-sensitive: 53^8 = 62,259,690,411,361

Case-insensitive: 22^8 = 54,875,873,536

Nitpick, but isn't this polynomial to the members of the set?

pxx
2 replies
6d

aⁿ grows (a/b)ⁿ times as quickly as bⁿ. The multiplicative difference still grows exponentially in n.

afiori
1 replies
5d23h

a^n is polynomial in a and exponential in n.

This is why longer passwords are more efficient than complex passwords: to gain the same security effect of doubling the password length you would need to square the alphabet
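The arithmetic is easy to check with exact integers. The sketch below reproduces the two figures quoted from the article and the "double the length or square the alphabet" equivalence; `id_space` is just a name chosen for this example.

```python
def id_space(alphabet_size: int, length: int) -> int:
    """Number of distinct IDs of exactly `length` characters."""
    return alphabet_size ** length

case_sensitive = id_space(53, 8)     # 62259690411361
case_insensitive = id_space(22, 8)   # 54875873536

# Doubling the length is equivalent to squaring the alphabet size:
# a^(2n) == (a^2)^n
same = id_space(22, 16) == id_space(22 ** 2, 8)
```

So going from 8 to 16 characters of a 22-symbol alphabet buys as much as switching to a 484-symbol alphabet at length 8, which no keyboard offers.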

pxx
0 replies
5d23h

You've the proper definitions but are missing the context. An exponential with larger base still has an exponential multiplicative difference compared to an exponential with a smaller base.

We're comparing the growth rate of two exponentials representing variable-length identifiers. We're not looking at a constant-length identifier (which is what you're doing with only looking at a^n). Notice the context of where exponential is used in the article: we are changing n from 5 to 8.

TacticalCoder
1 replies
5d21h

Related reading, from the font designer's side: “Oh, oh, zero!” by Charles Bigelow

I don't know. People tend to use the letter 'O' a lot. And people tend to use zero '0' a lot too.

Who gives a fuck about "Oh"? I mean, seriously, which percentage of articles, blog, PDFs, webpages, products etc. throughout the world have 'O' and '0' that can be mistaken one for another? And which percentage have "Oh"?

When was the last time a user had to read a product ID over the phone and did misread big O / "Oh" for 0?

I don't even think there was a last time, because nobody is using "Oh" in identifiers.

While, on the other hand, it's perfectly fine to use a slashed-zero for zero, to be sure nobody mistakes it for the letter 'O'.

So basically: your link and TFA aren't that related.

svat
0 replies
5d21h

I'm not sure I understand your comment, because at first glance it seems to be making a distinction between "Oh" and "O", when Bigelow's article is using "Oh" as the name/vocalization of the letter 'O' (as should be clear from the very first sentence, even if not the title).

So, assuming (still not clear from your comment) that you do understand "oh" to mean the letter 'O', as intended, still your comment is surprising, because some of your own other comments talk about O/0, and the submitted post here too starts with that very example:

What are visually ambiguous characters?

O / 0 - The letter O and the number 0 can look very similar

So surely the article is relevant to (at least the first example of) the post? I admit it goes much deeper into just this one example, and only a bit into other examples like 1/l/I and 2/Z or 5/S, but still it's relevant and of value as a representative example I think.

loloquwowndueo
2 replies
6d7h

As long as we are pointing out mistakes in the article:

9qg6G8B2Z5SIl170O (ariel)

The name of the font is Arial, not Ariel. (No mermaids here, move along)

rob74
0 replies
6d6h

Yup... also, a screenshot (or using webfonts) would have probably worked better there. On Linux, most of the lines look the same...

jzwinck
2 replies
6d12h

If you use both upper and lower case, you are likely to eventually be surprised by some third party system or protocol that is case insensitive. I even found a commercial system which allowed users to choose IDs with case sensitivity (iD and id being distinct) but if you query it for one which does not exist they do case insensitive matching and return the wrong data.

When I reported this bug they said it was for convenience!
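One cheap guard against this class of surprise is to check, before issuing or importing IDs, whether any of them would collide in a case-insensitive system. A sketch, with a made-up function name:

```python
from collections import defaultdict

def case_collisions(ids):
    """Group IDs that become identical once case is ignored."""
    buckets = defaultdict(set)
    for identifier in ids:
        buckets[identifier.casefold()].add(identifier)
    # Keep only the groups where two or more distinct IDs share a folded form.
    return {key: sorted(group) for key, group in buckets.items() if len(group) > 1}

collisions = case_collisions(["iD", "id", "abc"])
```

An empty result means the IDs would survive a round trip through a case-insensitive system; anything else lists exactly which IDs would be confused with each other.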

gajus
0 replies
6d11h

Thanks for the anecdote. I've included it in the article.

NetOpWibby
0 replies
6d4h

What a nuts system

jeroen
2 replies
6d9h

An alternative would be to print IDs using https://en.wikipedia.org/wiki/FE-Schrift, which was specifically designed to make normally similar characters look different.

serial_dev
1 replies
6d9h

Good luck distinguishing 0 and O with that font in a random sequence of characters.

The type face you linked is not optimized for humans.

Its monospaced letters and numbers are slightly disproportionate to prevent easy modification and to improve machine readability.

It's a slightly different issue than what was described in the article (e.g. it can't address the cases where IDs are written down).

namibj
0 replies
6d7h

AFAIK it's designed for non-automatic number plate reading in Germany.

iblaine
2 replies
6d17h

My OCD approves of this idea. Let’s also add, IDs cannot start with 0 or O.

xwolfi
0 replies
6d16h

I have one on my passport number. I still dont know which it is so I alternate. Hasn't been a problem to anyone yet when registering for planes and crossing borders: the picture is clear, it can be both lol

gajus
0 replies
6d17h

Both are visually ambiguous, so we are good.

geoffreysimpson
2 replies
6d2h

I did my PhD on (malicious) visual impersonation of domain names using many of the techniques described here. There are many references to other visual doppelganger techniques included in my paper here: https://par.nsf.gov/servlets/purl/10256904

My research focused solely on the .com domain name space, so our character set was limited.

mistrial9
1 replies
6d2h

that research paper only considers ascii characters in domain names?

geoffreysimpson
0 replies
5d19h

The paper only considers .com domain names, which have a limited character set support, discussed in RFC 1034 https://www.ietf.org/rfc/rfc1034.txt

Essentially A-Z, 0-9, and the - character, and domain names can not start with the dash character.

gajus
2 replies
6d15h

Out of curiosity, does anyone know why this post would be removed from the front page?

I was excited to see that the post is getting engagement. I saw it in 3rd position. Then checked an hour later and it is nowhere to be seen.

I am assuming this is some sort of opportunistic algorithm at play that gives a chance to a post, but removes it if it is not performing, but curious if anyone has more details.

lifthrasiir
1 replies
6d13h

HN submissions tend to be on the front page when they receive a few early votes within (roughly) the first hour, but they disappear rather quickly without further votes. Given that this submission was only 3 hours old when you posted this comment, that is to be expected. (For the record it's now in fifth place, suggesting that it has eventually received enough votes to stay on the front page later.)

gajus
0 replies
6d13h

Makes sense. I am just curious about the logic behind the algorithm, more than anything else.

criddell
2 replies
6d4h

I would be wary of excluding characters just because they look like other characters when combined

I wish the author would have said more about this. Why be wary?

digging
1 replies
6d2h

The implied reason is that it shortens the list of available IDs substantially.

criddell
0 replies
6d2h

That was my first thought, but the section on case sensitivity already discussed the impact of a reduced alphabet and pointed out adding more characters takes care of that quickly. So I assume the reason is something else.

NickHoff
2 replies
6d11h

I'm an American living in Germany. When I first arrived, the way Germans write the digit 1 surprised me. They write it with the upper hook thing very long, almost like a capital lambda (Λ), which sometimes makes 1 and A visually ambiguous. This isn't really a problem, just something funny about moving to a new country.

lynguist
0 replies
6d6h

I use 1 with a long hook except when I write binary numbers where I use just a | for 1.

I have some other context dependent characters/letters.

I write small z like that in normal writing, but as a mathematical variable I write it as ƶ. (To disambiguate from 2.)

I write small t like † in normal writing, but as a mathematical variable I write it as t. (To disambiguate from + (plus).)

I write q like that in normal writing, but as a mathematical variable I write it with a stroke, which does not display on the iPhone, a ꝗ, a bit similar to a ɋ. (To disambiguate from a (ɑ).)

It’s all about disambiguation, and sometimes having different letter shapes for isolated characters.

froh
0 replies
6d10h

my us colleagues regularly mistook the ones for sevens. that's btw why we cross the sevens, like tees and effs

clan
1 replies
6d11h

Years ago I worked support at an ISP whose usernames were 12-digit numbers. Most regular users and 1st level support do not know the NATO phonetic alphabet. An easy trick is then to read back the number for confirmation but use another grouping of digits. Most users read 1 digit at a time so I would read back 2. One-Two becomes twelve. If they used 2 digits I would for ease use 3 rather than 1. This is a very easy way to do a fake "checksumming" with regular people.

Tangent: All numbers started with 12 which in effect made them 10 digits. They worked together with a banking system and the bank folks thought 10 digits was not secure enough so they complied and added 12 in front of everything.

account42
0 replies
6d7h

Tangent: All numbers started with 12 which in effect made them 10 digits. They worked together with a banking system and the bank folks thought 10 digits was not secure enough so they complied and added 12 in front of everything.

Delicious malicious compliance - I like it.

EnigmaFlare
1 replies
6d10h

Doesn't help when you have to match the person's name and they have these characters in them. My name contains the letter "o" and I once had a lot of trouble getting something done at the bank. Multiple staff had to crowd around the computer to figure it out. Eventually somebody discovered that when I had opened my account, that o had been entered as a 0 for some reason and the font they were using, also for printing, showed them looking almost identical.

8organicbits
0 replies
6d4h

Similar story here, but a "Q" instead of an "O". The tail of the Q looked like dust on the screen. Somehow I haven't run into issues...

waltbosz
0 replies
6d1h

I thought this was neat UX: on the Nintendo Switch I was entering a serial number for some DLC, and the on-screen keyboard had all the ambiguous character keys disabled, which means that the serial numbers are generated without any ambiguous characters.

I'm not sure if this UX was built into the OS, or just part of the game I was playing (Mario + Rabbids Sparks of Hope).

thih9
0 replies
6d9h

We could always use 1s and 0s, maybe group them in eights. Tongue in cheek, but I guess that would be a valid (even if extreme) solution.

rsync
0 replies
6d2h

“Oh By”[1], The universal shortener, has had protections for this built in from the very beginning.

Since the whole point is the ability to convey a message in the physical world with chalk or pencil or whatever – we needed to make sure that characters were unambiguous.

So there are no zeros or ‘o’ characters or ones or ‘l’ characters… I think there were one or two other rules that govern this but I can’t think of them right now…

[1] https://0x.co

p0w3n3d
0 replies
6d4h

Letters l and I are visually indistinguishable when written in Arial.

octopoc
0 replies
5d5h

UuidExtensions[1], a C# library, has a way of generating / encoding IDs that has several useful properties:

1. IDs can be generated anywhere (client-side, server-side, etc.) and are still unique
2. IDs are ordered by time
3. IDs don't use L and O because those can be confused for other characters

I've found it very handy in my travels.

[1] https://github.com/stevesimmons/uuid7-csharp?tab=readme-ov-f...

nullc
0 replies
5d23h

Modern bitcoin addresses use a base-32 character set that leaves out some of the most ambiguous pairs and also permutes the address ordering so that the most visually similar remaining characters produce single-bit errors, which are better handled by the address's error-detecting (and potentially correcting) code.

https://github.com/bitcoin/bips/blob/master/bip-0173.mediawi...

nmstoker
0 replies
6d10h

A friend told me about how his work had some senior IT mgrs, who'd clearly been playing with their iPhones too long, decide that the firm shouldn't use IDs at all any more, and started pushing this without consulting the business, even though it was totally inappropriate given how widely they were needed... Caused mayhem and needless arguments!

myself248
0 replies
6d15h

Telephone equipment avoids the letters i and o in the alphabetical designation sequence for this reason, they look like numerals 1 and 0.

maggit
0 replies
6d10h

I have realized that there is a big design space here, as I recently did a write-up of my take, Id30. 30 bits of information encoded base 32 into six chars, eg bpv3uq, zvaec2 or rfmbyz, with some handling of ambiguous chars on decoding.

https://magnushoff.com/blog/id30/

kuon
0 replies
6d12h

I came up with base24[1] for this. There are some letters that can be ambiguous but I kept them to make it case insensitive.

[1]: https://www.kuon.ch/post/2020-02-27-base24/

kuboble
0 replies
6d2h

In handwriting there is a difference between European and American. In Europe we don't really have a problem with 1 vs 7 or g vs 9. But our nines and ones do look like gs and sevens to Americans.

I heard an American making a joke that

"I have gg problems but European handwriting ain't 7 of them."

kibwen
0 replies
6d15h

Honestly, stuff like this is why I stick with (case-insensitive) hexadecimal for user-facing IDs. I find hex to be the sweet spot between "decently sized alphabet to keep ID lengths down" and "easy to read, communicate, and enter manually". It's also fairly resistant to accidentally generating IDs which will offend your users (unless your users are 1337-speaking time-traveling pre-teens from 2002 who are going to snicker at "b00b5"), which is a nice perk.

junga
0 replies
6d10h

Also do not use the same character repeated in a "long" sequence. I hate this with IBANs. Too often there's something like '000000' right in the middle of an IBAN and in case copy and paste is not possible I end up counting the number of zeroes at least thrice. Groups of four characters separated by spaces would help in this case but that's another topic.
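Grouping is a display-only concern, so it can be added without touching the stored value. A sketch, assuming the separator itself never appears inside a code (`group` is a name made up here):

```python
def group(code: str, size: int = 4, sep: str = " ") -> str:
    """Insert `sep` after every `size` characters, for display only."""
    compact = code.replace(sep, "")  # tolerate input that is already grouped
    return sep.join(compact[i:i + size] for i in range(0, len(compact), size))

display = group("GB82WEST12345698765432")  # "GB82 WEST 1234 5698 7654 32"
```

On input, strip the separators again before validating, so users can type the code with or without them.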

jheriko
0 replies
6d7h

just use numbers and crossbar your 7s - problem gone.

if someone's writing is incompetent tell them. if you can't then they ruined it for themselves by being shit at writing the number 7.

jgbmlg
0 replies
6d16h

I guess I better stop using Bozos_Gismos

jadengeller
0 replies
6d13h

When it matters?

This applies to usernames too! It's easy to phish if platforms render capital I and lowercase l the same

grantmnz
0 replies
6d17h

This post has some overlap with work I did a while back on a "coupon code" system that is optimised for users taking a code printed on paper and entering it into a web form. A number of measures were employed to avoid/correct transcription errors.

Example, docs and links here: https://www.mclean.net.nz/cpan/couponcode/

eternityforest
0 replies
5d15h

I suppose the first line of defense is a QR code URL. I don't think anyone really enjoys typing long codes.

After that there's ECC. A few extra bytes for a reed-solomon code will fix a lot of issues.

dusted
0 replies
6d9h

This is why I only ever use xterm with the default bitmap font, it's literally the only one where I'm absolutely sure which character is which.

denimnerd42
0 replies
6d

my work id has a 0 and a O in it and it drives me crazy. i only remember it due to muscle memory on the keyboard

cryptonector
0 replies
6d2h

And TFA doesn't even mention Unicode, scripts, ASCII, Latin, nothing. As you can imagine it all gets much worse with Unicode (though through no fault of the Unicode Consortium). See Unicode TR#39 [0].

  [0] https://unicode.org/reports/tr39/

croes
0 replies
6d6h

If we include handwriting then lowercase n and u can be hard to distinguish if written in cursive

chiefalchemist
0 replies
6d8h

Four quick thoughts:

- We haven't solved this already? Who hasn't tried to read some code and couldn't tell O from 0 or l from 1, etc.?

- Aside from ambiguous characters you have to be aware of spelling and leet spelling. e.g., 53X, S3X, 5EX, etc.

- FFS stop with the 10+ character strings without spaces or hyphens. There's no reason for that.

- Not everyone has perfect vision. Ambiguous characters *and* less than perfect vision (often with no spaces / hyphens) is a mortal UX sin.

We've all been on the wrong end of these, and yet they are common enough - in 2024??!!? - that they need to be mentioned here.

ceving
0 replies
4d6h

Recently I came up with something similar: https://gist.github.com/ceving/cb68c8f2392255c5ed4ea65a6a199...

But I use an alphabet with 32 characters: abcdefghikmnopqrstuvwxyz23456789

I prefer 32 characters, because that makes it possible to pack 5 random bytes into a token with 8 characters.
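The packing works because 5 bytes are 40 bits and 8 characters of 5 bits each are also 40 bits, so nothing is wasted. A sketch using the alphabet above; this is not the code from the linked gist, and the function names are assumptions.

```python
import secrets

ALPHABET = "abcdefghikmnopqrstuvwxyz23456789"  # 32 symbols, 5 bits each

def encode5(data: bytes) -> str:
    """Encode exactly 5 bytes (40 bits) as 8 characters."""
    if len(data) != 5:
        raise ValueError("expected exactly 5 bytes")
    n = int.from_bytes(data, "big")
    # Take the 40 bits in eight 5-bit slices, most significant first.
    return "".join(ALPHABET[(n >> shift) & 31] for shift in range(35, -1, -5))

def decode8(token: str) -> bytes:
    """Inverse of encode5; raises ValueError on characters outside the alphabet."""
    if len(token) != 8:
        raise ValueError("expected exactly 8 characters")
    n = 0
    for c in token:
        n = (n << 5) | ALPHABET.index(c)
    return n.to_bytes(5, "big")

token = encode5(secrets.token_bytes(5))
```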

btilly
0 replies
6d15h

This brings up memories.

One day while sick, I distracted myself from being sick by writing up a silly module to do arithmetic in arbitrary bases. And, because it was easy I stuck it on CPAN. https://metacpan.org/pod/Math::Fleximal is the module.

Of all of the silly things I'd done, I would have sworn that this is the one that should never generate a support request. But it did! Why? Well I'd included a demonstration of how to turn hexadecimal into an alphanumeric code. And someone had the bright idea of using the same thing to turn long numbers into readable codes!

My module worked, but I was still a bit flabbergasted that THIS wound up in production somewhere!!

branon
0 replies
6d5h

cl looks like d in some fonts or with bad kerning

bdjsiqoocwk
0 replies
6d9h

"visually unambiguous dictionary" to the author. It's well known that some people have a hard time distinguishing p/b/d/q.

arp242
0 replies
6d16h

It would be helpful to also add a screenshot for that font overview, because: https://imgur.com/a/h7Ks1Qj

And even on systems which do have these fonts, they may not always be exactly the same.

albert_e
0 replies
6d13h

I love conversations like this. These are arguably not the most cutting edge or exciting topics but hold a lot of significance and power to make life easier for humans (and machines too).

Some of these are areas of best practices that, when done really well -- people may not even notice it. That's an unfortunate fact of life that comes up often -- where the attention to detail and sincerity that people bring to the table often gets lumped under "obviously it should be that way, nothing special to see or applaud here".

TwoNineFive
0 replies
4d10h

On Linux you can use Theodore Ts'o's pwgen tool with the -B flag.

-B, --ambiguous Don't use characters that could be confused by the user when printed, such as 'l' and '1', or '0' or 'O'. This reduces the number of possible passwords significantly, and as such reduces the quality of the passwords. It may be useful for users who have bad vision, but in general use of this option is not recommended.

    pwgen -B 32
    oos9upoVieghuew7aeb3iev3jiequeiw acohthahpie7ae4aeboshahWiengieth
    yahW3qua3atheeP9jo4aiY3zeepoosh3 Noh4ooth4ohzeec4zug3ephoo7meich7
    oozae9Eireix4Chaiboz9dofie4Xunof Mohj3uupee9ahngahh9on9sujee9ehae
    weimah9aiXeis3owaexei4uh3ibeecai PaeV7eeChaezahruNgeequoh7zok7thi
    eeJieyah4exiephaiPootei4dokoojoh fohhah3Eec3bah7aeR9iedah7Ve3ea7o
    vahs4eich4pheisoug9aiR3ohChoh7Ch eth9KaeLahdie7ahy9ohCiebohphuse9
    ieye3udumaengai9ies7kae4geeque9T iesoh9eosohthoongaeroo4ehiishohY
    mee4ohjei4ohmika3taijei3Yaixosei ohWoo4eapid7miebee9pooKai3oofeis
    Eechook9quohp7se7ees9thaefahb9an aht3quooV4eiph9ap7aiw4wee7oi7eij
    ishep3weeh7Eero9ohdohth9MietooJ4 Kai9aich9Jee9Angeihee9eehei9esie
    toonaix4xe3Moob3zaic3Eesahs9ahy3 gaey9doozee7sei9quuPae3vohph4Huo
    ouYaephahcog3peiw7iecoo7eetheeph eeNgiezae7oongi7uena7eenaezuT7co
    tai9vuace9eV7Paih7ieN3Ahghiegh3v VaeteeMoobeixai9ingeyahYuzaipaht
    eeng7vei7pho4Ahpoa4kahgheethahz7 phas4theiThu4uqu7iCh3Aepha3shae3
    ieRep3kaideeHeekiNgequieng9raeYo eegahsh9aizooshee9too9oojiox4Lei
    ovohcaePahM9thaebajuChoo3pipheej oowaimeiWahf4Neighoo3Eeyah3uvi4v
    vi4choiThei3eisohw4iP9huehohs4oe ukuchiethaquax3hieChouMahpooy4ee
    aegheeyeemeNeevehud9ohng3dai4jai eth3iedah9Tee3wohneisoo4aicuToos
    iecap7EeJ7raixiuseesiNou9ooT9fie ied3ooveingu7fu7dahdaaYe9tai7ien
    eijee7iKighaingaiChei7giemu4chi3 Thie3faih3ahshooRunohwoaghoh4Aev

ThePallas
0 replies
6d12h

A few years ago, I created a system that generates a serial number from a prefix and a 32-bit unsigned integer and fixes up this kind of input error when parsing the serial.

https://github.com/pallas/gubbins

TacticalCoder
0 replies
5d20h

Another confusing thing is doing this:

    xxxxx-xxxxx-xxxxx-xxxxx

Instead of something like this:

    xxxxx-xx-xxxxx-xxx-xxxxx

Something could also be said about such scheme lacking the embedding of a checksum.

Here's an IBAN (bank account number) in the EU (which thankfully are using a checksum as part of the account number):

    LU29 0022 1712 5582 7000
      ^^
      ||
      two checkdigits

Also some companies think they're "smart" because they pick numbers like this:

    LU29 002 0000 0001 8000

Repeating the same digit, usually a zero, a shitload of time ain't smart. It's fucking dumb.