This is really great, I didn't know you could switch encoding schemes within the same QR code. There's a nifty visualization tool [0] that shows how this can reduce QR code sizes. It can determine the optimal segmentation strategy for any string and display a color-coded version with statistics. Very nice!
0: https://www.nayuki.io/page/optimal-text-segmentation-for-qr-...
Seems like the lede was buried in the article; I know a bit about QR codes: there are different modes for alphanumeric, binary, kanji, etc., and error-correcting capacity...but being able to switch character sets in the middle was new to me.
I am not entirely sure why you would want to switch encodings for URLs, personally. If you use alphanumeric encoding and a URL in Base36, you are pretty much information-theoretically optimal.
The issue is that QR's alphanumeric segments are uppercase only. While browsers will automatically lowercase the protocol and domain name, you'd have to either make all your paths uppercase or lowercase them automatically server-side. On top of that, when someone scans the code they'll likely be shown an uppercase URL (if it doesn't automatically open in a browser), and that may look suspicious to anyone who doesn't already know that uppercase domains are equivalent to lowercase ones.
Ideally QR codes would have had a segment type to encode URIs more efficiently (an alphabet of 73-82 characters, depending on how the implementation decided to handle the "unreserved marks"), but that ship sailed long ago.
Many QR code readers will auto-lowercase URLs that are encoded in alphanumeric encoding. The rest will recognize uppercase URLs just fine. Alphanumeric encoding was basically made for URLs.
The QR alphanumeric input alphabet does not include basic URL query-string characters like '?', '&', or '='.
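A quick way to see this: check a URL against the 45-character alphanumeric alphabet from the QR spec. This is just an illustrative sketch (the example URL is made up), but the alphabet itself is the standard one:

```python
# QR alphanumeric mode's 45-character alphabet (digits, uppercase letters,
# space, and the symbols $ % * + - . / :) -- note no '?', '&', or '='.
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

def unsupported_chars(text: str) -> set:
    """Return the characters that force a fallback out of alphanumeric mode."""
    return {c for c in text if c not in ALPHANUMERIC}

# Even a fully uppercased URL is disqualified as soon as it has a query string:
print(unsupported_chars("HTTPS://EXAMPLE.COM/SEARCH?Q=QR&PAGE=2"))
```

The path and scheme characters (':', '/', '.') are all fine; it's only the query-string delimiters that fall outside the set.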
I've been putting URLs in QR codes for like a decade, mixed case and query strings included. How has it never been an issue?
Because you used bytes mode, not alphanumeric mode
base36 with alphanumeric mode encoding has around 6.38% overhead compared to base10's 0.34% overhead in numeric mode. So numeric mode gets you closer to optimal.
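Those percentages fall straight out of the bit-packing rules: alphanumeric mode packs 2 characters into 11 bits (5.5 bits/char) while a base-36 character only carries log2(36) bits of information; numeric mode packs 3 digits into 10 bits against log2(10) bits per decimal digit. A quick check:

```python
import math

# Overhead = (bits spent per symbol) / (bits of information per symbol) - 1.
# Alphanumeric mode: 11 bits per 2 chars; a base-36 char carries log2(36) bits.
alnum_overhead = (11 / 2) / math.log2(36) - 1
# Numeric mode: 10 bits per 3 digits; a base-10 digit carries log2(10) bits.
num_overhead = (10 / 3) / math.log2(10) - 1

print(f"base36 in alphanumeric mode: {alnum_overhead:.2%} overhead")
print(f"base10 in numeric mode:      {num_overhead:.2%} overhead")
```

This ignores the per-segment header (mode indicator plus character count), which is a fixed cost that matters more for short payloads.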
Speaking of visualization...that last figure in this post is super interesting in part because you can actually see some of the redundancy in the base64 encoding on the left, in the patterns of vertical lines.
In general, better compression means output that looks more like "randomness"—any redundancy implies there was room for more compression—and that figure makes this quite clear visually!
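The same intuition is easy to demonstrate with any general-purpose compressor: structured input shrinks dramatically, while already-compressed (random-looking) input has no redundancy left to exploit. A small sketch using zlib:

```python
import zlib

# 16 KiB with an obvious repeating structure compresses to almost nothing...
patterned = bytes(range(256)) * 64
once = zlib.compress(patterned, 9)
print(len(patterned), "->", len(once))

# ...but compressing the (random-looking) compressed output again gains nothing.
twice = zlib.compress(once, 9)
print(len(once), "->", len(twice))
```

The first pass removes essentially all the redundancy; the second pass has nothing left to find, which is exactly why well-compressed data looks like noise.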
That’s undoubtedly redundancy in the underlying data, not in the encoding itself.
Yes, the data is the bytes 00, 01, …, FF repeating, and that pattern is highly visible with power-of-2 encodings, but not visible with other bases (for similar reasons that 0.1 as a (binary) float doesn’t behave as people expect).
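The alignment argument can be checked directly: base64 maps every 3 input bytes to 4 output characters, so once the 256-byte input period and the 3-byte grouping realign, after lcm(256, 3) = 768 bytes, the encoded text itself repeats exactly. A small sketch:

```python
import base64

# Two full 768-byte "super-periods" of the repeating 00..FF pattern.
pattern = bytes(range(256)) * 6
text = base64.b64encode(pattern).decode()

# Each 768-byte super-period encodes to the same 1024 characters,
# so the two halves of the base64 output are identical.
half = len(text) // 2
print(text[:half] == text[half:])  # → True
```

A base-10 rendering of the same bytes has no such short period, which is why the pattern washes out in non-power-of-2 encodings.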