> the longest post on my site, takes 92 KiB instead of 37 KiB. This amounts to an unnecessary 2.5x increase in load time
Sure, if you ignore latency. In reality it's an unnecessary 0.001% increase in load time because that size increase isn't enough to matter vs the round trip time. And the time you save transmitting 55 fewer KiB is probably less than the time lost to decompression. :p
While fun, I would expect this specific scenario to actually be worse for the user experience, not better. Speed will be a complete wash and compatibility will be worse.
Well, that, and there's an 850K Symbols-2048-em%20Nerd%20Font%20Complete.woff2 file that sort of drowns out the difference, at least if it's not in cache.
Now I got curious, and there's also a 400 kB CSS file to go with it: https://purplesyringa.moe/fonts/webfont.css
I'm not up to date on web/font development – does anybody know what that does?
It adds Unicode characters before elements with the given class. Then it's up to the font to display those Unicode characters — in this case, based on the class names, one can infer that the font assigns an icon to each character used.
That makes sense, thank you!
So the purpose is effectively to have human-readable CSS class names to refer to given glyphs in the font, rather than having stray private use Unicode characters in the HTML?
Yep
This is a reasonable approach if you have a large number of icons across large parts of the site, but you should always compile the CSS/icon set down to only those used.
If there are only a few icons, and they are small, then inlining the SVGs is a better option. But if you have too many SVGs embedded directly in the page, the page size itself will suffer.
As always with website optimization, whether something is a good option always “depends”.
More reasonable than this class+CSS indirection would be e.g. a React/static-website-template/etc. custom element that outputs the correct glyph. The output doesn't need to contain this indirection, nor all of the unused possibilities.
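Something along these lines, as a rough sketch (the icon names and SVG path data below are just placeholders, not from the linked site): each icon is an inline SVG keyed by name, so the generated HTML contains only the icons that are actually used, with no font or CSS class indirection.

```js
// Hypothetical build-time helper: maps icon names to inline SVG markup.
// Only the entries actually referenced end up in the output; unused ones can
// be tree-shaken or simply never emitted.
const ICONS = {
  github: '<svg viewBox="0 0 16 16" width="16" height="16" aria-hidden="true"><path d="M0 0h16v16H0z"/></svg>',
  rss:    '<svg viewBox="0 0 16 16" width="16" height="16" aria-hidden="true"><path d="M2 2h12v12H2z"/></svg>',
};

// In React this would be a small component; as a plain template helper:
function icon(name) {
  const svg = ICONS[name];
  if (!svg) throw new Error(`unknown icon: ${name}`);
  return svg;
}

// e.g. in a static-site template: `${icon("rss")} Subscribe`
```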
I think at least some recent tools will produce ligatures to turn plain text into an icon, to avoid this issue.
Another detail is that this feature breaks and makes some sites nearly unusable if the browser is set to ignore a website's custom fonts.
Seriously, why can't modern browsers turn off features like remote fonts, WebRTC, etc. in settings? I hate it when I'm partway through reading and the font suddenly changes. Not to mention the fingerprinting risks.
You can, and then when someone uses an icon font instead of graphics their page breaks.
Skill issue.
Pictures are pictures, text is text. <img> tag exists for a reason.
It's a convenience/packaging issue. An icon font is simply more convenient to handle; <img> tags for a hundred images require more work.
Icon fonts are used all over the place - look at the terminal nowadays. Most TUIs require an icon font to be installed.
So it's a skill issue.
I believe many Web APIs can be enabled/disabled on a per-website basis, no?
iPhone Lockdown Mode turns off remote fonts (breaking icon fonts, as the sibling comment says).
Wow, yeah. That kind of discredits the blog author a bit.
It's cached. Not ideal, sure, and I'll get rid of that bloat someday, but that file is not mission-critical and the cost is amortized between visits. I admit my fault though.
Definitely fair and I was a bit harsh. It just seemed a bit nonsensical to go through such efforts to get rid of a few kilobytes while serving a massive font file. But I also understand that it's mostly for fun :)
I mean, it's all just for fun of course.
That size difference is large enough to make a difference in the number of round trips required (should be roughly one fewer roundtrip with any sensible modern value for the initial congestion window).
Won't be a 2.5x difference, but also not 0.001%.
You don't need a new roundtrip for every packet. That would be devastating for throughput. Whether the file takes one, two, or three packets, they get ACKed as a batch either way, not serially.
Also when you get to the end, you then see
At 69 KiB you're still over the default TCP packet max, which means both cases transmit the same number of packets, one just has a bunch of extra overhead added for the extra JavaScript fetch, load, and execute.
The time saved here is going to be negligible at best anyway, but here it actually looks negative, because we're burning time without reducing the number of packets needed at all.
Those numbers are for a different page. For the original page, the article quotes 44 kB with this method vs. 92 kB for gzip.
What? No, they absolutely don't transmit the same number of packets. Did you mean some other word?
I expect what GP meant is the default TCP window size, so in a situation where bandwidth costs are dwarfed by roundtrip costs, these two cases will end up taking essentially the same time, because they will incur the same number of ACK roundtrips. Don’t know if the numbers work out, but they at least sound plausible.
No, there is no way the numbers would work out to the same number of roundtrips. The sizes are different by a factor of 2.5x, and the congestion window will only double in a single roundtrip. The only way the number of roundtrips would be the same is if both transfers fit in the initial congestion window.
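Back-of-the-envelope sketch (assuming an MSS of about 1460 bytes, an initial congestion window of 10 segments, and a window that roughly doubles every round trip; this ignores loss and isn't a model of any specific congestion-control algorithm):

```js
// Count how many flights (round trips of data) it takes to push `bytes` to the
// client under idealized slow start.
function roundTrips(bytes, mss = 1460, initcwnd = 10) {
  let sent = 0, cwnd = initcwnd, rtts = 0;
  while (sent < bytes) {
    sent += cwnd * mss; // one flight of `cwnd` segments per round trip
    cwnd *= 2;          // slow start roughly doubles the window each RTT
    rtts += 1;
  }
  return rtts;
}

console.log(roundTrips(37 * 1024)); // 2 flights for the 37 KiB version
console.log(roundTrips(92 * 1024)); // 3 flights for the 92 KiB version
```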
Yes, sorry
They were probably thinking of the max size for packets in TCP, which is 64K (65535 bytes).
However, Ethernet has an MTU (Maximum Transmission Unit) of 1500 bytes, unless jumbo frames are used.
And so I agree with you, the number of packets that will be sent for 69 KiB vs 92 KiB will likely be different.
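Rough numbers, assuming about 1460 bytes of TCP payload per full-size packet at the standard 1500-byte MTU:

```js
// Approximate number of full-size TCP segments needed for each payload size.
const segments = (kib) => Math.ceil((kib * 1024) / 1460);
console.log(segments(69)); // about 49 packets
console.log(segments(92)); // about 65 packets
```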
Interesting, how does that add a round trip? For the record, here's what I believe to be the common definition of an additional "round trip" in a web development context:
So you're starting a new request that depends on the client having received the first one. (Although upon closer inspection, I think the technique described in the blog post manages to fit everything into the first response, so I'm not sure how relevant this is.)
In reality it's more like:
It's all about TCP congestion control here. There are dozens of algorithms used to handle it, but in pretty much all cases you want some kind of slow buildup in order to avoid completely swamping a slower connection and having all but the first few of your packets dropped.
Doesn't the client see the reference to Y at this point? Modern browsers start parsing HTML even before they receive the whole document.
Not just modern. This was even more significant on slow connections, so they've kind of always done that. One could even argue that HTML, HTTP (specifically, chunked encoding) and gzip are all intentionally designed to enable this.
Unless a resource is very small, it won't be transmitted in a single atomic unit. The sender will only send part of it, wait for the client to acknowledge having received it, and only then send more. That requires a network roundtrip. The larger the resource, the more network roundtrips will be required.
If you want to learn more, pretty much any resource on TCP should explain this stuff. Here's something I wrote years ago, the background section should be pretty applicable: https://www.snellman.net/blog/archive/2017-08-19-slow-ps4-do...
That's certainly reasonable if you optimize only for loading time (and make certain assumptions about everybody's available data rate), but sometimes I really wish website (and more commonly app) authors wouldn't make that speed/data tradeoff so freely on my behalf, for me to find out after they've already pushed that extra data over my metered connection.
The tragedy here is that while some people, such as the author of TFA, go to great lengths to get from about 100 to 50 kB, others don't think twice to send me literally tens of megabytes of images, when I just want to know when a restaurant is open – on roaming data.
Resource awareness exists, but it's unfortunately very unevenly distributed.
We need a Mosh-based browser with Gopher support.
And you'd need AI to tell you what's in the pictures, because lots of restaurant sites just have photos of their menu, and some designer with no web knowledge put the phone number, address, and hours in an image designed in Photoshop.
You're out of luck there. Some time ago I tried some photos of pasta dishes with Gemini and it could not guess the recipe name.
I've found Gemini to be pretty terrible at vision tasks compared to the competition. Try GPT-4o or Claude-3.5-Sonnet instead.
There is an interesting "Save-Data" header to let a site know what makes sense to optimize for on a given connection, but it seems to be Chrome-only so far: https://caniuse.com/?search=save-data
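On the client side the same preference is exposed via the NetworkInformation API; a minimal sketch (Chromium-only, hence the optional chaining; `loadOptionalAssets` is a made-up placeholder, not a real API):

```js
// True when the user has requested reduced data usage (Chrome's "Data Saver").
const saveData = navigator.connection?.saveData === true;

if (!saveData) {
  // Only fetch heavy, non-essential extras (big webfonts, hero images, ...)
  // when the user hasn't opted into data saving.
  loadOptionalAssets(); // hypothetical helper defined elsewhere
}
```

On the server, the same signal arrives as a `Save-Data: on` request header.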
I wish there was a bit of an opposite option - a "don't lazy/partially load anything" for those of us on fiber watching images pop up as we scroll past them in the page that's been open for a minute.
Why is there more latency?
Edit: Ah, I see OP's code requests the webp separately. You can avoid the extra request if you write a self-extracting html/webp polyglot file, as is typically done in the demoscene.
It takes more time for your message to get back and forth between your computer and the server than it takes for the server to pump out some extra bits.
Even if you transmit the JS stuff inline, the OP's notion of time still ignores the fact that it takes the caller time to even ask the server for the data in the first place, and at such small sizes that time swallows the transmission time from the user's perspective.
Here's a demo that only uses a single request for the whole page load: https://retr0.id/stuff/bee_movie.webp.html
It is technically 2 requests, but the second one is a cache hit, in my testing.
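Roughly, the decode step in this kind of self-extracting page boils down to something like this (a sketch of the general idea only, not the demo's exact code; it assumes the payload is packed one byte per pixel into a single channel of a lossless WebP, with alpha left at 255 so canvas readback doesn't mangle the values):

```js
// Decode the HTML bytes back out of a lossless WebP by drawing it to a canvas
// and reading the pixels. Alpha must be 255 everywhere, otherwise premultiplied
// alpha would corrupt the channel values on readback.
async function unpack(webpBytes) {
  const bitmap = await createImageBitmap(new Blob([webpBytes], { type: "image/webp" }));
  const canvas = document.createElement("canvas");
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(bitmap, 0, 0);
  const { data } = ctx.getImageData(0, 0, bitmap.width, bitmap.height);
  const bytes = new Uint8Array(data.length / 4);
  for (let i = 0; i < bytes.length; i++) bytes[i] = data[i * 4]; // one payload byte per pixel (red channel)
  // A real implementation would also strip any padding added to fill the last pixel row.
  return new TextDecoder().decode(bytes);
}

// document.write(await unpack(inlinedWebpBytes)); // `inlinedWebpBytes` comes from the polyglot file itself
```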
That's fine, but if you're evaluating the amount of time it takes to load a webpage, you cannot ignore the time it takes for the client request to reach your server in the first place or for the client to then unpack the data. The time saved transmitting such a small number of bits will be a fraction of the time spent making that initial request anyway. That's all I'm saying.
OP is only looking at transmit size differences, which is both not the same as transmit time differences and also not what the user actually experiences when requesting the page.
Hmm? I'm not sure where you're taking that from. The webp is inlined.
Ah, so it is! I was skim-reading and stopped at `const result = await fetch("compressor/compressed.webp");`
Seriously, if you're saving less than a TCP receive window's worth of space it's not going to make any difference to latency.
I suppose it could make a difference on lossy networks, but I'm not sure.
If the blob contains subresource requests (images, but also stylesheets, JS, or, worst case, fonts), it will actually be a net negative for latency. The browser's preload scanner begins fetching resources even before the HTML is finished being parsed. That can't happen if the HTML doesn't exist until after JS decodes it. In other words, the entire body has become a blocking resource.
These are similar conversations people have around hydration, by the by.
> These are similar conversations people have around hydration
For the uninitiated: https://en.m.wikipedia.org/wiki/Hydration_(web_development)
Actually, I was rather wondering about that claim, because it seems accidentally cherry-picked. Regarding that post:
But regarding the current one:
Most of the other examples don't show dramatic (like more than factor-of-2) differences between the compression methods either. In my own local testing (on Python wheel data, which should be mostly Python source code, thus text that's full of common identifiers and keywords) I find that XZ typically outperforms gzip by about 25%, while Brotli doesn't do any better than XZ.
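A comparison along those lines can be reproduced with something like the following sketch (assumes Node with the `xz` CLI installed; gzip and Brotli come from the built-in zlib module, and the input path is whatever corpus you want to test):

```js
// Compare gzip, xz and Brotli sizes on the same input. Assumes the `xz` CLI is
// installed; gzip and Brotli use Node's built-in zlib bindings.
const { readFileSync } = require("node:fs");
const zlib = require("node:zlib");
const { execFileSync } = require("node:child_process");

const data = readFileSync(process.argv[2]); // e.g. the concatenated .py files from a wheel

const gz = zlib.gzipSync(data, { level: 9 });
const br = zlib.brotliCompressSync(data, {
  params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 },
});
const xz = execFileSync("xz", ["-9e", "--stdout"], { input: data, maxBuffer: 1 << 30 });

console.log({ original: data.length, gzip: gz.length, xz: xz.length, brotli: br.length });
```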
XZ was never considered to become a compression algorithm built into web browsers to start with. A Brotli decoder is already there for HTTP, so it has been proposed to expose a full Brotli encoder and decoder API, as it shouldn't take too much effort to add an encoder and expose both.
Also, XZ (or LZMA/LZMA2 in general) produces smaller compressed output than Brotli given lots of spare time, but is much slower than Brotli when targeting the same compression ratio. This is because LZMA/LZMA2 uses an adaptive range coder and multiple code-distribution contexts, both of which contribute heavily to its slowness when higher compression ratios are requested. Brotli only has the latter, and its entropy coding is just a bitwise Huffman coder.
It’s not just decompression time. They need to download the whole thing before decompression, whereas the browser can decompress and render HTML as it’s streamed from the server. If the connection is interrupted you lose everything, instead of being able to read the part you’ve downloaded.
So, for any reasonable connection the difference doesn’t matter; for actually gruesomely slow/unreliable connections where 50KB matters this is markedly worse. While a fun experiment, please don’t do it on your site.
Other major issues that I had to contend with:
1: Browsers choose when to download files and run JavaScript. It is not as easy as one might think to force JavaScript to run immediately at high priority (which it needs to be when it is on the critical path to painting).
2: You lose certain browser optimisations where normally many things are done in parallel. Instead you are introducing delays into the critical path, and those delays might not be worth the "gain".
3: Browsers do great things to start requesting files in parallel as they are detected in the HTML/CSS. Removing that feature can be a poor tradeoff.
There are a few other unobvious downsides. I would never deploy anything like that to a production site without serious engineering effort to measure the costs and benefits.
I have similar feelings about JS minification, especially if you're already sending it gzipped.
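If you want to check that on your own bundle, a quick sketch (assumes you already have a minified copy, e.g. produced by terser; the file names are placeholders):

```js
// Compare gzipped sizes of the original and pre-minified bundle to see how much
// minification still buys you once compression is in play.
const { readFileSync } = require("node:fs");
const { gzipSync } = require("node:zlib");

for (const path of ["app.js", "app.min.js"]) {
  const raw = readFileSync(path);
  console.log(path, "raw:", raw.length, "gzipped:", gzipSync(raw, { level: 9 }).length);
}
```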
exactly what I thought too