WebP: The WebPage Compression Format

BugsJustFindMe
52 replies
23h59m

the longest post on my site, takes 92 KiB instead of 37 KiB. This amounts to an unnecessary 2.5x increase in load time

Sure, if you ignore latency. In reality it's an unnecessary 0.001% increase in load time because that size increase isn't enough to matter vs the round trip time. And the time you save transmitting 55 fewer KiB is probably less than the time lost to decompression. :p

While fun, I would expect this specific scenario to actually be worse for the user experience not better. Speed will be a complete wash and compatibility will be worse.

edflsafoiewq
18 replies
23h2m

Well, that, and there's an 850K Symbols-2048-em%20Nerd%20Font%20Complete.woff2 file that sort of drowns out the difference, at least if it's not in cache.

lxgr
6 replies
21h18m

Now I got curious, and there's also a 400 kB CSS file to go with it: https://purplesyringa.moe/fonts/webfont.css

I'm not up to date on web/font development – does anybody know what that does?

bobbylarrybobby
5 replies
21h7m

It adds unicode characters before elements with the given class. Then it's up to the font to display those Unicode characters — in this case, based on the class names, one can infer that the font assigns an icon to each character used.

lxgr
4 replies
21h0m

That makes sense, thank you!

So the purpose is effectively to have human-readable CSS class names to refer to given glyphs in the font, rather than having stray private use Unicode characters in the HTML?

lobsterthief
3 replies
20h49m

Yep

This is a reasonable approach if you have a large number of icons across large parts of the site, but you should always compile the CSS/icon set down to only those used.

If only a few icons, and the icons are small, then inlining the SVG is a better option. But if you have too many SVGs directly embedded on the site, the page size itself will suffer.

As always with website optimization, whether something is a good option always “depends”.

yencabulator
0 replies
3h45m

More reasonable than this class+CSS approach would be e.g. a React/static-site-template/etc. custom element that outputs the correct glyph directly. The output doesn't need to contain this indirection, or carry every possible icon.

lifthrasiir
0 replies
14h29m

I think at least some recent tools produce ligatures that turn plain text into an icon, to avoid this issue.

alpaca128
0 replies
6h18m

Another detail is that this feature breaks and makes some sites nearly unusable if the browser is set to ignore a website's custom fonts.

est
6 replies
15h7m

Seriously, why can't modern browsers turn off features like remote fonts, WebRTC, etc. in settings? I hate it when I'm partway through reading and the font suddenly changes. Not to mention the fingerprinting risks.

BugsJustFindMe
4 replies
13h37m

You can, and then when someone uses an icon font instead of graphics their page breaks.

Dalewyn
2 replies
13h12m

Skill issue.

Pictures are pictures, text is text. <img> tag exists for a reason.

lenkite
1 replies
10h21m

It's a convenience/packaging issue. An icon font is simply more convenient to handle. <img> tags for a hundred images require more work.

Icon fonts are used all over the place - look at the terminal nowadays. Most TUIs require an icon font to be installed.

Dalewyn
0 replies
44m

<img> tags for a hundred images require more work.

So it's a skill issue.

est
0 replies
6h51m

I believe many Web APIs can be enabled/disabled on a per-website basis, no?

collinmanderson
0 replies
5h29m

iPhone's Lockdown Mode turns off remote fonts (including breaking icon fonts, as the sibling comment says)

marcellus23
3 replies
21h2m

Wow, yeah. That kind of discredits the blog author a bit.

purplesyringa
1 replies
10h21m

It's cached. Not ideal, sure, and I'll get rid of that bloat someday, but that file is not mission-critical and the cost is amortized between visits. I admit my fault though.

marcellus23
0 replies
3h7m

Definitely fair and I was a bit harsh. It just seemed a bit nonsensical to go through such efforts to get rid of a few kilobytes while serving a massive font file. But I also understand that it's mostly for fun :)

edflsafoiewq
0 replies
20h53m

I mean, it's all just for fun of course.

jsnell
11 replies
23h51m

That size difference is large enough to make a difference in the number of round trips required (should be roughly one fewer roundtrip with any sensible modern value for the initial congestion window).

Won't be a 2.5x difference, but also not 0.001%.
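
As a rough sanity check, here's a toy slow-start model (assuming an initial congestion window of 10 segments of ~1460 bytes, the window doubling every round trip, and no TLS or HTTP framing overhead):

    # Toy slow-start model: count the round trips needed to deliver `size`
    # bytes, assuming initcwnd = 10 segments of ~1460 bytes and the
    # congestion window doubling every round trip (no loss, no overhead).
    def roundtrips(size, initcwnd=10 * 1460):
        delivered, cwnd, rtts = 0, initcwnd, 0
        while delivered < size:
            delivered += cwnd
            cwnd *= 2
            rtts += 1
        return rtts

    for kib in (37, 92):
        print(f"{kib} KiB -> {roundtrips(kib * 1024)} round trips")
    # 37 KiB -> 2 round trips, 92 KiB -> 3 round trips under these assumptions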

BugsJustFindMe
5 replies
23h38m

You don't need a new roundtrip for every packet. That would be devastating for throughput. Whether the file takes one, two, or three packets, they get acked as a batch either way, not serially.

Also when you get to the end, you then see

The actual savings here are moderate: the original is 88 KiB with gzip, and the WebP one is 83 KiB with gzip. In contrast, Brotli would provide 69 KiB.

At 69 KiB you're still over the default TCP packet max, which means both cases transmit the same number of packets, one just has a bunch of extra overhead added for the extra JavaScript fetch, load, and execute.

The time saved here is going to be negligible at best anyway, and it actually looks to be negative, because we're burning time without reducing the number of needed packets at all.

jsnell
4 replies
23h19m

Those numbers are for a different page. For the original page, the article quotes 44 kB with this method vs. 92 kB for gzip.

At 69 KiB you're still over the default TCP packet max, which means both cases transmit the same number of packets,

What? No, they absolutely don't transmit the same number of packets. Did you mean some other word?

mananaysiempre
2 replies
22h20m

I expect what GP meant is the default TCP window size, so in a situation where bandwidth costs are dwarfed by roundtrip costs, these two cases will end up taking essentially the same time, because they will incur the same number of ACK roundtrips. Don’t know if the numbers work out, but they at least sound plausible.

jsnell
0 replies
19h3m

No, there is no way the numbers would work out to the same number of roundtrips. The sizes are different by a factor of 2.5x, and the congestion window will only double in a single roundtrip. The only way the number of roundtrips would be the same is if both transfers fit in the initial congestion window.

BugsJustFindMe
0 replies
21h13m

Yes, sorry

codetrotter
0 replies
22h22m

They were probably thinking of the max size for packets in TCP, which is 64K (65535 bytes).

However, Ethernet has an MTU (Maximum Transmission Unit) of 1500 bytes, unless jumbo frames are used.

And so I agree with you, the number of packets that will be sent for 69 KiB vs 92 KiB will likely be different.
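
For a rough sense of scale (a toy calculation, assuming ~1460 bytes of TCP payload per standard Ethernet frame and ignoring headers and retransmissions):

    import math

    # Approximate segment counts at a 1500-byte Ethernet MTU,
    # i.e. roughly 1460 bytes of TCP payload per segment.
    for kib in (69, 92):
        segments = math.ceil(kib * 1024 / 1460)
        print(f"{kib} KiB -> ~{segments} segments")
    # 69 KiB -> ~49 segments, 92 KiB -> ~65 segments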

pierrec
4 replies
23h6m

Interesting, how does that add a round trip? For the record here's what I believe to be the common definition of an additional "round trip", in a web development context:

  - client requests X
  - client gets X, which contains a reference to Y
  - therefore client requests Y
So you're starting a new request that depends on the client having received the first one. (although upon closer inspection I think the technique described in the blog post manages to fit everything into the first response, so I'm not sure how relevant this is)

crote
2 replies
22h8m

In reality it's more like:

  - client requests X
  - server sends bytes 0-2k of X
  - client acknowledges bytes 0-2k of X
  - server sends bytes 2k-6k of X
  - client acknowledges bytes 2k-6k of X
  - server sends bytes 6k-14k of X
  - client acknowledges bytes 6k-14k of X
  - server sends bytes 14k-30k of X
  - client acknowledges bytes 14k-30k of X
  - server sends bytes 30k-62k of X
  - client acknowledges bytes 30k-62k of X
  - server sends bytes 62k-83k of X
  - client acknowledges bytes 62k-83k of X
  - client has received X, which contains a reference to Y
  - therefore client requests Y
It's all about TCP congestion control here. There are dozens of algorithms used to handle it, but in pretty much all cases you want to have some kind of slow buildup in order to avoid completely swamping a slower connection and having all but the first few of your packets getting dropped.

notpushkin
1 replies
18h54m

client acknowledges bytes 0-2k of X

Doesn't the client see the reference to Y at this point? Modern browsers start parsing HTML even before they receive the whole document.

Timwi
0 replies
13h35m

Not just modern. This was even more significant on slow connections, so they've kind of always done that. One could even argue that HTML, HTTP (specifically, chunked encoding) and gzip are all intentionally designed to enable this.

jsnell
0 replies
22h51m

Unless a resource is very small, it won't be transmitted in a single atomic unit. The sender will only send a part of it, wait for the client to acknowledge having received it, and only then send more. That requires a network roundtrip. The larger the resource, the more network roundtrips are required.

If you want to learn more, pretty much any resource on TCP should explain this stuff. Here's something I wrote years ago, the background section should be pretty applicable: https://www.snellman.net/blog/archive/2017-08-19-slow-ps4-do...

lxgr
5 replies
21h38m

That's certainly reasonable if you optimize only for loading time (and make certain assumptions about everybody's available data rate), but sometimes I really wish website (and more commonly app) authors wouldn't make that speed/data tradeoff so freely on my behalf, for me to find out after they've already pushed that extra data over my metered connection.

The tragedy here is that while some people, such as the author of TFA, go to great lengths to get from about 100 to 50 kB, others don't think twice to send me literally tens of megabytes of images, when I just want to know when a restaurant is open – on roaming data.

Resource awareness exists, but it's unfortunately very unevenly distributed.

k__
3 replies
20h52m

We need a Mosh-based browser with Gopher support.

nox101
2 replies
17h29m

and you'd need AI to tell you what's in the pictures, because lots of restaurant sites just have photos of their menu, and some designer with no web knowledge put the phone number, address, and hours in an image designed in Photoshop

tacone
1 replies
11h40m

You're out of luck there. Some time ago I tried some photos of pasta dishes with Gemini and it could not guess the recipe name.

mintplant
0 replies
21m

I've found Gemini to be pretty terrible at vision tasks compared to the competition. Try GPT-4o or Claude-3.5-Sonnet instead.

zamadatix
0 replies
4h31m

There is an interesting "Save-Data" request header to let a site know what makes sense to optimize for on a given connection, but it seems to be Chrome-only so far: https://caniuse.com/?search=save-data

I wish there was a bit of an opposite option - a "don't lazy/partially load anything" for those of us on fiber watching images pop up as we scroll past them in the page that's been open for a minute.

Retr0id
5 replies
23h38m

Why is there more latency?

Edit: Ah, I see OP's code requests the webp separately. You can avoid the extra request if you write a self-extracting html/webp polyglot file, as is typically done in the demoscene.

BugsJustFindMe
2 replies
23h35m

It takes more time for your message to get back and forth between your computer and the server than it takes for the server to pump out some extra bits.

Even if you transmit the JS inline, the OP's notion of time still ignores the fact that it takes the client time to even ask the server for the data in the first place, and at such small sizes that time swallows the transmission time from the user's perspective.

Retr0id
1 replies
23h19m

Here's a demo that only uses a single request for the whole page load: https://retr0.id/stuff/bee_movie.webp.html

It is technically 2 requests, but the second one is a cache hit, in my testing.

BugsJustFindMe
0 replies
23h16m

That's fine, but if you're evaluating the amount of time it takes to load a webpage, you cannot ignore the time it takes for the client request to reach your server in the first place or for the client to then unpack the data. The time saved transmitting such a small number of bits will be a fraction of the time spent making that initial request anyway. That's all I'm saying.

OP is only looking at transmit size differences, which is both not the same as transmit time differences and also not what the user actually experiences when requesting the page.

purplesyringa
1 replies
10h18m

Hmm? I'm not sure where you're taking that from. The webp is inlined.

Retr0id
0 replies
4h49m

Ah, so it is! I was skim-reading and stopped at `const result = await fetch("compressor/compressed.webp");`

sleepydog
2 replies
21h15m

Seriously, if you're saving less than a TCP receive window's worth of space it's not going to make any difference to latency.

I suppose it could make a difference on lossy networks, but I'm not sure.

lelandfe
1 replies
20h48m

If the blob contains subresources (images, but also stylesheets, JS, or worst of all fonts), it will actually be a net negative for latency. The browser's preload scanner begins fetching resources even before the HTML is finished being parsed. That can't happen if the HTML doesn't exist until after JS decodes it. In other words, the entire body has become a blocking resource.

These are similar conversations people have around hydration, by the by.

zahlman
1 replies
18h49m

Actually, I was rather wondering about that claim, because it seems accidentally cherry-picked. Regarding that post:

This code minifies to about 550 bytes. Together with the WebP itself, this amounts to 44 KiB. In comparison, gzip was 92 KiB, and Brotli would be 37 KiB.

But regarding the current one:

The actual savings here are moderate: the original is 88 KiB with gzip, and the WebP one is 83 KiB with gzip. In contrast, Brotli would provide 69 KiB. Better than nothing, though.

Most of the other examples don't show dramatic (like more than factor-of-2) differences between the compression methods either. In my own local testing (on Python wheel data, which should be mostly Python source code, thus text that's full of common identifiers and keywords) I find that XZ typically outperforms gzip by about 25%, while Brotli doesn't do any better than XZ.
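
That sort of comparison is easy to reproduce; a sketch using the stdlib gzip and lzma modules, plus the third-party brotli package if it happens to be installed (the file path is just a placeholder):

    import gzip, lzma

    # Compare compressed sizes of the same payload. The path is a placeholder;
    # substitute whatever text you want to benchmark.
    data = open("some_python_source.py", "rb").read()

    results = {
        "gzip -9": len(gzip.compress(data, compresslevel=9)),
        "xz (lzma)": len(lzma.compress(data, preset=9)),
    }
    try:
        import brotli  # third-party: pip install brotli
        results["brotli -q 11"] = len(brotli.compress(data, quality=11))
    except ImportError:
        pass

    for name, size in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:>12}: {size} bytes")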

lifthrasiir
0 replies
5h26m

XZ was never considered as a compression algorithm to build into web browsers to start with. A Brotli decoder is already there for HTTP, so exposing a full Brotli encoder and decoder API has been proposed, as it shouldn't take too much effort to add an encoder and expose both.

Also, XZ (or LZMA/LZMA2 in general) produces smaller compressed data than Brotli given lots of time, but is much slower than Brotli when targeting the same compression ratio. This is because LZMA/LZMA2 uses an adaptive range coder and multiple code distribution contexts, both of which contribute heavily to the slowness when higher compression ratios are requested. Brotli only has the latter, and its coding is just a bitwise Huffman coder.

oefrha
1 replies
17h25m

It’s not just decompression time. They need to download the whole thing before decompression, whereas the browser can decompress and render HTML as it’s streamed from the server. If the connection is interrupted you lose everything, instead of being able to read the part you’ve downloaded.

So, for any reasonable connection the difference doesn’t matter; for actually gruesomely slow/unreliable connections where 50KB matters this is markedly worse. While a fun experiment, please don’t do it on your site.

robocat
0 replies
6h2m

Other major issues that I had to contend with:

1: browsers choose when to download files and run JavaScript. It is not as easy as one might think to force JavaScript to run immediately at high priority (which it needs to be when it is on the critical path to painting).

2: you lose certain browser optimisations where normally many things are done in parallel. Instead you are introducing delays into the critical path, and those delays might not be worth the "gain".

3: Browsers do great things to start requesting files in parallel as files are detected with HTML/CSS. Removing that feature can be a poor tradeoff.

There are a few other unobvious downsides. I would never deploy anything like that to a production site without serious engineering effort to measure the costs and benefits.

jgalt212
0 replies
23h32m

I have similar feelings about JS minification, especially if you're sending via gzip.

fsndz
0 replies
20h23m

exactly what I thought too

supriyo-biswas
14 replies
1d

This claim itself is probably a hoax and not relevant to the article at hand; but these days with text-to-image models and browser support, you could probably do something like <img prompt="..."> and have the browser render something that matches the description, similar to the "cookbook" analogy used in the Wikipedia article.

Lorin
7 replies
1d

That's an interesting concept, although it would generate a ton of bogomips since each client has to generate the image themselves instead of one time on the server.

You'd also want "seed" and "engine" attributes to ensure all visitors see the same result.

LeoPanthera
4 replies
1d

You could at least push the work closer to the edge, by having genAI servers on each LAN, and in each ISP, similar to the idea of a caching web proxy before HTTPS rendered them impossible.

lucianbr
3 replies
23h51m

Push the work closer to the edge, and multiply it by quite a lot. Generate each image many times. Why would we want this? Seems like the opposite of caching in a sense.

sroussey
1 replies
22h57m

If you are reading a web page on mars and bandwidth is more precious than processing power, then <img prompt=“…”> might make sense.

Not so much for us on earth however.

roywiggins
0 replies
22h33m

This sort of thing (but applied to video) is a plot point in A Fire Upon The Deep. Vinge's term for the result is an "evocation."

bobbylarrybobby
0 replies
21h5m

All compression is, in a sense, the opposite of caching. You have to do more work to get the data, but you save space.

onion2k
1 replies
21h41m

Unless you don't actually care if everyone sees the same results. So long as the generated image is approximately what you prompted for, and the content of the image is decorative so it doesn't really need to be a specific, accurate representation of something, it's fine to display a different picture for every user.

One of the best uses of responsive design I've ever seen was a site that looked completely different at different breakpoints - different theme, font, images, and content. It was beautiful, and creative, and fun. Lots of users saw different things and had no idea other versions were there.

semolino
0 replies
20h27m

What site are you referring to?

everforward
2 replies
21h3m

Similar ideas have floated around for a while. I’ve always enjoyed the elegance of compressing things down to a start and end index of digits of Pi.

It’s utterly impractical, but fun to muse about how neat it would be if it weren’t.

LtWorf
1 replies
18h58m

AFAIK it's not been proved that every combination does exist in π.

By comparison, you could easily define a number that goes 0.123456789101112131415… and use indexes into that number. However, the index would probably be larger than what you're trying to encode.

everforward
0 replies
10m

Huh, I presumed that any non-repeating irrational number would include all digit sequences, but I think you're right. Even a sequence of just 1s and 0s could be non-repeating.

I am curious what the compression ratios would be. I suspect the opposite, but the numbers are at a scale where my mind falters so I wouldn’t say that with any confidence. Just 64 bits can get you roughly 10^20 digits into the number, and the “reach” grows exponentially with bits. I would expect that the smaller the file, the more common its sequence is.

7bit
1 replies
23h58m

That would require a lot of GBs in libraries in the browser and a lot of processing power on the client CPU to render an image that is so unimportant that it doesn't really matter if it shows exactly what the author intended. To summarize that in three words: a useless technology.

That idea is something that is only cool in theory.

lxgr
0 replies
21h3m

At least for LLMs, something very similar is already happening: https://huggingface.co/blog/Xenova/run-gemini-nano-in-your-b...

Currently, we're definitely not there in terms of space/time tradeoffs for images, but I could imagine at least parameterized ML-based upscaling (i.e. ship a low-resolution image and possibly a textual description, have a local model upscale it to display resolution) at some point.

ec109685
0 replies
22h42m

Similar to what Samsung does if you take a picture of the moon.

magicalhippo
2 replies
22h22m

Reminds me of when I was like 13 and learned about CRC codes for the first time. Infinite compression here we come! Just calculate the 32bit CRC code for say 64 bits, transmit the CRC, then on the other hand just loop over all possible 64 bit numbers until you got the same CRC. So brilliant! Why wasn't this already used?!

Of course, the downsides became apparent once the euphoria had faded.
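
The downside, concretely: a 32-bit CRC has to cover 2^64 possible inputs, so the "decompressor" has no way to know which preimage was meant, and collisions show up after only a few tens of thousands of random inputs (a toy birthday-paradox sketch):

    import os, zlib

    # Birthday-paradox demo: keep hashing random 8-byte values until two
    # distinct inputs share the same CRC32 (typically after a few tens of
    # thousands of attempts), showing the CRC alone can't identify the data.
    seen = {}
    attempts = 0
    while True:
        value = os.urandom(8)
        crc = zlib.crc32(value)
        attempts += 1
        if crc in seen and seen[crc] != value:
            print(f"collision after {attempts} random values:")
            print(seen[crc].hex(), "and", value.hex(), "-> CRC32", hex(crc))
            break
        seen[crc] = value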

lxgr
0 replies
21h2m

Very relatable! "If this MD5 hash uniquely identifies that entire movie, why would anyone need to ever send the actual... Oh, I get it."

Arguably the cruelest implication of the pigeonhole principle.

_factor
0 replies
22h10m

Even better, take a physical object and slice it precisely in a ratio that contains your data in the fraction!

lxgr
1 replies
21h17m

The Sloot Digital Coding System is an alleged data sharing technique that its inventor claimed could store a complete digital movie file in 8 kilobytes of data

8 kilobytes? Rookie numbers. I'll do it in 256 bytes, as long as you're fine with a somewhat limited selection of available digital movie files ;)

MBCook
0 replies
18h28m

I can shrink any file down to just 32 bits using my unique method.

I call it the High Amplitude Shrinkage Heuristic, or H.A.S.H.

It is also reversible, but only safely to the last encoded file due to quantum hyperspace entanglement of ionic bonds. H.A.S.H.ing a different file will disrupt them preventing recovery of the original data.

Spivak
1 replies
21h47m

The real version of this is Nvidia's web conferencing demo where they make a 3d model of your face and then only transfer the wireframe movements which is super clever.

https://m.youtube.com/watch?v=TQy3EU8BCmo

You can really feel the "compute has massively outpaced networking speed" where this kind of thing is actually practical. Maybe I'll see 10G residential in my lifetime.

kstrauser
0 replies
20h56m

This messages comes to you via a 10Gbps, $50 residential fiber connection.

The future is already here – it's just not very evenly distributed.

Groxx
0 replies
23h12m

Time Lords probably, saving us from the inevitable end of this technology path, where all data in the universe is compressed into one bit which leads to an information-theoretic black hole that destroys everything.

raggi
8 replies
23h27m

Chromies got in the way of it for a very long time, but zstd is now coming to the web too, as it’s finally landed in chrome - now we’ve gotta get safari onboard

mananaysiempre
6 replies
22h14m

I’d love to go all-Zstandard, but in this particular case, as far as I know, Brotli and Zstandard are basically on par at identical values of decompressor memory consumption.

simondotau
4 replies
19h40m

Realistically, everything should just support everything. There’s no reason why every (full featured) web server and every (full featured) web browser couldn’t support all compelling data compression algorithms.

Unfortunately we live in a world where Google decides to rip JPEG-XL support out of Chrome for seemingly no reason other than spite. If the reason was a lack of maturity in the underlying library, fine, but that wasn’t the reason they offered.

madeofpalk
2 replies
18h41m

There’s no reason

Of course, there is - and it's really boring. Prioritisation, and maintenance.

It's a big pain to add, say, 100 compression formats and support them indefinitely, especially with little differentiation between them. Once we agree on what the upper bound of useless formats is, we can start to negotiate what the lower limit is.

simondotau
0 replies
6h8m

I qualified it with compelling which means only including formats/encodings which have demonstrably superior performance in some non-trivial respect.

And I qualified it with mature implementation because I agree that if there is no implementation which has a clear specification, is well written, actively maintained, and free of jank, then it ought not qualify.

Relative to the current status quo, I would only imagine the number of data compression, image compression, and media compression options to increase by a handful. Single digits. But the sooner we add them, the sooner they can become sufficiently widely deployed as to be useful.

raggi
0 replies
15h33m

It's not prioritization of the code - it's relatively little implementation and low maintenance. But security is critical here, and everyone is very afraid of compressors because gzip, JPEG, and various others were pretty bad. Zstd, unlike lz4 before it (at least early on), has a good amount of tests and fuzzing. The implementation could probably be safer (with fairly substantial effort), but having test and fuzzing coverage is a huge step forward over prior industry norms.

rafaelmn
0 replies
17h22m

How many CVEs came out of different file format handling across platforms? Including shit in browsers has insane impact.

raggi
0 replies
15h27m

Brotli and zstd are close, each trading wins in various configurations and cases - but zstd is muuuuuch more usable in a practical sense, because the code base can be easily compiled and linked against in a wider number of places with less effort, and the CLI tools turn up all over for similar reasons. Brotli, like many Google libraries in the last decade, is infected with googlisms. Brotli is less bad, in a way; for example the C comes with zero build system at all, which is marginally better than a bazel disaster of generated stuff, but it's still not in a state the average distro contributor cares to turn into a library, prepare pkgconf rules for and whatnot - plus no versioning and so on. Oh, and the tests are currently failing.

lxgr
6 replies
21h30m

From the linked Github issue giving the rationale why Brotli is not available in the CompressionStream API:

As far as I know, browsers are only shipping the decompression dictionary. Brotli has a separate dictionary needed for compression, which would significantly increase the size of the browser.

How can the decompression dictionary be smaller than the compression one? Does the latter contain something like a space-time tradeoff in the form of precalculated most efficient representations of given input substrings or something similar?

zamadatix
2 replies
18h5m

Perhaps I'm reading past some of the surrounding context but that doesn't actually say the problem is about the relative sizes, just that the compression dictionary isn't already in browsers while the decompression dictionary already is.

It's a bit disappointing you can't use Brotli in the DecompressionStream() interface just because it may or may not be available in the CompressionStream() interface though.

lxgr
1 replies
15h10m

I'm actually not convinced that there are two different dictionaries at all. The Brotli RFC only talks about a static dictionary, not a separate encoding vs. a decoding dictionary.

My suspicion is that this is a confusion of the (runtime) sliding window, which limits maximum required memory on the decoder's side to 16 MB, with the actual shared static dictionary (which needs to be present in the decoder only, as far as I can tell; the encoder can use it, and if it does, it would be the same one the decoder has as well).

zamadatix
0 replies
14h12m

IIRC from working with Brotli before, it's not that it's truly a "different dictionary" but rather more like a "reverse view" of what are ultimately the same dictionary mappings.

On one hand it seems a bit silly to worry about ~100 KB in the browser for something that will probably, on average, save more than that in upload/download the first time it is used. On the other hand, "it's just a few hundred KB" each release for a few hundred releases ends up being a lot of cruft you can't remove without breaking old stuff. On the third hand growing out of our head... it's not like Chrome has been shy about shipping extra code for features they'd like to impose on users even when users don't actually want them, so what's a small addition users can actually benefit from against that?

lifthrasiir
2 replies
15h5m

I believe the compression dictionary refers to [1], which is used to quickly match dictionary-compressible byte sequences. I don't know where 170 KB comes from, but that hash alone does take 128 KiB and might be significant if it can't be easily recomputed. But I'm sure it could be computed quickly at load time if the binary size is that important.

[1] https://github.com/google/brotli/blob/master/c/enc/dictionar...

lxgr
1 replies
14h58m

I was wondering that too, but that dictionary itself compresses down to 29 KB (using regular old gzip), so it seems pretty lightweight to include even if it were hard/annoying to precompute at runtime or install time.

lifthrasiir
0 replies
14h39m

Once installed it will occupy 128 KiB of data though, so it might still be relevant for the Chromium team.

butz
5 replies
21h45m

Dropping Google Fonts should improve page load time a bit too, considering those are loaded from a remote server that requires an additional handshake.

kevindamm
4 replies
21h37m

..but if enough other sites are also using that font then it may already be available locally.

pornel
2 replies
20h59m

This stopped working many years ago. Every top-level site now has its own private cache of all other domains.

You likely have dozens of copies of Google Fonts, each in a separate silo, with absolutely zero reuse between websites.

This is because a global cache used to work like a cookie, and has been used for tracking.

kevindamm
0 replies
20h32m

ah, I had forgotten about that, you're right.

well at least you don't have to download it more than once for the site, but first impressions matter yeah

niutech
4 replies
22h26m

This page is broken at least in the Sailfish OS browser: there is a long empty space after the paragraph:

Alright, so we’re dealing with 92 KiB for gzip vs 37 + 71 KiB for Brotli. Umm…

That said, the overhead of gzip vs Brotli HTML compression is nothing compared with the amount of JS/images/video current websites use.

mediumsmart
2 replies
21h47m

Same on Orion, Safari, and LibreWolf - is this a Chrome page?

zamadatix
0 replies
4h37m

Works fine on Safari and Firefox for me.

simmonmt
0 replies
20h59m

A different comment says librewolf disables webgl by default, breaking OP's decompression. Is that what you're seeing?

txtsd
0 replies
3h49m

Same on Mull

gkbrk
4 replies
23h53m

Why readPixels is not subject to anti-fingerprinting is beyond me. It does not sprinkle hardly visible typos all over the page, so that works for me.

keep the styling and the top of the page (about 8 KiB uncompressed) in the gzipped HTML and only compress the content below the viewport with WebP

Ah, that explains why the article suddenly cut off after a random sentence, with an empty page that follows. I'm using LibreWolf which disables WebGL, and I use Chromium for random web games that need WebGL. The article worked just fine with WebGL enabled, neat technique to be honest.

niutech
3 replies
19h22m

It isn't neat as long as it doesn't work with all modern web browsers (even with fingerprinting protection) and doesn't have a fallback for older browsers. WWW should be universally accessible and progressively enhanced, starting with plain HTML.

kjhcvkek77
1 replies
8h1m

This philosophy hands your content on a silver platter to ai companies, so they can rake in money while giving nothing back to the author.

latexr
0 replies
7h20m

I don’t support LLM companies stealing content and profiting from it without contributing back. But if you’re going to fight that by making things more difficult for humans, especially those with accessibility needs, then what even is the point of publishing anything?

afavour
0 replies
17h9m

It isn’t a serious proposal. It’s a creative hack that no one, author included is suggesting should be used in production.

lifthrasiir
3 replies
15h21m

It is actually possible to use Brotli directly in the web browser... with caveats, of course. I believe my 2022 submission to JS1024 [1] is the first ever demonstration of this concept, and I also have proof-of-concept code for arbitrary compression [2] (which sadly didn't work for the original size-coding purpose, though). The main caveat is that you are effectively limited to ASCII characters, and that it is highly sensitive to the rendering stack for the obvious reason: it no longer seems to function in Firefox right now.

[1] https://js1024.fun/demos/2022/18/readme

[2] https://gist.github.com/lifthrasiir/1c7f9c5a421ad39c1af19a9c...

zamadatix
0 replies
4h40m

The key note for understanding the approach (without diving into how it's actually wrangled):

The only possibility is to use the WOFF2 font file format which Brotli was originally designed for, but you need to make a whole font file to leverage this. This got more complicated recently by the fact that modern browsers sanitize font files, typically by the OpenType Sanitizer (OTS), as it is very insecure to put untrusted font files directly to the system. Therefore we need to make an WOFF2 file that is sane enough to be accepted by OTS _and_ has a desired byte sequence inside which can be somehow extracted. After lots of failed experiments, I settled on the glyph widths ("advance") which get encoded in a sequence of two-byte signed integers with almost no other restrictions.

Fantastic idea!

purplesyringa
0 replies
10h16m

This technique is amazing and way cooler than my post. Kudos!

lifthrasiir
0 replies
4h3m

Correction: It still works on Firefox, I just forgot that its zoom factor should be exactly 100% on Firefox to function. :-)

csjh
3 replies
23h14m

I think the most surprising part here is that gzipping the base64'd compressed data almost entirely removes the base64 overhead.

zamadatix
2 replies
17h48m

It feels somewhat intuitive, since (as the article notes) the Huffman coding stage effectively "reverses" the original base64 overhead: an 8-bit symbol (256 choices) is used to carry only 6 bits (64 choices) of actual information. A useful compression algorithm which _didn't_ do this sort of thing would be very surprising, as it would mean it doesn't notice simple patterns in the data but somehow compresses things anyway.
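
A quick way to convince yourself (a sketch; random bytes stand in for already-compressed, essentially incompressible data):

    import base64, gzip, os

    payload = os.urandom(50_000)        # stands in for already-compressed data
    b64 = base64.b64encode(payload)

    print(len(payload))                 # 50000
    print(len(b64))                     # 66668, the usual +33% base64 overhead
    print(len(gzip.compress(b64)))      # roughly 50 KB again: the Huffman stage
                                        # claws back most of the expansion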

ranger_danger
1 replies
17h44m

how does it affect error correction though?

zamadatix
0 replies
16h40m

Neither base64 nor the gzipped version has error correction as implemented. The extra overhead bits in base64 come from selecting only a subset of printable characters, not from adding redundancy to the useful bits.

astrostl
3 replies
20h35m

Things I seek in an image format:

(1) compatibility

(2) features

WebP still seems far behind on (1) to me so I don't care about the rest. I hope it gets there, though, because folks like this seem pretty enthusiastic about (2).

mistrial9
1 replies
17h56m

agree - webp has a lib on Linux but somehow, standard image viewers just do not read it, so -> "FAIL"

sgbeal
0 replies
9h7m

webp has a lib on Linux but somehow, standard image viewers just do not read it

That may apply to old "LTS" Linuxes, but not any relatively recent one. Xviewer and GIMP immediately come to mind as supporting it, and I haven't had a graphics viewer on Linux _not_ be able to view webp in at least 3 or 4 years.

jfoster
0 replies
13h30m

The compatibility gap on WebP is already quite small. Every significant web browser now supports it. Almost all image tools & viewers do as well.

Lossy WebP comes out a lot smaller than JPEG. It's definitely worth taking the saving.

lucb1e
1 replies
20h9m

That page breaks my mouse gestures add-on! (Or, I guess we don't have add-ons anymore but rather something like script injections that we call extensions, yay...) Interesting approach to first deliver 'garbage' and then append a bit of JS to transform it back into a page. The inner security nerd in me wonders if this might open up attacks if you had some kind of user-supplied data, such as a comment form. One could probably find a sequence of bytes to post as a comment that would, after compression, turn into a script tag positioned (and running) before yours?

Retr0id
0 replies
19h15m

Yeah that's plausible, you definitely don't want any kind of untrusted data in the input.

Something I wanted to do but clearly never got around to, was figuring out how to put an open-comment sequence (<!--) in a header somewhere, so that most of the garbage gets commented out

lifthrasiir
0 replies
15h13m

In my experience WebP didn't work well for the general case where this technique is actually useful (i.e. data less than 10 KB), because most of what WebP lossless adds over PNG is about modelling and not encoding, while this textual compression would only use the encoding part of WebP.

Dibby053
3 replies
22h15m

I didn't know canvas anti-fingerprinting was so rudimentary. I don't think it increases uniqueness (the noise is different every run) but bypassing it seems trivial: run the thing n times and take the mode. With so little noise, 4 or 5 times should be more than enough.

hedora
2 replies
21h51m

The article says it’s predictable within a given client, so your trick wouldn’t work.

So, just use the anti-fingerprint noise as a cookie, I guess?

Dibby053
1 replies
21h8m

Huh, it seems it's just my browser that resets the noise every run.

I opened the page in Firefox like the article suggests and I get a different pattern per site and session. That prevents using the noise as a supercookie, I think, if its pattern changes every time cookies are deleted.

sunaookami
0 replies
20h21m

It is different per site and per session, yes.

98469056
2 replies
23h58m

While peeking at the source, I noticed that the doctype declaration is missing a space. It currently reads <!doctypehtml>, but it should be <!doctype html>

BugsJustFindMe
0 replies
16h48m

Maybe their javascript adds it back in :)

rrrix1
1 replies
1h20m

I very much enjoyed reading this. Quite clever!

But...

Annoyingly, I host my blog on GitHub pages, which doesn’t support Brotli.

Is the glaringly obvious solution to this not as obvious as I think it is?

TFA went through a lot of round-about work to get (some) Brotli compression. Very impressive Yak Shave!

If you're married to the idea of a Git-based automatically published web site, you could at least replicate your code and site to GitLab Pages, which has supported precompressed Brotli since 2019. Or use one of Cloudflare's free-tier services. There are a variety of ways to solve this problem before the first byte is sent to the client.

Far too much of the world's source code already depends exclusively on GitHub. I find it distasteful to have the small web do the same while blindly accepting an inferior experience and worse technology.

purplesyringa
0 replies
1h11m

The solution's obvious, but if I followed it, I wouldn't have a fun topic to discuss or fake internet points to boast about, huh? :)

I'll probably switch to Cloudflare Pages someday when I have time to do that.

ranger_danger
1 replies
17h45m

ensure it works without JavaScript enabled

manually decompress it in JavaScript

Brotli decompressor in WASM

the irony seems lost

purplesyringa
0 replies
10h9m

A nojs version is compiled separately and is linked via a meta refresh. Not ideal, and I could add some safeguards for people without webgl, but I'm not an idiot.

galaxyLogic
1 replies
23h2m

Is there a tool or some other way to easily encode a JPG image so it can be embedded into HTML? I know there is something like that, but is it easy? Could it be made easier?

throwanem
0 replies
22h53m

You can convert it to base64 and inline it anywhere an image URL is accepted, e.g.

    <img src="data:image/jpeg;base64,abc123..." />
(Double-check the exact syntax and the MIME type before you use it; it's been a few years since I have, and this example is from perhaps imperfect memory.)
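
A few lines of Python can also generate the tag for you (the file path here is a placeholder):

    import base64

    # Build an inline data: URI for an image. "photo.jpg" is a placeholder path.
    with open("photo.jpg", "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")

    print(f'<img src="data:image/jpeg;base64,{b64}" alt="">')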

butz
1 replies
21h41m

I wonder what the difference in client-side CPU usage is for the WebP variant vs standard HTML. Are you causing more battery drain on visitors' devices?

lucb1e
0 replies
20h0m

It depends. Quite often (this is how you can tell I live in Germany) mobile data switches to "you're at the EDGE of the network range" mode¹ and transferring a few KB means keeping the screen on and radio active for a couple of minutes. If the page is now 45KB instead of 95KB, that's a significant reduction in battery drain!

Under normal circumstances you're probably very right

¹ Now I wonder if the makers foresaw how their protocol name might sound to us now

Pesthuf
1 replies
19h27m

It's impressive how close this is to Brotli even though brotli has this massive pre-shared dictionary. Is the actual compression algorithm used by it just worse, or does the dictionary just matter much less than I think?

lifthrasiir
0 replies
15h3m

A pre-shared dictionary is most effective for small inputs that can't reach the stationary distribution typical compressors rely on. I don't know the exact threshold, but my best guess is around 1--10 KB.

Jamie9912
1 replies
11h27m

Why don't they make zstd images? Surely that would beat WebP.

sgbeal
0 replies
9h3m

Why don't they make zstd images? Surely that would beat WebP.

zstd is a general-purpose compressor. By and large (and I'm unaware of any exceptions), specialized/format-specific compression (like PNG, WebP, etc.) will compress better than a general-purpose compressor, because format-specific compressors can take advantage of quirks of the format which a general-purpose solution cannot. Also, format-specific ones are often lossy (or conditionally so), enabling them to trade lower fidelity for better compression, something a general-purpose compressor cannot do.

purplesyringa
0 replies
1h13m

It's unfortunately Chromium-only for now, and I wanted to keep code simple. I've got a PoC lying around with VideoFrame and whatnot, but I thought this would be better for a post.

toddmorey
0 replies
21h56m

"I hope you see where I’m going with this and are yelling 'Oh why the fuck' right now."

I love reading blogpost like these.

somishere
0 replies
20h25m

Lots of nice tricks in here, definitely fun! Only minor nitpick is that it departs fairly rapidly from the lede ... which espouses the dual virtues of an accessible and js-optional reading experience ;)

phh
0 replies
9h51m

A real-world web page compressed with WebP? Oh, how about the one you’re reading right now? Unless you use an old browser or have JavaScript turned off, WebP compresses this page starting from the “Fool me twice” section. If you haven’t noticed this, I’m happy the trick is working :-)

Well, it didn't work in Materialistic (I guess their webview disables JS), and the failure mode is really not comfortable.

niceguy4
0 replies
1d

Not to sidetrack the conversation but to sidetrack the conversation, have there been many other major WebP exploits like the serious one in the past?

michaelbrave
0 replies
2h17m

I personally don't much care for the format. If I save an image and it ends up as WebP, I have to convert it before I can edit or use it in any meaningful way, since it's not supported in anything other than web browsers. It just gives me extra steps to do.

kopirgan
0 replies
20h18m

19 years old, and look at the list of stuff she's done! Perhaps she started coding in the womb?! Amazing.

kardos
0 replies
3h1m

On the fingerprinting noise: this sounds like a job for FEC [1]. It would increase the size but allow using the Canvas API. I don't know if this would solve the flicker though (not a front end expert here)

Also, it's a long shot, but could the combo of FEC (+size) and lossy compression (-size) be a net win?

[1] https://en.m.wikipedia.org/wiki/Error_correction_code

jfoster
0 replies
14h2m

I work on Batch Compress (https://batchcompress.com/en) and recently added WebP support, then made it the default soon after.

As far as I know, it was already making the smallest JPEGs out of any of the web compression tools, but WebP was coming out only ~50% of the size of the JPEGs. It was an easy decision to make WebP the default not too long after adding support for it.

Quite a lot of people use the site, so I was anticipating some complaints after making WebP the default, but it's been about a month and so far there has been only one complaint/enquiry about WebP. It seems that almost all tools & browsers now support WebP. I've only encountered one website recently where uploading a WebP image wasn't handled correctly and blocked the next step. Almost everything supports it well these days.

divbzero
0 replies
16h1m

Typically, Brotli is better than gzip, and gzip is better than nothing. gzip is so cheap everyone enables it by default, but Brotli is way slower.

Note that way slower applies to speed of compression, not decompression. So Brotli is a good bet if you can precompress.

Annoyingly, I host my blog on GitHub pages, which doesn’t support Brotli.

If your users all use modern browsers and you host static pages through a service like Cloudflare or CloudFront that supports custom HTTP headers, you can implement your own Brotli support by precompressing the static files with Brotli and adding a Content-Encoding: br HTTP header. This is kind of cheating because you are ignoring proper content negotiation with Accept-Encoding, but I’ve done it successfully for sites with targeted user bases.
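
The precompression step itself is tiny; a sketch using the third-party brotli package (the directory name is a placeholder, and your hosting layer still has to attach the Content-Encoding: br header):

    import brotli  # third-party: pip install brotli
    from pathlib import Path

    # Precompress static assets; the host must then serve e.g. page.html.br
    # with `Content-Encoding: br`, ideally only to clients whose
    # Accept-Encoding includes `br`.
    for path in Path("public").rglob("*.html"):
        compressed = brotli.compress(path.read_bytes(), quality=11)
        path.with_name(path.name + ".br").write_bytes(compressed)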

cobbal
0 replies
3h40m

I would love to try reading the lossy version.

bogzz
0 replies
20h24m

Still waiting on a webp encoder to be added to the Go stdlib...

bawolff
0 replies
21h38m

They did all this and didn't even measure time to first paint?

What is the point of doing this sort of thing if you don't even test how much faster or slower it made the page load?

ajsnigrutin
0 replies
17h46m

So... 60 kB less to transfer... and how much slower is it on, e.g., the Galaxy S8 my mom has, because of all the shenanigans done to save those 60 kB?

TacticalCoder
0 replies
22h54m

I loved that (encoding stuff in webp) but my takeaway from the figures in the article is this: brotli is so good I'll host from somewhere where I can serve brotli (when and if the client supports brotli ofc).