That post just gets better all the way through.
I can't believe I never realized the frequency domain can be used for image compression. It's so obvious after seeing it. Is that how most image compression algorithms work? Just wipe out the quieter parts of the frequency domain?
Images are not truly bandlimited, which means they can't be perfectly represented in the frequency domain, so instead there's a compromise where smaller blocks of them are encoded with a mix of frequency domain and spatial domain predictors. But that's the biggest part of it, yes.
Most of the problem is sharp edges. These take an unbounded range of frequencies to represent exactly, so leaving some out gets you blurriness or ringing artifacts (the Gibbs phenomenon).
The other reason is that the transform treats the image as if it repeats infinitely, but realistic images don't - whatever's on the left side of a photo doesn't necessarily predict anything about whatever's on the right side.
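If you want to see the "wipe out the quieter parts" idea in code, here's a toy sketch (not real JPEG - the 8x8 block, the keep-the-10-largest rule, and the pixel values are all just for illustration):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy 8x8 block: a smooth gradient with a sharp vertical edge in the middle.
block = np.tile(np.linspace(0, 64, 8), (8, 1))
block[:, 4:] += 128.0

# Forward 2D DCT (type II, orthonormal) - the same kind of transform JPEG applies per block.
coeffs = dctn(block, norm="ortho")

# "Wipe out the quieter parts": keep only the 10 largest-magnitude coefficients.
keep = 10
threshold = np.sort(np.abs(coeffs).ravel())[-keep]
coeffs[np.abs(coeffs) < threshold] = 0.0

# Inverse DCT gives the lossy reconstruction back in the spatial domain.
reconstructed = idctn(coeffs, norm="ortho")
print("worst-case pixel error:", np.max(np.abs(block - reconstructed)))
```

The error concentrates around the sharp edge, which is exactly the blurring/ringing trade-off mentioned above.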
How are images not bandlimited? They don't get brighter than 255, 255, 255 or darker than 0, 0, 0
Bandlimited means limited in the frequency domain (no components above some maximum frequency), not limited in the pixel values.
(Also, video is actually worse - only 16-235. Good thing there's HDR now.)
A real image isn't, but a digital image built up from pixels certainly is bandlimited. A sharp edge will require contributions from components across the whole spectrum that can be supported on a matrix the size of the image, the highest of which is actually called the Nyquist frequency.
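You can see that in one dimension with a quick check (made-up numbers, just a row of pixels with a sharp edge):

```python
import numpy as np

# One row of pixels with a sharp edge (deliberately not dead-center).
row = np.zeros(64)
row[23:] = 255.0

# Real FFT: bin 0 is DC, the last bin is the Nyquist frequency for this length.
spectrum = np.abs(np.fft.rfft(row))
print("bins with energy:", np.count_nonzero(spectrum > 1e-9), "of", spectrum.size)
# -> every bin, right up to Nyquist, is nonzero for a step like this
```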
Not quite. You can tell this isn't true because there are many common images (game graphics, text, pixel art) where upscaling with a sinc filter obviously produces a visually "wrong" result (blurry or ringing etc.), whereas you can reconstruct them at a higher resolution "as intended" with something nonlinear (nearest-neighbor interpolation, OCR, emulator filters like scale2x). That means the image contains information that doesn't behave like a bandlimited signal.
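Here's a made-up 1-D stand-in for pixel art that shows the difference: upscaling a hard-edged square wave by zero-padding its spectrum (which is what ideal sinc interpolation amounts to) overshoots at the edges, while plain sample repetition keeps them intact:

```python
import numpy as np

def sinc_upsample(x, factor):
    """Bandlimited (sinc) interpolation via zero-padding the FFT spectrum."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[:len(spectrum)] = spectrum
    # Scale by `factor` to compensate for the longer inverse transform.
    return np.fft.irfft(padded, n * factor) * factor

# "Pixel art" signal: a hard-edged square wave.
x = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0] * 4)

nearest = np.repeat(x, 4)      # what the artist intended: still hard edges
sinc = sinc_upsample(x, 4)     # what a bandlimited reconstruction gives

print("overshoot of sinc result:", sinc.max() - 1.0)    # > 0: ringing at the edges
print("overshoot of nearest:", nearest.max() - 1.0)      # exactly 0
```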
You could say MIDI is sort of like that for audio but it's used a lot less often.
I thought the image transform was conceptually done on a grid of infinitely-repeating copies of the image in the spatial domain?
Yep, this is how MP3, Ogg Vorbis, and JPEG all work. The weights for which frequencies to keep are, presumably, chosen based on some psychoacoustic model, but the coarse description is literally throwing away high-order frequency information.
Does audio encoding use a similar method of using matrices to pick which frequencies get thrown away? Some video encoders allow you to change the matrices so you can tweak them based on content.
Audio is one-dimensional, so it doesn't use matrices, just arrays (called subbands).
And you can't lean too hard on psychoacoustic coding, because people will play compressed audio through all kinds of speakers or EQs that will unhide everything you tried to hide with the psychoacoustics. But yes, it's similar.
(IIRC, the #1 mp3 encoder LAME was mostly tuned by listening to it on laptop speakers.)
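Very roughly, the subband idea looks like this (a plain FFT standing in for the real filter bank / MDCT, and the band edges here are arbitrary):

```python
import numpy as np

# One short frame of "audio": a 440 Hz tone plus a little noise, 48 kHz sample rate.
sr, n = 48000, 1024
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(n)

# FFT the frame and group the bins into a handful of subbands.
power = np.abs(np.fft.rfft(frame)) ** 2
bands = np.array_split(power, 8)  # 8 equal-width bands, purely for illustration
energy = [band.sum() for band in bands]

# A real encoder decides per band how coarsely it can quantize (guided by a
# psychoacoustic model) rather than applying one rule to the whole spectrum.
print(["%.1f" % e for e in energy])
```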
I know one mix studio that has a large selection of monitors to listen to a mix through ranging from the highest of high end studio monitors, mid-level monitors, home bookshelf speakers, and even a collection of headphones and earbuds. So when you say "check it on whatever you have available", you have to be a bit more specific with this guy's setup
There is more to it. Often the idea isn't just that you throw away frequencies, but also that data with less variance can be encoded more efficiently. And it's not just that high-frequency info is noise; it also tends to have smaller magnitude.
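Both effects show up even on a toy synthetic image (a smooth sinusoidal pattern plus a little noise, standing in for "typical" photo content): across the 8x8 blocks, the high-frequency DCT coefficients are both smaller and lower-variance than the low-frequency ones, which is what makes them cheap to entropy-code:

```python
import numpy as np
from scipy.fft import dctn

# Synthetic "photo-like" content: a smooth sinusoidal pattern plus mild noise, 64x64.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 64)
img = 128 + 100 * np.outer(np.sin(x), np.cos(x)) + rng.normal(0, 2, (64, 64))

# Split into 8x8 blocks and DCT each one.
blocks = img.reshape(8, 8, 8, 8).transpose(0, 2, 1, 3).reshape(-1, 8, 8)
coeffs = np.stack([dctn(b, norm="ortho") for b in blocks])

# Compare a low-frequency AC coefficient with the highest-frequency one across all blocks.
low, high = coeffs[:, 0, 1], coeffs[:, 7, 7]
print("low freq:  mean |coeff| %6.2f, variance %8.2f" % (np.abs(low).mean(), low.var()))
print("high freq: mean |coeff| %6.2f, variance %8.2f" % (np.abs(high).mean(), high.var()))
```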
DCT is also often used as a substep in more complex image (or video) compression algorithms. That is, first identify some sub-area of the image with a lot of detail, apply DCT to that sub-area and keep more of the spectrum, then do the same for other areas and keep more or less of the spectrum depending on how much detail they have. This is where the quantization parameters you've seen for video compression algorithms come into play.
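A sketch of that quantization step (the step-size matrix and the scaling rule are made up here, not the actual JPEG/H.26x tables): divide each block's DCT coefficients by a step size that grows with frequency, scaled by the quantization parameter, and round - a higher QP zeroes out more of the spectrum:

```python
import numpy as np
from scipy.fft import dctn

# Made-up step-size matrix: coarser steps for higher frequencies.
# (Real codecs ship carefully tuned tables; this one just grows with u + v.)
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
steps = 4.0 + 3.0 * (u + v)

def quantize(block, qp):
    """DCT an 8x8 block, then quantize: larger qp -> coarser steps -> more zeros."""
    coeffs = dctn(block, norm="ortho")
    return np.round(coeffs / (steps * qp)).astype(int)

# Toy block with some detail: a smooth ramp plus a sharp vertical edge.
block = np.add.outer(np.linspace(0, 40, 8), np.linspace(0, 40, 8))
block[:, 4:] += 60.0

for qp in (1, 2, 4, 8):
    q = quantize(block, qp)
    print("qp=%d: %d nonzero coefficients survive" % (qp, np.count_nonzero(q)))
```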
I remember seeing some video where they did an FT of an audio sample, used mspaint to erase some frequency component, and transformed back to the audio / time domain.
Something along those lines anyway.
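That trick is easy to reproduce with numpy (a minimal sketch; the tone frequencies and the bin range to erase are arbitrary):

```python
import numpy as np

# A one-second test signal: 440 Hz + 2000 Hz tones at an 8 kHz sample rate.
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

# "Paint out" part of the spectrum: zero the bins around 2000 Hz...
spectrum = np.fft.rfft(audio)
freqs = np.fft.rfftfreq(len(audio), d=1 / sr)
spectrum[(freqs > 1900) & (freqs < 2100)] = 0

# ...and transform back to the time domain: the 2 kHz tone is gone.
filtered = np.fft.irfft(spectrum, len(audio))
```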
You don't generally completely wipe the high frequencies, you just encode them with fewer bits.