The issue with long runs of 0x00 is related to "clock recovery".
Some digital data streams, especially high-speed serial streams (such as the raw data from the magnetic head of a disk drive, or serial communication networks such as Ethernet), are sent without an accompanying clock signal. The receiver generates a clock from an approximate frequency reference, then phase-aligns it to the transitions in the data stream with a phase-locked loop (PLL).
In order for this scheme to work, a data stream must transition frequently enough to correct for any drift in the PLL's oscillator. The limit for how long a clock-recovery unit can operate without a transition is known as its maximum consecutive identical digits (CID) specification.
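To put a number on that, here's a toy model (my own sketch with made-up figures, not anything from a spec): a receiver free-running with a small frequency error during a transition-free stretch.

    # Toy model: receiver counts bits with a slightly-off clock during a
    # transition-free run. Numbers are illustrative, not from any spec.
    def recovered_bit_count(run_length_bits, clock_error_ppm):
        receiver_bit_period = 1 + clock_error_ppm * 1e-6
        return round(run_length_bits / receiver_bit_period)

    # With a 100 ppm frequency offset, a 10,000-bit run is already
    # miscounted by a full bit; the PLL needed a transition well before.
    print(recovered_bit_count(10_000, 100))  # -> 9999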
I love how we used to use a bunch of very clever "code book" systems like 8b/10b, which did a lot of careful work with short runs of bits to keep the clock recoverable and to avoid line-capacitance issues.
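Here's a toy sketch of that disparity bookkeeping, with an invented two-entry mini code book rather than the real 8b/10b tables: unbalanced symbols come in two complementary variants, and the encoder picks whichever one pulls the running disparity back toward zero.

    # Toy "code book" encoder, not the real 8b/10b tables.
    def disparity(sym):
        return 2 * sum(sym) - len(sym)      # ones minus zeros

    def encode(stream, table):
        rd, out = 0, []
        for byte in stream:
            # pick the variant that keeps running disparity nearest zero
            best = min(table[byte], key=lambda s: abs(rd + disparity(s)))
            rd += disparity(best)
            out.extend(best)
        return out, rd

    # hypothetical code book with 6-bit symbols for brevity
    table = {
        0x00: [(0, 1, 0, 1, 1, 1), (1, 0, 1, 0, 0, 0)],  # unbalanced: two variants
        0x01: [(0, 1, 1, 0, 0, 1)],                      # balanced: one variant
    }
    bits, rd = encode([0x00, 0x01, 0x00], table)
    print(rd)  # stays near zero, so the line stays DC-balanced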
Then we just moved to things like 64b/66b, which takes a giant chunk of bits, adds a two-bit header to guarantee a clock transition, then runs everything through a pseudorandom scrambler.
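The scrambler half is easy to sketch. 64b/66b uses the self-synchronizing polynomial x^58 + x^39 + 1; the bit ordering below is my own simplification. Because the output feeds the state, even an endless run of zeros on the input keeps producing transitions (and the two-bit header supplies one guaranteed transition per block regardless).

    # Self-synchronizing scrambler, x^58 + x^39 + 1 as in 64b/66b;
    # bit ordering simplified for illustration.
    MASK = (1 << 58) - 1

    def scramble(bits, state):
        out = []
        for b in bits:
            s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
            out.append(s)
            state = ((state << 1) | s) & MASK
        return out

    # A long run of zeros in, pseudorandom bits out once the register
    # has warmed up.
    out = scramble([0] * 256, state=1)
    print(sum(out), "ones out of 256 scrambled zero-bits")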
A sprinkle of entropy improves everything
They do a similar thing for GPS data recovery. The signal is so far below the noise floor that, on its own, it is not recoverable. But correlate it against some (known) noise, and suddenly the known noise modulated by the (unknown) signal rises above the noise in the rest of the system, and that in turn lets you recover bits from the signal.
https://ciechanow.ski/gps/
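A rough sketch of that despreading trick, under idealized assumptions (+/-1 chips, perfect code alignment, no Doppler or code-phase search):

    import random
    random.seed(0)

    N = 10_000
    prn = [random.choice((-1, 1)) for _ in range(N)]  # the "known noise"
    data_bit = -1                                     # the unknown bit
    # received samples: signal buried an order of magnitude under noise
    rx = [data_bit * c + random.gauss(0, 10) for c in prn]

    # Multiply by the known code and average: the code de-spreads the
    # signal while the channel noise averages out (~10/sqrt(N) residual).
    corr = sum(r * c for r, c in zip(rx, prn)) / N
    print(corr)  # lands close to -1, the hidden bit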
That's a great article, you should post it separately.
https://news.ycombinator.com/from?site=ciechanow.ski ;)
This thread brings back fond memories from electronics and digital signal processing.
It's not only below the noise floor, but all satellites transmit the code on the same frequencies, so it's several signals all at once below the noise floor. That's why the known noise is unique to each satellite: you dredge the same frequency repeatedly with different sets of known noise to recover the individual signals.
Gold sequences are really neat. It's precisely the same pseudorandom scrambling technique, but each sequence is selected to have low correlation with all the other sequences in use, which is what enables the frequency-sharing property of the system.
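A small demo of that low-correlation property, using a 5-bit register pair for brevity (the real GPS C/A generator uses 10-bit registers with satellite-specific taps):

    # Gold codes: XOR two m-sequences at different relative shifts.
    # Taps below are a classic small pair, not the GPS polynomials.
    def m_sequence(taps, n):
        state, out = [1] * n, []
        for _ in range(2 ** n - 1):
            out.append(state[-1])
            fb = 0
            for t in taps:
                fb ^= state[t - 1]
            state = [fb] + state[:-1]
        return out

    n = 5
    a = m_sequence([2, 5], n)        # first m-sequence of the pair
    b = m_sequence([2, 3, 4, 5], n)  # its partner

    def gold(shift):
        return [x ^ b[(i + shift) % len(b)] for i, x in enumerate(a)]

    def corr(u, v):  # correlate the +/-1 versions of two codes
        return sum((1 - 2 * x) * (1 - 2 * y) for x, y in zip(u, v))

    print(corr(gold(1), gold(1)))  # 31: a code against itself
    print(corr(gold(1), gold(7)))  # -1: tiny next to the peak of 31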
QR codes do this too, as explained by yesterday’s front page post about decoding QR codes by hand.
The PCIe standards kept moving to longer codes, and nowadays they're able to do "1b/1b" (no header at all).
Not really accurate. The switch from NRZ to PAM4 actually massively increased the bit error rate. So they switched away from the 8b/10b-style line code and replaced it with forward error correction.
PCIe 6.0 uses 256-byte frames (FLITs), with 242 bytes of data, 8 bytes of CRC, and 6 bytes of forward error correction.
So it actually has way more overhead than the older generations and their 128b/130b line coding; it's just at a slightly different layer.
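The arithmetic, using the field sizes above:

    # Payload efficiency: PCIe 6.0 FLIT framing vs 128b/130b line code.
    flit = 242 / 256   # ~0.945, i.e. ~5.5% overhead
    gen3 = 128 / 130   # ~0.985, i.e. ~1.5% overhead
    print(f"FLIT: {flit:.3f}  vs  128b/130b: {gen3:.3f}")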
Against adversaries, scrambling schemes can produce some very perplexing behaviour. See the "weak sectors" used for CD copy protection for one infamous example.
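The failure mode is easy to demonstrate. An additive scrambler XORs the data with a fixed keystream that restarts every sector, so an adversary who writes the keystream itself as payload gets an all-zero "scrambled" stream: exactly the long, regular runs the scrambler exists to prevent. (Toy 15-bit LFSR below; I'm not claiming these are the exact ECMA-130 taps.)

    # Additive scrambler: output = data XOR keystream.
    def keystream(n, state=1):
        out = []
        for _ in range(n):
            out.append(state & 1)
            fb = (state & 1) ^ ((state >> 14) & 1)  # illustrative taps
            state = (state >> 1) | (fb << 14)
        return out

    ks = keystream(64)
    hostile = list(ks)              # attacker writes the keystream as data
    scrambled = [d ^ k for d, k in zip(hostile, ks)]
    print(scrambled)                # all zeros: a worst-case "weak" pattern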
This is probably irrelevant. The audio is analog (it takes an ADC to turn it back into a bitstream) and is transmitted at some fixed sample rate (44.1 kHz or 48 kHz), without any particular synchronization.
Nope. If you're trying to recover bits from a signal which is either high or low, and the signal stays high for a long time, unless your clocks are perfect (they won't be) you won't be able to tell just how many bits were in that long period of "high"s.
Another concern is whether the waveform is AC-coupled somewhere, so DC can't pass.
Even if you have a perfectly synchronized clock, all-or-mostly-1s will look the same as all-or-mostly-0s in the long run, because the DC level that distinguishes them can't get through.
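A quick way to see the AC-coupling half of this, with a one-pole high-pass standing in for the coupling capacitor (the filter constant is arbitrary):

    # A long run of 1s through an AC-coupled channel droops toward zero,
    # so a fixed slicing threshold eventually starts guessing wrong.
    alpha = 0.95                           # arbitrary coupling constant
    signal = [1.0] * 50 + [0.0] * 5
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in signal:
        y = alpha * (prev_out + x - prev_in)  # y[n] = a*(y[n-1] + x[n] - x[n-1])
        out.append(y)
        prev_in, prev_out = x, y
    print([round(v, 2) for v in out[:8]])  # 0.95, 0.9, 0.86, ... drooping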
From this page I found a very interesting article: Wireless Set Number 10
https://en.wikipedia.org/wiki/Wireless_Set_Number_10