Very cool! How are the balloons transferring telemetry back to earth for analysis, etc?
Asking because my research at the University of Oxford was on highly space-efficient data transfer from remote locations at a fraction of the cost.
The result was an award-winning technology (https://jsonbinpack.sourcemeta.com) for serialising plain JSON, proven to be more space-efficient than every tested alternative (including Protocol Buffers, Apache Avro, ASN.1, etc.) in every tested case (https://arxiv.org/abs/2211.12799).
If it's interesting, I'd love to connect and discuss (jv@jviotti.com) how at least the open-source offering could help.
Sounds cool. How does it differ from CBOR?
CBOR is a schema-less binary format. JSON BinPack supports both schema-less (like CBOR) and schema-driven (with JSON Schema) modes. Even in schema-less mode, JSON BinPack is more space-efficient than CBOR. See https://benchmark.sourcemeta.com for a live benchmark and https://arxiv.org/abs/2211.12799 for a more detailed academic benchmark.
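If you want a quick intuition for what a schema-less binary format buys you over plain JSON text, here's a minimal Python sketch (using the cbor2 package with made-up payload fields; JSON BinPack itself isn't involved here, see the benchmark links for that):

    # Rough schema-less size comparison: JSON text vs CBOR binary.
    # Requires: pip install cbor2
    import json
    import cbor2

    payload = {"device": "balloon-17", "altitude_m": 18234.5, "battery_pct": 87}

    json_bytes = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    cbor_bytes = cbor2.dumps(payload)

    print(len(json_bytes), "bytes as JSON")
    print(len(cbor_bytes), "bytes as CBOR")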
Thanks for linking the benchmarks. I appreciate the work on shaving additional bytes, especially in cases where every byte matters. The real savings seem to be in the schema-driven mode. Comparing a "realistic", schema-less payload for a general storage use case (e.g. the config examples), it looks pretty even with CBOR. E: my bad, BinPack gets more efficient with larger payloads https://benchmark.sourcemeta.com/#jsonresume
As a note, while CBOR is schema-less, there do exist tools to make it work with schemas. In Rust, cborium will generate Rust types from a JSON Schema that serde can use.
I've never used cborium, but if I'm understanding it correctly, it adds types at the language deserialisation stage and not over the wire. That makes it a lot more ergonomic to use within Rust, but it doesn't use the type information for space-efficiency over the wire.
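In other words (a minimal Python sketch of the same idea, using the cbor2 package; the field names are just placeholders):

    # Decoding the same bytes into a typed object is pure ergonomics:
    # the wire representation does not change.
    from dataclasses import dataclass
    import cbor2

    @dataclass
    class Reading:
        id: int
        value: float

    wire = cbor2.dumps({"id": 1, "value": 3.5})  # bytes on the wire
    typed = Reading(**cbor2.loads(wire))         # typed view, same bytes
    print(len(wire), "bytes either way;", typed)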
Exactly! The real hardcore savings will always come from passing a schema, as JSON BinPack uses it to derive smarter encoding rules.
However, schema-less is very useful too. The idea behind supporting both is that clients can start schema-less, without bothering with schemas, and already get some space-efficiency. Then, once they are using JSON BinPack, they can incrementally add schema information to the messages they care about to squeeze out more performance.
Compare that with, say, Protocol Buffers, which pretty much forces you to use schemas from the beginning, which can be a bit of a barrier for some projects, especially at the start.
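To illustrate the kind of trick a schema enables (a hypothetical Python sketch of the general idea, not JSON BinPack's actual wire format):

    # If both sides share a schema that enumerates the possible values,
    # the encoder only needs to transmit an index into that enumeration.
    schema = {"enum": ["celsius", "fahrenheit", "kelvin"]}

    def encode(value: str, schema: dict) -> bytes:
        return bytes([schema["enum"].index(value)])

    def decode(payload: bytes, schema: dict) -> str:
        return schema["enum"][payload[0]]

    assert decode(encode("kelvin", schema), schema) == "kelvin"
    print(len(encode("kelvin", schema)), "byte on the wire vs", len('"kelvin"'), "bytes as JSON")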
I thought this was an odd sales pitch from the jsonbinpack site, given that a central use case is IoT, which frequently runs on batteries or in power-constrained environments where there's no such thing as "essentially free"
I would imagine a CPU is much more power-efficient than a satellite transmitter? I guess you'd have to balance the additional computational energy required against the energy saved by transmitting less.
Yeah, it all depends, given how huge the "embedded/IoT" spectrum is. Each use case has its own unique constraints, which makes it very hard to give general advice.
Fair point! "Embedded" and "IoT" are overloaded terms. For example, you find "IoT" devices ranging from extremely low-powered microcontrollers to Linux-based ones with plenty of power, and they are all considered "embedded". I'll take note and improve the wording.
That said, the production-ready implementation of JSON BinPack is designed to run on low-powered devices and still provide those same benefits.
A lot of the current work is happening at https://github.com/sourcemeta/jsontoolkit, a dependency of JSON BinPack that implements a state-of-the-art JSON Schema compiler (I'm a TSC member of JSON Schema, by the way). It enables fast and efficient schema evaluation within JSON BinPack on low-powered devices, compared to the current prototype, which requires runtime schema evaluation to resolve logical schema operators. That's just one example of the complex runtime-efficiency tracks we are pursuing.
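As a loose analogy in Python (this uses the jsonschema package, not jsontoolkit's API): think of building a validator once up front, then evaluating messages cheaply in a hot loop:

    # "Compile" the schema once, then evaluate many instances cheaply.
    # Requires: pip install jsonschema
    from jsonschema import Draft202012Validator

    schema = {"type": "object", "required": ["t"], "properties": {"t": {"type": "number"}}}
    validator = Draft202012Validator(schema)  # one-time setup step

    for message in ({"t": 1.0}, {"t": 2.5}, {"x": 3}):
        print(message, "->", validator.is_valid(message))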
For sure, but radio transmitter time is almost always much more expensive than CPU time! On an ESP32 it's 4mA-20mA for the CPU vs 180mA with the radio on; the radio alone is a 160mA load! As long as every seven milliseconds of compression saves a millisecond of transmission, your compression algorithm comes out ahead.
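Back-of-the-envelope in Python, with those same rough figures (assumed, not measured):

    # Break-even check: charge (mA*ms) spent compressing vs radio time avoided.
    CPU_MA = 20.0      # active CPU current while compressing (upper bound)
    RADIO_MA = 160.0   # extra current drawn while the radio is on

    compress_ms = 7.0  # time spent compressing
    saved_tx_ms = 1.0  # transmission time saved by the smaller payload

    cost = CPU_MA * compress_ms      # 140 mA*ms
    saving = RADIO_MA * saved_tx_ms  # 160 mA*ms
    print("net win:", saving - cost, "mA*ms")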
This looks promising! One of the important aspects of Protocol Buffers, Avro, etc. is how they deal with evolving schemas and backwards/forwards compatibility. I don't see anything in the docs addressing that. Is it possible for old services to handle new payloads / new services to handle old payloads, or do senders and receivers need to be rewritten each time the schema changes?
A lot of people already think about this problem with respect to API compatibility for REST services, using the OpenAPI spec for example. It's possible to have a JSON Schema which is backwards compatible with previous versions. I'm not sure how backwards-compatible the resulting JSON BinPack schemas are, however.
Great seeing you over here, Michael :) For other people reading this thread, Michael and I are collaborating on a paper covering the schema compiler I've been working on for JSON BinPack. Funny coincidence!
Good question! Compared to Protocol Buffers and Apache Avro, which each have their own specialised schema languages created by them, for them, JSON BinPack taps into the popular, industry-standard JSON Schema language.
That means that you can use any tooling/approach from the wide JSON Schema ecosystem to manage schema evolution. A popular one from the decentralised systems world is Cambria (https://www.inkandswitch.com/cambria/).
That said, I do recognise that schema evolution tooling in the JSON Schema world is not as great as it should be. I'm a TSC member of JSON Schema, and a few of us are definitely thinking hard about this problem and trying to make it even better than the competition.
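For a concrete example, one classic backwards-compatible evolution in JSON Schema is adding a new optional property without tightening "required" (a minimal sketch using the jsonschema Python package; the field names are made up):

    # Requires: pip install jsonschema
    from jsonschema import validate

    schema_v2 = {
        "type": "object",
        "required": ["id"],  # unchanged from v1: old payloads still validate
        "properties": {
            "id": {"type": "integer"},
            "unit": {"type": "string", "default": "celsius"},  # new, optional
        },
    }

    validate({"id": 1}, schema_v2)                    # old (v1) payload
    validate({"id": 2, "unit": "kelvin"}, schema_v2)  # new (v2) payload
    print("both payloads validate against v2")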
Do you have any info on how your system stacks up to msgpack? (https://msgpack.org/index.html)
Asking because we use msgpack in production at work and it can sometimes be a bit slower to encode/decode than is ideal when dealing with real-time data.
We do! See https://benchmark.sourcemeta.com for a live benchmark and https://arxiv.org/abs/2211.12799 for a more detailed academic benchmark.
The TL;DR is that if you use JSON BinPack in schema-less mode, it's still more space-efficient than MessagePack, but not by a huge margin (it depends on the type of data, of course). But if you start passing a JSON Schema along with your data, the results become way smaller.
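If you want to sanity-check the schema-less gap on your own payloads, something like this Python snippet is a quick start (assuming the msgpack and cbor2 packages; JSON BinPack's schema-less mode isn't shown here, see the live benchmark above for that):

    # Requires: pip install msgpack cbor2
    import json
    import msgpack
    import cbor2

    payload = {"tick": 1234, "values": [1.5, 2.25, 3.0], "ok": True}

    print(len(json.dumps(payload, separators=(",", ":"))), "bytes as JSON")
    print(len(msgpack.packb(payload)), "bytes as MessagePack")
    print(len(cbor2.dumps(payload)), "bytes as CBOR")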
Please reach out to jv@jviotti.com. I would love to discuss your use case more.
From the OP:
That's the hardware. I meant the software side, going through the transceiver. If you transfer fewer bits through the satellite transceiver, I believe you can probably reduce costs.
Let's definitely talk; we're using protobufs right now. I'll send an email.
Why this over a compact, data-specific format? JSON feels like an unnecessary limitation for this company's use case. I am having a hard time believing it is more space-efficient than a purpose-built format.
It surprised me how popular this message got. I love nerding out about binary serialisation and space-efficiency, and it's great to see I'm not the only one :)
If you want to go deeper, I published two (publicly available) in-depth papers studying the current state of JSON-compatible binary serialisation that you might enjoy. They examine in a lot of detail technologies like Protocol Buffers, CBOR, MessagePack, and others mentioned in this thread:
- https://arxiv.org/abs/2201.02089
- https://arxiv.org/abs/2201.03051
Hope they are useful!