Lots of confusion on what JSONB is.
To your application, using JSONB looks very similar to the JSON datatype. You still read and write JSON strings; your application will never see the raw JSONB content. The same SQL functions are available, with a different prefix (jsonb_). Very little changes from the application's point of view.
The difference is that the JSON datatype is stored to disk as JSON, whereas the JSONB is stored in a special binary format. With the JSON datatype, the JSON must be parsed in full to perform any operation against the column. With the JSONB datatype, operations can be performed directly against the on-disk format, skipping the parsing step entirely.
If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick. If you're querying or manipulating the data using SQL, JSONB will be the best pick.
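For a rough feel of the two styles, here's a minimal sketch (made-up table and column names; jsonb() requires a SQLite build that includes the JSONB work):

```sql
-- Hypothetical table; SQLite column affinity is loose, so the same column
-- can hold either JSON text or a JSONB blob.
CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB);

-- Blob-in/blob-out style: store the original JSON text untouched.
INSERT INTO docs (id, body) VALUES (1, '{"name": "alice", "age": 30}');
SELECT body FROM docs WHERE id = 1;

-- Query-heavy style: convert to JSONB on insert, query without reparsing text.
INSERT INTO docs (id, body) VALUES (2, jsonb('{"name": "bob", "age": 41}'));
SELECT json_extract(body, '$.age') FROM docs WHERE id = 2;
```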
Is there any downside to storing JSON-B even if you’re not planning to query it? For example, size on disk, read/write performance?
If the order of items in the JSON blob matters then JSONB probably wouldn't preserve the order.
JSON is unordered. Nothing in your code should assume otherwise.
"An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array."
You make it sound like it's one of the laws of physics.
The ON part of JSON doesn't know about objects. When serialized, object entries are just an array of key-value pairs with a weird syntax and a well-defined order. That's true of any serialization format, actually.
It's the JS part of JSON that imposes the constraint of non-duplicate keys with undefined order.
You are the engineer; you can decide how to use your tools depending on your use case. Unless, e.g., you need interop with the rest of the world, it's your JSON, (mis)treat it to your heart's content.
The text I quoted is from the RFC. json.org and ECMA-404 both agree. You are welcome to do whatever you want, but then it isn't JSON anymore.
It is very much still JSON, and your code can very much assume keys are ordered if your JSON tools respect it. ECMA-404 agrees:
If you work in environments that respect JSON key order (like browsers and, I think, also Python) then the unordered behavior of JSONB would be the exception, not the rule.
That does not follow.
JSON formatted data remains JSON no matter if you use it incorrectly.
This happens all the time in the real world - applications unknowingly rely on undefined (but generally true) behavior. If you e.g. need to integrate with a legacy application where you're not completely sure how it handles JSON, then it's likely better to use plain text JSON.
You may take the risk as well, but then good luck explaining that the devs who left the company 10 years ago are responsible for the breakage that happens after you've converted JSON to JSONB.
In some cases, insisting on ignoring the key order is too expensive a luxury, since it basically forces you to parse the whole document first and only then process it. In case you have huge documents, you have to stream-read, and this often implies relying on a particular key order (which isn't a problem, since those same huge documents will likely be stream-written in a particular order too).
Actually, since ES2015 the iteration order of object properties is fully defined: first integer keys in ascending order, then string keys in insertion order, finally symbols in insertion order. (Of course, symbols cannot be represented in JSON.)
And duplicate properties in JSON are guaranteed to be treated the same way they are in object literals: a duplicate overwrites the previous value but doesn't change the order of the key.
Concretely that means if you write:
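For instance (the exact keys and values here are made up, just to show the reordering):

```js
const obj = JSON.parse('{"a": 1, "b": 2, "1": 3, "a": 4}');

console.log(obj);              // { '1': 3, a: 4, b: 2 }
console.log(Object.keys(obj)); // [ '1', 'a', 'b' ]
```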
This is guaranteed to evaluate with '1' moved to the front and with 'a' before 'b', even though 'a' takes its value from the final entry in the JSON text.

Python made a similar change in version 3.6 (officially since 3.7), both with regard to insertion order and to later values overwriting earlier ones while the key keeps its original position. I think the only difference at this point is that Python doesn't move integer-like keys to the front, because unlike JavaScript, Python properly distinguishes between different key types.
That’s exactly the kind of difference between json and jsonb that you gotta keep in mind. Object properties are unordered, but a json string is very much ordered. It’s the same sequence of characters and lines each time, unless you parse it and dump it again. So if you want to preserve an unmodified original json string for some (e.g. cosmetic) reasons, you probably want json.
I would expect you are treading on dangerous ground if you assume a type called JSON is going to preserve the data byte for byte. It might currently, but I doubt that is in the API contract. You really want to use TEXT if that is your requirement.
I don't know what SQLite does, but in JS the order is actually defined. JS is not Java, object is not HashMap, if anything, it's closer to LinkedHashMap, but even that is not correct because there are numeric slots which always go first. https://tc39.es/ecma262/#sec-ordinaryownpropertykeys
> but in JS the order is actually defined
Defined yes, but still arbitrary and may not be consistent between JSON values of the same schema, as per the document you linked to:
> in ascending chronological order of property creation
Also, while JSON came from JS sort-of, it is, for better or worse (better than XML!) a standard apart from JS with its own definitions and used in many other contexts. JSON as specified does not have a prescribed order for properties, so it is not safe to assume one. JS may generally impose a particular order, but other things may not when [re]creating a JSON string from their internal format (JSONB in this case), so by assuming a particular order will be preserved you would be relying on a behaviour that is undefined in the context of JSON.
Yes, I know, the point was not to say that it's safe to depend on this universally, but rather why it's safe in JS and why it's not elsewhere -> other languages use hash maps simply because authors were either lazy or unaware of the original behaviour. (which sucks, in my opinion, but nobody can fix it now)
Assume you're writing an editor for JSON files. Don't think many users of that editor would be very happy if you change the order of the attributes in their json files, even though technically it's the same...
According to the JSON spec it's also legal to round all numbers to zero.
If you’re writing an editor, you will need to separate in progress text that user is editing from the parsed and validated data, and keep both.
I don't disagree, but people might still assume it. If you serialise a Map in Java, some Map implementations will maintain insertion order for example.
I think this is why JS and Python chose to make key ordering defined - the most popular implementations maintained insertion order anyway, so it was inevitable people would end up writing code relying on it
The same goes for maps in Go, which now explicitly randomizes map iteration with the range keyword to prevent developers from relying on a particular ordering. Neat trick.
That might get them to rely on the randomization, though :)
There’s processing to be done with JSONB on every read/write, which is wasted if you’re always reading/writing the full blob.
Which occurs with JSON as well (SQLite has neither a dedicated JSON type nor a dedicated JSONB type). The only actual cost would be the conversion between JSONB and JSON.
No, SQLite's "JSON" is just TEXT, there's no overhead with reading/writing a string.
That's what I said I think? "JSON" is a TEXT that is handled as a JSON string by `json_*` functions, while "JSONB" is a BLOB that is handled as an internal format by `jsonb_*` functions. You generally don't want JSONB in the application side though, so you do need a conversion for that.
Yes, but when you use the BLOB with the jsonb functions, your application's demands go from:
1. Read JSON from the TEXT column.
2. Parse the JSON into the Internal Binary Format.
3. Run the json_*() function on this format, which will serialize the Internal Binary Format to JSON as output.
To:
1. Read JSONB from the BLOB column.
2. Run the json_*() function on the Internal Binary Format, which will serialize it to JSON as output.
Because:
The json_* and jsonb_* functions all accept either JSON or JSONB as their input. The difference is that the jsonb_* functions also produce JSONB as output. So even in the above case, if your function output is just being used to feed back into another table as a BLOB, then you can use the jsonb_* version of the function and skip the serialization step entirely.
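For example, a minimal sketch (made-up tables) of moving a fragment from one JSONB column into another table without ever materializing JSON text:

```sql
CREATE TABLE profiles (id INTEGER PRIMARY KEY, doc BLOB);
CREATE TABLE addresses (profile_id INTEGER, addr BLOB);

INSERT INTO profiles (id, doc)
VALUES (1, jsonb('{"name": "alice", "address": {"city": "Oslo", "zip": "0150"}}'));

-- jsonb_extract() accepts the JSONB blob and returns JSONB, so the extracted
-- object goes straight into the other BLOB column with no serialization step.
INSERT INTO addresses (profile_id, addr)
SELECT id, jsonb_extract(doc, '$.address') FROM profiles;
```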
Oh, you are right! I later looked at the draft documentation and realized that `json(b)_` only determines the output type. That said, it is still true that you need one more function call `json(...)` to retrieve a textual form of JSON which you do often need instead of JSONB.
I don’t think that’s always correct (it is within this thread, which stated (way up) “if you’re always reading/writing the full blob”, but I think this discussion is more general by now). Firstly, it need not convert the full jsonb blob (example: jsonb_extract(field,'$.a.b.c.d.e'))
Secondly, it need not convert to text at all, for example when calling a jsonb_* function that returns jsonb.
(Technically, whether any implementation is that smart is an implementation detail, but the article claims this is
“a rewrite of the SQLite JSON functions that, depending on usage patterns, could be several times faster than the original JSON functions”
so it can’t be completely dumb.)
There’s no point using json_* functions if you’re always reading/writing the full blob.
Of course (and I never said that), but if you need to store a JSON value in the DB and use a JSONB-encoded BLOB as an optimization, you eventually read it back to a textual JSON. It's just like having a UNIX timestamp in your DB and converting back to a parsed date and time for application uses, except that applications may handle a UNIX timestamp directly and can't handle JSONB at all.
It doesn't validate that it's valid JSON?
Not specifically JSONB, but I do recall that with MongoDB's equivalent, BSON, the size of the binary equivalent tends to be larger in practice; I would expect JSONB to have a similar trade-off.
There'll also be a conversion cost if you ultimately want it back in JSON form.
From the post:
According to the linked announcement the data size is 5-10% smaller. So you are exchanging a small processing cost for smaller storage size. It will depend on your application but in many cases smaller disk reads and writes as well as more efficient cache utilization will make storing as JSONB a better choice even if you never manipulate it in SQL. Although the difference is likely small.
I haven’t delved into it, but you likely lose the ability to round-trip your JSON strings, just as with PostgreSQL’s jsonb type. Likely it:
- will remove/change whitespace in ‘foo’,
- does not guarantee to keep field order,
- will fail if a dictionary in the ‘foo’ string has duplicate fields,
- will fail if ‘foo’ doesn’t contain a valid JSON string.
I don't know if it's true of SQLite, but you missed the most important point, at least with PostgreSQL: you can build indexes directly against attributes inside JSONB, which really turns it into a true NoSQL/relational hybrid that lets you have your cake and eat it too for some design problems that don't fit neatly into either pure relational or pure NoSQL approaches.
What would that SQL look like in practice?
Here's an example I have running on a Pi with a temperature sensor. All data is received as JSON messages over MQTT, then stored as-is in a Postgres table. The view turns it into relational data.
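Roughly along these lines (table, view, and field names here are illustrative, not the actual ones):

```sql
-- Raw MQTT payloads land in a single jsonb column.
CREATE TABLE mqtt_messages (
    id       bigserial PRIMARY KEY,
    received timestamptz NOT NULL DEFAULT now(),
    payload  jsonb NOT NULL
);

-- The view exposes the interesting fields relationally.
CREATE VIEW readings AS
SELECT id,
       received,
       payload->>'sensor'                 AS sensor,
       (payload->>'temperature')::numeric AS temperature
FROM mqtt_messages;
```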
If I just need the messages between certain temperatures, I can speed this up by adding an index on the 'temperature' field in the JSON.
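That index would be an expression index on the extracted field, something like:

```sql
-- The cast in the index expression has to match the one used in queries.
CREATE INDEX mqtt_messages_temperature_idx
    ON mqtt_messages (((payload->>'temperature')::numeric));

-- A range query like this can then use the index:
SELECT * FROM mqtt_messages
WHERE (payload->>'temperature')::numeric BETWEEN 18 AND 22;
```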
You can do this with text JSON in SQLite as well, but JSONB could speed it up.
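In SQLite the same idea looks roughly like this (hypothetical names; json_extract() is deterministic, so it's allowed in an index expression):

```sql
CREATE TABLE messages (id INTEGER PRIMARY KEY, payload BLOB);

CREATE INDEX messages_temperature_idx
    ON messages (json_extract(payload, '$.temperature'));

-- The query has to use the same expression for the index to apply.
SELECT * FROM messages
WHERE json_extract(payload, '$.temperature') BETWEEN 18 AND 22;
```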
Indeed. Those who use Postgres are already familiar with the difference.
Rule of thumb: if you are not sure, or do not have time for the nuances, just use JSONB.
There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.
This may be desired - it makes two effectively-equal objects have the same representation at rest!
But if you are storing human-written JSON - say, configs from an internal interface, where one might collocate a “__foo_comments” key above “foo” - their layout will be lost, and this may lead to someone visually seeing their changes scrambled on save.
JSON processors are neither required nor expected to retain object key ordering from either the input or the object construction order.
Slight caveat. It seems that you should default to JSONB for all cases except for values directly returned in the query result, since (IIUC) you will get a JSONB blob back and will be responsible for parsing it on the client.
So use it for writes, and also for filters if applicable (although this seems like a niche use case, I can't actually think of an example). But if you are returning values, you probably want to use `json_*` functions, otherwise you will end up receiving the binary format.
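A sketch of that split, with made-up names:

```sql
-- Hypothetical table with a JSONB-encoded BLOB column.
CREATE TABLE events (id INTEGER PRIMARY KEY, data BLOB);

-- Writes: jsonb() / jsonb_* keep the stored value in the binary format.
INSERT INTO events (id, data) VALUES (1, jsonb('{"user": "alice", "seen": 0}'));
UPDATE events SET data = jsonb_set(data, '$.seen', 1) WHERE id = 1;

-- Values returned to the application: json_* so you get text back.
SELECT json_extract(data, '$.user') FROM events WHERE id = 1;  -- 'alice'
SELECT json(data) FROM events WHERE id = 1;                    -- JSON text
-- SELECT jsonb(data) ... would hand the client the raw binary blob instead.
```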
There is no JSON data type! If you are just storing JSON blobs, then the BLOB or TEXT data types will be the best picks.
Why not? I feel like a database should allow for a JSON data type that stores only valid JSON or throws an exception. It would also be nice to be able to access subfields of the JSON.
SELECT userid, username, some_json_field["some_subfield"][0] FROM users where ...
Not sure where to give feature suggestions so I'm just leaving this here for future devs to find
SQLite has a surprisingly small number of types: null, integer, real, text, and blob. That’s it.
https://www.sqlite.org/datatype3.html
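Both halves of that wish can be approximated with what's already there, though; a rough sketch using the hypothetical users table from above:

```sql
-- Validation on write via a CHECK constraint...
CREATE TABLE users (
    userid          INTEGER PRIMARY KEY,
    username        TEXT,
    some_json_field TEXT CHECK (json_valid(some_json_field))
);

-- ...and subfield access via json_extract() (or -> / ->> in newer versions).
SELECT userid,
       username,
       json_extract(some_json_field, '$.some_subfield[0]') AS first_subfield
FROM users
WHERE json_extract(some_json_field, '$.some_subfield[0]') IS NOT NULL;
```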
In context, what is clearly meant is, "If you're just reading and writing full JSON blobs, use `insert ... ($id, $json)`; if you're primarily querying the data, use `insert ... ($id, jsonb($json))`, which will convert the text into JSONB before storing it."
If my application never sees a difference from normal JSON, everything is compatible, and all there is are perf improvements, then why is there a new set of functions (jsonb_*) to interact with it?
It seems that the JSON type is even able to contain JSONB. So why even use these functions, if the normal ones don't care?
The difference is the storage format, which is important for a SQL database since you define it in the schema. Also the performance characteristics are different.
As someone mentioned below, the order of keys is undefined in JSON spec, but applications may rely on it anyway, and thus conversion to JSONB may lead to breakage.
There are some other minor advantages to having an exact representation of the original - e.g. hashing, signatures, and equality comparison are much simpler on the JSON string (you need a strict key order, which is again undefined by the spec, but happens in the real world anyway).
TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be. JSONB is a brilliant binary JSON format. What makes it brilliant is that arrays and objects (which when serialized are a sort of array) are encoded by having N-1 lengths of values then 1 offset to the Nth value, then N-1 lengths of values then... The idea is that lengths are going to be similar, while offsets never are, so using lengths makes JSONB more compressible, but offsets are needed to enable random access, so if you store mostly lengths and sometimes offsets you can get O(1)ish random access while still retaining compressibility.
Originally JSONB used only offsets, but that made it incompressible. That held up a PostgreSQL release for some time while they figured out what to do about it.
Link? My reading last time I looked at this was that the sqlite and postgres “jsonb” were different.
That's not exactly right, as the jsonb_* functions return JSONB if you choose to use them.
I don't know how the driver is written, but this is misleading: if SQLite provides an API to read the JSONB data with an in-memory copy, an app can surely benefit from skipping the JSON string parsing.