JSONB has landed

luhn
53 replies
11h43m

Lots of confusion on what JSONB is.

To your application, using JSONB looks very similar to the JSON datatype. You still read and write JSON strings; your application will never see the raw JSONB content. The same SQL functions are available, with a different prefix (jsonb_). Very little changes from the application's view.

The difference is that the JSON datatype is stored to disk as JSON, whereas the JSONB is stored in a special binary format. With the JSON datatype, the JSON must be parsed in full to perform any operation against the column. With the JSONB datatype, operations can be performed directly against the on-disk format, skipping the parsing step entirely.

If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick. If you're querying or manipulating the data using SQL, JSONB will be the best pick.
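A minimal sketch of the two patterns, assuming a hypothetical events table and the new jsonb()/json_extract() functions (the column has no special declared type, since SQLite has no dedicated JSON type):

    CREATE TABLE events(id INTEGER PRIMARY KEY, doc);

    -- store-and-fetch-whole-blobs: keep the text as-is
    INSERT INTO events(doc) VALUES ('{"type":"click","x":10,"y":20}');

    -- query-heavy: parse once on the way in, store the binary form
    INSERT INTO events(doc) VALUES (jsonb('{"type":"click","x":10,"y":20}'));

    -- either form can be read with the json_* functions, which hand
    -- ordinary JSON text back to the application
    SELECT json_extract(doc, '$.type') FROM events;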

blowski
33 replies
11h17m

Is there any downside to storing JSON-B even if you’re not planning to query it? For example, size on disk, read/write performance?

jamesfinlayson
18 replies
11h4m

If the order of items in the JSON blob matters then JSONB probably wouldn't preserve the order.

mike_d
17 replies
11h2m

JSON is unordered. Nothing in your code should assume otherwise.

"An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array."

plq
4 replies
9h51m

You make it sound like it's one of the laws of physics.

The ON part of JSON doesn't know about objects. When serialized, object entries are just an array of key-value pairs with a weird syntax and a well-defined order. That's true of any serialization format, actually.

It's the JS part of JSON that imposes the non-duplicate-keys and undefined-order constraints.

You are the engineer, you can decide how you use your tools depending on your use case. Unless eg. you need interop with the rest of the world, it's your JSON, (mis)treat it to your heart's content.

mike_d
2 replies
9h38m

You make it sound like it's one of the laws of physics.

The text I quoted is from the RFC. json.org and ECMA-404 both agree. You are welcome to do whatever you want, but then it isn't JSON anymore.

throwaway290
0 replies
7h40m

It is very much still JSON, and your code can very much assume keys are ordered if your JSON tools respect it. ECMA-404 agrees:

The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

If you work in environments that respect JSON key order (like browser and I think also Python) then unordered behavior of JSONB would be the exception not the rule.

The_Colonel
0 replies
7h46m

That does not follow.

JSON formatted data remains JSON no matter if you use it incorrectly.

This happens all the time in the real world - applications unknowingly rely on undefined (but generally true) behavior. If you e.g. need to integrate with a legacy application where you're not completely sure how it handles JSON, then it's likely better to use plain text JSON.

You may take the risk as well, but then good luck explaining that those devs 10 years out of the company are responsible for the breakage happening after you've converted JSON to JSONB.

In some cases, insisting on ignoring the key order is too expensive a luxury, since it basically forces you to parse the whole document first and only then process it. If you have huge documents, you have to stream-read them, and that often implies relying on a particular key order (which isn't a problem, since those same huge documents will likely be stream-written in a particular order too).

sltkr
0 replies
4h55m

It's the JS part of JSON that imposes non-duplicate keys with undefined order constraint.

Actually since ES2015 the iteration order of object properties is fully defined: first integer keys, then string keys in insertion order, finally symbols in insertion order. (Of course, symbols cannot be represented in JSON.)

And duplicate properties in JSON are guaranteed to be treated the same way they are in object literals: a duplicate overwrites the previous value but doesn't change the order of the key.

Concretely that means if you write:

    Object.entries(JSON.parse('{"a":10, "1":20, "b":30, "a":40}'))
This is guaranteed to evaluate to:

    [['1', 20], ['a', 40], ['b', 30]]
(Note that '1' was moved to front, and 'a' comes before 'b' even though the associated value comes from the final entry in the JSON code.)

Python made a similar change in version 3.6 (officially since 3.7), both with regards to insertion order and later values overwriting earlier ones while preserving order. I think the only difference at this point is that Python doesn't move integer-like keys to the front, because unlike JavaScript, Python properly distinguishes between different key types.

hakunin
4 replies
10h26m

That’s exactly the kind of difference between json and jsonb that you gotta keep in mind. Object properties are unordered, but a json string is very much ordered. It’s the same sequence of characters and lines each time, unless you parse it and dump it again. So if you want to preserve an unmodified original json string for some (e.g. cosmetic) reasons, you probably want json.

rowanG077
3 replies
7h37m

I would expect you are treading on dangerous ground if you assume a type called JSON is going to preserve the data byte for byte. It might currently, but I doubt that is in the API contract. You really want to use TEXT if that is your requirement.

cztomsik
2 replies
6h59m

I don't know what SQLite does, but in JS the order is actually defined. JS is not Java, object is not HashMap, if anything, it's closer to LinkedHashMap, but even that is not correct because there are numeric slots which always go first. https://tc39.es/ecma262/#sec-ordinaryownpropertykeys

dspillett
1 replies
6h29m

> but in JS the order is actually defined

Defined yes, but still arbitrary and may not be consistent between JSON values of the same schema, as per the document you linked to:

> in ascending chronological order of property creation

Also, while JSON came from JS sort-of, it is, for better or worse (better than XML!) a standard apart from JS with its own definitions and used in many other contexts. JSON as specified does not have a prescribed order for properties, so it is not safe to assume one. JS may generally impose a particular order, but other things may not when [re]creating a JSON string from their internal format (JSONB in this case), so by assuming a particular order will be preserved you would be relying on a behaviour that is undefined in the context of JSON.

cztomsik
0 replies
5h19m

Yes, I know, the point was not to say that it's safe to depend on this universally, but rather why it's safe in JS and why it's not elsewhere -> other languages use hash maps simply because authors were either lazy or unaware of the original behaviour. (which sucks, in my opinion, but nobody can fix it now)

ako
2 replies
3h8m

Assume you're writing an editor for JSON files. Don't think many users of that editor would be very happy if you change the order of the attributes in their json files, even though technically it's the same...

jchanimal
0 replies
2h26m

According to the JSON spec it's also legal to round all numbers to zero.

golergka
0 replies
1h52m

If you’re writing an editor, you will need to separate in progress text that user is editing from the parsed and validated data, and keep both.

jamesfinlayson
1 replies
10h10m

I don't disagree, but people might still assume it. If you serialise a Map in Java, some Map implementations will maintain insertion order for example.

ifwinterco
0 replies
4h38m

I think this is why JS and Python chose to make key ordering defined - the most popular implementations maintained insertion order anyway, so it was inevitable people would end up writing code relying on it

cdogl
1 replies
10h25m

The same goes for maps in Go, which now explicitly randomizes map iteration with the range keyword to prevent developers from relying on a particular ordering. Neat trick.

mike_hock
0 replies
5h26m

That might get them to rely on the randomization, though :)

oefrha
9 replies
11h3m

There’s processing to be done with JSONB on every read/write, which is wasted if you’re always reading/writing the full blob.

lifthrasiir
8 replies
10h1m

Which occurs with JSON as well (SQLite doesn't have a dedicated JSON nor JSONB type). The only actual cost would be the conversion between JSONB and JSON.

oefrha
7 replies
9h34m

No, SQLite's "JSON" is just TEXT, there's no overhead with reading/writing a string.

lifthrasiir
5 replies
9h20m

That's what I said I think? "JSON" is a TEXT that is handled as a JSON string by `json_*` functions, while "JSONB" is a BLOB that is handled as an internal format by `jsonb_*` functions. You generally don't want JSONB in the application side though, so you do need a conversion for that.

akira2501
2 replies
8h53m

Yes, but when you use the BLOB with jsonb functions, your application demands go from:

Read JSON from TEXT column.

Parse JSON into Internal Binary Format.

Run json_*() function on this format, which will Serialize Internal Binary Format to JSON as output.

To:

Read JSONB from BLOB column.

Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output.

Because:

The json_* and jsonb_* functions all accept _either_ JSON or JSONB as their input. The difference is that the jsonb_* functions also produce JSONB as output. So even in the above case, if your function output is just being used to feed back into another table as a BLOB, then you can use the jsonb_* version of the function and skip the serialization step entirely.
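A sketch of that shortcut, assuming a hypothetical docs(id, body) table whose body column holds JSONB:

    -- jsonb_set() returns the binary format, so writing it straight back
    -- into the BLOB column skips the text round-trip entirely
    UPDATE docs SET body = jsonb_set(body, '$.updated', 1) WHERE id = 1;

    -- only convert to text at the edge, when the application needs JSON
    SELECT json(body) FROM docs WHERE id = 1;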

lifthrasiir
0 replies
8h51m

Oh, you are right! I later looked at the draft documentation and realized that `json(b)_` only determines the output type. That said, it is still true that you need one more function call, `json(...)`, to retrieve a textual form of the JSON, which you often need instead of JSONB.

Someone
0 replies
1h28m

Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output

I don’t think that’s always correct (it is within this thread, which stated (way up) “if you’re always reading/writing the full blob”, but I think this discussion is more general by now). Firstly, it need not convert the full jsonb blob (example: jsonb_extract(field,'$.a.b.c.d.e'))

Secondly, it need not convert to text at all, for example when calling a jsonb_* function that returns jsonb.

(Technically, whether any implementation is that smart is an implementation detail, but the article claims this is

“a rewrite of the SQLite JSON functions that, depending on usage patterns, could be several times faster than the original JSON functions”

so it can’t be completely dumb.)

oefrha
1 replies
9h14m

There’s no point using json_* functions if you’re always reading/writing the full blob.

lifthrasiir
0 replies
9h4m

Of course (and I never said that), but if you need to store a JSON value in the DB and use a JSONB-encoded BLOB as an optimization, you eventually read it back to a textual JSON. It's just like having a UNIX timestamp in your DB and converting back to a parsed date and time for application uses, except that applications may handle a UNIX timestamp directly and can't handle JSONB at all.

nonethewiser
0 replies
1h23m

It doesn't validate that it's valid JSON?

chippiewill
1 replies
6h47m

Not specifically JSONB, but I do recall that with MongoDB's equivalent, BSON, the binary form tends to be larger in practice. I would expect JSONB to have a similar trade-off.

There'll also be a conversion cost if you ultimately want it back in JSON form.

cowsandmilk
0 replies
6h36m

From the post:

JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller) so you might also see a modest reduction in your database size if you use a lot of JSON.

kevincox
0 replies
1h22m

According to the linked announcement the data size is 5-10% smaller. So you are exchanging a small processing cost for smaller storage size. It will depend on your application but in many cases smaller disk reads and writes as well as more efficient cache utilization will make storing as JSONB a better choice even if you never manipulate it in SQL. Although the difference is likely small.

Someone
0 replies
1h14m

I haven’t delved into it, but you likely lose the ability to round-trip your json strings, just as in postgreSQL’s jsonb type.

  jsonb_extract(jsonb(foo), '€')
likely:

- will remove/change white space in ‘foo’,

- does not guarantee to keep field order

- will fail if a dictionary in the ‘foo’ string has duplicate fields

- will fail if ‘foo’ doesn’t contain a valid json string

zmmmmm
3 replies
6h43m

I don't know if it's true of SQLite, but you missed the most important point, at least with PostgreSQL: you can build indexes directly against attributes inside JSONB, which really turns it into a true NoSQL/relational hybrid that can let you have your cake and eat it too for some design problems that don't fit neatly into either pure relational or pure NoSQL approaches.

euroderf
1 replies
6h19m

What would that SQL look like in practice?

ako
0 replies
3h2m

Here's an example i have running on a pi with a temperature sensor. All data is received as json messages in mqtt, then stored as is in a postgres table. The view turns it into relational data.

If i just need the messages between certain temperatures i can speed this up by adding an index on the 'temperature' field in the json (a sketch of such an index follows the view below).

    create or replace view bme680_v(ts, message, temperature, humidity, pressure, gas, iaq) as
    SELECT mqtt_raw.created_at                                          AS ts
     , mqtt_raw.message
     , (mqtt_raw.message::jsonb ->> 'temperature'::text)::numeric    AS temperature
     , (mqtt_raw.message::jsonb ->> 'humidity'::text)::numeric       AS humidity
     , (mqtt_raw.message::jsonb ->> 'pressure'::text)::numeric       AS pressure
     , (mqtt_raw.message::jsonb ->> 'gas_resistance'::text)::numeric AS gas
     , (mqtt_raw.message::jsonb ->> 'IAQ'::text)::numeric            AS iaq
    FROM mqtt_raw
    WHERE mqtt_raw.topic::text = 'pi/bme680'::text
    ORDER BY mqtt_raw.created_at DESC;
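A sketch of what such an index could look like in Postgres, using the same mqtt_raw table and restricted to the topic the view filters on:

    create index mqtt_raw_temperature_idx
        on mqtt_raw (((message::jsonb ->> 'temperature')::numeric))
        where topic = 'pi/bme680';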

debugnik
0 replies
4h47m

You can do this with text JSON in SQLite as well, but JSONB could speed it up.
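For example (a sketch, assuming a hypothetical readings table with a message column holding JSON text):

    -- an index on a json_extract() expression...
    CREATE INDEX readings_temp_idx
        ON readings (json_extract(message, '$.temperature'));

    -- ...or a generated column, which can be indexed and queried by name
    ALTER TABLE readings ADD COLUMN temperature REAL
        AS (json_extract(message, '$.temperature')) VIRTUAL;
    CREATE INDEX readings_temp_idx2 ON readings (temperature);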

renegade-otter
3 replies
3h34m

Indeed. Those who use Postgres are already familiar with the difference.

Rule of thumb: if you are not sure, or do not have time for the nuances, just use JSONB.

btown
1 replies
57m

There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.

This may be desired - it makes two effectively-equal objects have the same representation at rest!

But if you are storing human-written JSON - say, configs from an internal interface, where one might collocate a “__foo_comments” key above “foo” - their layout will be lost, and this may lead to someone visually seeing their changes scrambled on save.

cryptonector
0 replies
42m

JSON processors are not required nor expected to retain object key ordering from either input or object construction order.

kevincox
0 replies
1h10m

Slight caveat. It seems that you should default to JSONB for all cases except for values directly returned in the query result. As I understand it, you will get a JSONB blob back and will be responsible for parsing it on the client.

So use it for writes:

    UPDATE t SET col = jsonb_*(?)
Also use it for filters if applicable (although this seems like a niche use case, I can't actually think of an example).

But if returning values you probably want to use `json_*` functions

    SELECT json_extract(col1, '$.myProp') FROM t
Otherwise you will end up receiving the binary format.

lovasoa
3 replies
7h37m

If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

There is no json data type! If you are just storing json blobs, then the BLOB or TEXT data types will be the best picks.

dheera
1 replies
2h28m

There is no json data type

Why not? I feel like a database should allow for a JSON data type that stores only valid JSON or throws an exception. It would also be nice to be able to access subfields of the JSON.

SELECT userid, username, some_json_field["some_subfield"][0] FROM users where ...

Not sure where to give feature suggestions so I'm just leaving this here for future devs to find

sroussey
0 replies
1h56m

SQLite has a surprisingly small number of types: null, integer, real, text, and blob. That’s it.

https://www.sqlite.org/datatype3.html

gwd
0 replies
6h51m

In context, what is clearly meant is: "If you're just reading and writing full JSON blobs, use `insert ... ($id, $json)`; if you're primarily querying the data, use `insert ... ($id, jsonb($json))`, which will convert the text into JSONB before storing it."

nikeee
2 replies
8h15m

If my application never sees a difference to normal JSON, everything is compatible and all there is are perf improvements, why is there a new set of functions to interact with it (jsonb_*)?

It seems that the JSON type is even able to contain JSONB. So why even use these functions, if the normal ones don't care?

dtech
0 replies
7h59m

The difference is the storage format, which is important for a SQL database since you define it in the schema. Also the performance characteristics are different.

The_Colonel
0 replies
7h38m

As someone mentioned below, the order of keys is undefined in JSON spec, but applications may rely on it anyway, and thus conversion to JSONB may lead to breakage.

There are some other minor advantages to having an exact representation of the original - e.g. hashing, signatures, and equality comparison are much simpler on the JSON string (these need a strict key order, which is again undefined by the spec, but happens in the real world anyway).

cryptonector
1 replies
46m

TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be. JSONB is a brilliant binary JSON format. What makes it brilliant is that arrays and objects (which when serialized are a sort of array) are encoded by having N-1 lengths of values then 1 offset to the Nth value, then N-1 lengths of values then... The idea is that lengths are going to be similar, while offsets never are, so using lengths makes JSONB more compressible, but offsets are needed to enable random access, so if you store mostly lengths and sometimes offsets you can get O(1)ish random access while still retaining compressibility.

Originally JSONB used only offsets, but that made it incompressible. That held up a PostgreSQL release for some time while they figured out what to do about it.

ks2048
0 replies
11m

Link? My reading last time I looked at this was that the sqlite and postgres “jsonb” were different.

tyingq
0 replies
7h47m

Your application will never see the raw JSONB content.

That's not exactly right, as the jsonb_* functions return JSONB if you choose to use them.

tuyiown
0 replies
7h49m

If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

I don't know how the driver is written, but this is misleading: if SQLite provides an API to read the JSONB data via an in-memory copy, an app can surely benefit from skipping the JSON string parsing.

lake-view
13 replies
11h58m

Because it seems to not be common knowledge on this thread: JSONB is a format offered by Postgres for a while now, and is recommended over plain JSON primarily for improved read performance.

https://www.postgresql.org/docs/current/datatype-json.html

M4v3R
6 replies
11h28m

I don’t know about Sqlite’s implementation but in Postgres JSONB is not 100% transparent to the application. One caveat I’ve encountered while working on an application that stored large JSON objects in Postgres initially as JSONB is that it doesn’t preserve object key order, i.e. the order of keys in an object when you store it will not match the order of keys when you retrieve said object. While for most applications this is not an issue the one I was working on actually did rely on the order of keys (which I am aware is a bad practice but this is how the system was designed) and suddenly we noticed that the app started misbehaving. Changing the column type from JSONB to JSON fixed the problem.

magicalhippo
1 replies
11h23m

Given that the order of the keys is specified as having no significance in regular JSON[1], this is out-of-spec usage.

If key order has to be preserved then a blob type would be a better fit, then you're guaranteed to get back what you wrote.

For example, SQLite says it stores JSON as regular text but MySQL converts it to an internal representation[2], so if you migrate you might be in trouble.

[1]: https://ecma-international.org/publications-and-standards/st...

[2]: https://dev.mysql.com/doc/refman/8.0/en/json.html

devit
0 replies
4h0m

You can also add a property to all objects with an array of keys in the order you want (if you can guarantee it won't conflict with existing properties, or can escape those), or turn them into a {k, o} array where k is the key array and o the object. If you don't care about looking up keys, then use either a {k, v} array where v is a value array, an object where keys are prefixed with their position, or put the position in the value as an array of value and position.

clarkdave
1 replies
6h35m

Indeed! There is at least one other difference, which is that the presence of null characters will cause an error in `jsonb`, but not `json`:

    # select '{ "id": "\u0000" }'::json;
    -> { "id": "\u0000" }

    # select '{ "id": "\u0000" }'::jsonb;
    -> ERROR: unsupported Unicode escape sequence

cryptonector
0 replies
16m

The object key order thing is not a JSON spec compliance issue, but this one is. Either PG should store this escaped or use counted byte strings, but you can see why neither is desirable for a high-performance binary JSON format.

fbdab103
0 replies
10h52m

I suppose it is what it is, but if ordering matters, it is only JSON-like in appearance. json.org says:

An object is an unordered set of name/value pairs.

cryptonector
0 replies
19m

Never ever count on object key ordering, or even on there being no duplicate keys in objects. If the JSON texts you're dealing with can be processed -or even just stored- by other code then you're in for pain if you make any assumptions about object key ordering / object key dups.

Once you accept this then it stops being a problem.

o11c
3 replies
11h34m

Postgres's JSONB uses native numbers, which is faster but coerces values. It also removes duplicate keys but you shouldn't use those anyway.

Sqlite's JSONB keeps numbers as strings, which is slower but preserves your weird JSON (since there's no such thing as standard JSON). I'm not sure about duplicate keys.

rafaelmn
0 replies
4h17m

Postgres's JSONB uses native numbers, which is faster but coerces values.

You would probably be string-quoting your numbers in your data/model if this matters to you, so it sounds like the right call from the PG implementation.

lifthrasiir
0 replies
9h54m

SQLite's approach is faster if you are not frequently extracting numerical values out of JSONB. It also makes it much easier to convert JSONB back to JSON. I think SQLite JSONB reserved enough space to define a native number type (among others) if this assumption turned out to be false.

chrismorgan
0 replies
9h48m

since there's no such thing as standard JSON

I-JSON is the most sensible JSON profile I know of: https://datatracker.ietf.org/doc/html/rfc7493. It says: UTF-8 only, prefer not to use numbers beyond IEEE 754-2008 binary64 precision, no duplicate keys, and a couple more things.

vhiremath4
0 replies
4h14m

A warning to anyone who uses JSONB in Postgres. Past a certain size, the engine must read the value from the TOAST table on disk, even for indexed reads. If you store your JSONB in Postgres, don't wait for it to grow out of hand or store a bunch of random stuff in it from an API call!

selcuka
0 replies
11h53m

That JSONB is different from SQLite's JSONB, though.

p-e-w
12 replies
12h8m

There's all the rest of open source, and then there's SQLite: public domain software that doesn't accept contributions from outsiders, and that happens to run much of the world.

And it just keeps getting better and better and better, and faster and faster and faster.

I don't know how these guys manage to succeed where almost all other projects fail, but I hope they keep going.

kccqzy
3 replies
11h10m

It's perhaps the best example of the cathedral model of open source.

zertrin
2 replies
9h54m

Yeah, learned about it while reading the documentation of Fossil (from the same SQLite people). Their approach certainly has its own merits (and drawbacks).

Just wondering how they will transition once the original few people at the top of the hierarchy need to retire, eventually it will happen.

I guess they need to find younger trusted committers with the same dedication and spirit. That's not necessarily easy. But for a piece of software as important as SQLite, I have a feeling they will find those.

ykonstant
1 replies
9h39m

Just wondering how they will transition once the original few people at the top of the hierarchy need to retire, eventually it will happen.

I had always viewed this as a "future worry", and then Bram Moolenaar passed away :(

dspillett
0 replies
6h21m

The problem with “future worry”s is that the Universe doesn't always agree with us on when that future will be, and it often happens to be closer than expected, even more often closer than desired.

LVB
3 replies
10h40m

The patterns I've noticed, watching it evolve for nearly 20 years, are:

- They draft behind others, and I mean this in a good way. They seem to eschew trailblazing new features, but keep a close eye on alternatives (esp. Postgres) and how use cases are emerging. When a sufficiently interesting concept has stabilized, they bring it to SQLite.

- Closely related to this, they seem to stay clear of technical and community distractions. They have their plans and execute.

- I don't know if D. Richard Hipp is considered a BDFL, but that's my impression and he seems good at it.

vsnf
2 replies
8h24m

When put this way, it reminds me how Apple generally chooses to add features to the iPhone. They wait for use cases to be proven, usually by Samsung, and then add a very polished version to the phone.

jiehong
0 replies
7h39m

And Java is similar nowadays

6510
0 replies
5h15m

Arguably the entire app ecosystem exists only to farm features. This is arguably the case whenever users are allowed to create things.

Go get a pro subscription with each of your competitors :)

ykonstant
1 replies
9h40m

Can anyone comment on the characteristics of the source code? Is the sqlite source worth reading for someone who is well-versed in C and x86 assembly, and has a rudimentary knowledge of databases?

eesmith
0 replies
7h30m

"If you're looking into building your own database ...SQLite is amazing. It has very clean and readable code, so I'd suggest using it as a reference" https://news.ycombinator.com/item?id=21041833

"I suggest SQLite because the source code is superb (seriously, some of the most readable, most logically organized and best commented C code you'll ever see)" https://news.ycombinator.com/item?id=12559301

"it never hurts to look at a good open-source codebase written in C, for example the SQLite code is worth looking at (if a bit overwhelming)" https://news.ycombinator.com/item?id=33132772

whalesalad
0 replies
11h29m

Sometimes you gotta keep the riff raff out in order to march forwards with your vision. Too many cooks in the kitchen can be deadly.

cryptonector
0 replies
3m

I don't know how these guys manage to succeed where almost all other projects fail, [...].

They have a proprietary (not open source) test suite with 100% branch coverage. That makes it impossible to have credible forks of SQLite3. And it makes the gatekeepers valuable because SQLite3 is the most widely used piece of software ever built. So there's a SQLite Consortium, and all the big tech players that depend on SQLite3 are members, and that's how the SQLite team pays the bills.

chubot
11 replies
12h15m

Hm I googled and found this draft of the encoding - https://sqlite.org/draft/jsonb.html

It feels like it would be better to use a known binary encoding. I thought the MessagePack data model corresponded pretty much exactly to JSON?

Edit: someone else mentioned BSON - https://bsonspec.org/

To be honest the wins (in this draft) don't seem that compelling

The advantage of JSONB over ordinary text RFC 8259 JSON is that JSONB is both slightly smaller (by between 5% and 10% in most cases) and can be processed in less than half the number of CPU cycles.

JSON has been optimized to death; it seems like you could get the 2x gain and avoid a new format with normal optimization, or perhaps compile-time options for SIMD JSON techniques

---

And this seems likely to confuse:

The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.

---

Any time data is serialized, SOMEBODY is going to read it. With something as popular as sqlite, that's true 10x over.

So to me, this seems suboptimal on 2 fronts.

topspin
3 replies
11h53m

JSON has been optimized to death; it seems like you could get the 2x gain and avoid a new format with normal optimization

Either I'm experiencing a reading comprehension mishap or this is self contradictory. Where is a "2x gain" supposed to come from through "normal optimization" from after something has already been optimized "to death?"

SIMD JSON techniques

Which are infeasible in key SQLite use cases.

brazzy
1 replies
11h22m

Credit to SQLite developers for adopting an extant binary JSON format, as opposed to inventing yet another one.

The comment you are replying to cites a statement saying explicitly that they are not adopting the Postgres JSONB binary format, only the name and the abstract concept. The API is not compatible with Postgres either.

topspin
0 replies
11h17m

I noted that and removed that bit prior to your reply.

Points off for a.) inventing yet another binary JSON and/or b.) using the same name as an existing binary JSON.

chubot
0 replies
32m

I just meant there are a zillion different known ways of optimizing JSON, that maybe could be applied here. But maybe they already optimized it to the degree they're willing and wanted a binary format

gleenn
2 replies
12h5m

I thought the point was that the DB could reach into JSONB for you (potentially indexed as well) instead of having to deserialize outside. So the exact serialization isn't maybe as interesting as the direct queryability.

fbdab103
1 replies
11h58m

Does SQLite make any promises on the internal implementation? I assumed that the only guarantee was at the SQL interface.

banana_giraffe
0 replies
11h28m

They do promise it'll be a stable format:

The JSONB format is not intended as an interchange format. Nevertheless, JSONB is stored in database files which are intended to be readable and writable for many decades into the future. To that end, the JSONB format is well-defined and stable. The separate SQLite JSONB format document provides details of the JSONB format for the curious reader.

And indeed the functions jsonb() and json() will let you convert to and from jsonb.
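A small sketch of that round trip (output shown in comments; json() returns a minified text form):

    SELECT jsonb('{"a": 1}');        -- the internal BLOB encoding
    SELECT json(jsonb('{"a": 1}'));  -- back to text: {"a":1}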

fbdab103
1 replies
11h53m

SQLite says you should not attempt to access it: https://sqlite.org/draft/jsonb.html

JSONB is not intended as an external format to be used by applications. JSONB is designed for internal use by SQLite only. Programmers do not need to understand the JSONB format in order to use it effectively. Applications should access JSONB only through the JSON SQL functions, not by looking at individual bytes of the BLOB.

However, JSONB is intended to be portable and backwards compatible for all future versions of SQLite. In other words, you should not have to export and reimport your SQLite database files when you upgrade to a newer SQLite version. For that reason, the JSONB format needs to be well-defined.

If SQLite intends to own the format forever, I can believe that their requirements are such that leaning on an existing implementation is not worth the savings to implement.

chubot
0 replies
34m

Users don't read docs, and any visible implementation detail becomes set in stone:

https://www.hyrumslaw.com/

It looks like they're aware of that ... it's probably fine -- not ideal, but fine

If users didn't care about what the blob format was, there wouldn't be JSON support in the first place! You would have started with something like JSONB

lifthrasiir
0 replies
12h4m

It takes a lot of complex code to optimize a JSON parser. You don't need to optimize that hard to make an equally performant parser for a binary serialization format, and given SQLite's JSONB is purely an internal optimization, it doesn't have to be a well-known interchange format (which has many more concerns than SQLite does).

Any time data is serialized, SOMEBODY is going to read it. With something as popular as sqlite, that's true 10x over.

And this statement is equally true for the SQLite format itself. That doesn't mean that the SQLite format should be replaced with something more standard, of course.

jomohke
0 replies
10h38m

Re-using standards is a great idea, and should remain people's default, but I don't see the benefit here.

What's the advantage of re-using a format?

Ecosystem? That won't help SQLite here, who don't have dependencies.

Keep in mind that anyone trying to write a parser for this is also writing a parser for the entire SQLite file format (the only way to access bytes). And it's a spec simple enough to fit on one monitor.

Design?

BSON (and others?) seem to have different goals.

SQLite's format seems to minimise conversion/parsing (eg. it has multiple TEXT types depending on how much escaping is needed; BSON has one fully-parsed UTF-8 string type). BSON is more complex: includes many types not supported by json (dates, regexs, uuids...) and has a bunch of deprecated features already.

SQLite's on disk-format is something they intend to support "forever", and as with the rest of SQLite, they enjoy pragmatic simplicity.

breadchris
8 replies
11h26m

I didn't understand the purpose of document stores until the past couple of years, and they are fabulous for building POCs. Enhanced JSON support will help a lot in making SQLite a suitable document store.

I get full type support by serializing and deserializing protobuf messages from a db column, and now making this column JSONB means I can filter on this column too, instead of having to flatten the searchable data into other columns.

p1necone
7 replies
11h4m

Yeah, as long as you're reading and writing to the database with the same language, and that language has good type safety, the benefit of your database schema effectively being defined by the same types as the rest of your code is pretty nice for a lot of use cases.

You just have to be vigilant about correctly migrating existing data to the current shape if you ever make breaking changes to types.

BozeWolf
6 replies
9h6m

This. Would be nice if there was a framework (in go, or python pydantic) which would help me migrate data made with old structs to new structs. And also deal with the transaction.

For now i use sqlite to deal with transactions and only make backward compatible updates to structs. Brittle, but it is a toy app anyways.

(Normally use django to deal with models and migrations, but wanted to do something different)

buzziebee
4 replies
8h22m

Yeah, migrations are the biggest issue for me. I really don't like not knowing what the actual shape of the document will be. Missing transactions and not-great relationship performance make modelling some systems more hassle than it's worth.

I gave it a good go to use mongo and firestore for a few projects, but after a year or two of experimenting I'll be sticking to SQL based DBs unless there are super clear and obvious benefits to using a document based model.

piaste
3 replies
6h6m

There's a gradual approach there, where you start out with a JSONB column, and then as each piece of the data structure stabilizes* you move it out of json fields and into its own columns/tables.

* meaning, when there's enough code that depends on it that changing it would require some planning
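A sketch of that promotion step in SQLite, assuming a hypothetical clients(id, config) table whose config column holds JSONB:

    -- give the stabilized field its own column...
    ALTER TABLE clients ADD COLUMN plan TEXT;
    UPDATE clients SET plan = json_extract(config, '$.plan');

    -- ...and optionally drop it from the blob afterwards
    UPDATE clients SET config = jsonb_remove(config, '$.plan');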

rav
1 replies
2h10m

Alternatively, keep it in the JSON/JSONB column until you need to search/filter/query on it, in which case you pull it out into a column.

piaste
0 replies
1h39m

Even that may not be immediately necessary. I don't think SQLite has it yet, but Postgres can build partial indexes on JSONB fields.

Though most of the time, in that situation, I would pull it out.

jadbox
0 replies
4h17m

This is the way I build all my apps now. Eventually the jsonb field stores nothing as it all gets moved to defined fields.

halfcat
0 replies
3h26m

Most commonly I see people use Alembic to migrate SQLAlchemy and Pydantic models.

But I tend to just use Django. Every time I try piecing together the parts (ex FastAPI, Pydantic, Alembic, etc) I reach a point where I realize I’m recreating a half baked Django, and kick myself for not starting with Django in the first place.

Animats
7 replies
11h49m

Is this just a data type, or did SQLite put in a JSON interpreter?

lifthrasiir
2 replies
11h47m

What do you mean by a JSON interpreter? I think SQLite already has a full implementation of JSON including tree walkers, and this internal format (that is externally just another BLOB) makes them more efficient.

Animats
1 replies
10h7m

Oh, sorry, I was thinking that someone had put a Javascript interpreter in the database itself. Fortunately, no.

mariusor
0 replies
7h55m

Despite the name, the JSON format doesn't really have much to do with JavaScript.

sureglymop
0 replies
10h4m

Pretty sure SQLite comes with a JSON parser. JSON is a very simple format that is easy to write a parser for...

justinclift
0 replies
10h49m

SQLite has had pretty decent support for JSON for a while now:

https://sqlite.org/json1.html

cryptonector
0 replies
5m

It's a BLOB with a binary representation of JSON.

HDThoreaun
0 replies
10h36m

You mean a JSON parser? I don't think you can execute anything with json.

nojs
5 replies
7h3m

What are the use cases for storing and manipulating json at the db level like this - why not use a relational schema and query it in the normal way?

spappal
1 replies
5h48m

Align with the rest of the system, when sqlite is used as a component in a system which mainly speaks json.

Performance, when records are complex and usually not accessed.

Ease, avoiding/postponing table design decisions.

Flexibility, for one of many corner-cases (since sqlite is a very broad tool).

Incremental enhancement, e.g. when starting with sqlite as replacement to an ndjson-file and incrementally taking advantage of transactions and indexes on fields [1,2,3].

For example, several of these could apply when doing structured logging to a sqlite database.

    [1]: https://www.sqlite.org/expridx.html
    [2]: https://www.sqlite.org/gencol.html
    [3]: https://antonz.org/json-virtual-columns/
    See also: https://www.sqlite.org/json1.html

mkl
0 replies
5h21m

dystroy
1 replies
6h4m

Imagine you need to attach to some records a bag of data which you either always need in full or not at all. This bag may have a tree structure, which is expensive to deal with in a relational model. An example would be the JSON configuration of a component/client/whatever. In such a case, trying to fit this bag of data into a relational model would be inefficient.

Those bags of data are usually called "documents". And a lot of systems need a way to store them along relational data.

If you're just storing and retrieving those documents, without any query, you don't need JSONB, a simple blob is enough.

If you sometimes need to do some queries, especially free queries (you want all component whose configuration has some property), then JSONB is suitable as it lets you do the filtering in the database.
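For instance, a free query of that sort might look like this (a sketch, assuming a hypothetical components(id, config) table):

    SELECT id
    FROM components
    WHERE json_extract(config, '$.features.darkMode') = 1;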

nojs
0 replies
5h47m

If you sometimes need to do some queries, especially free queries (you want all component whose configuration has some property), then JSONB is suitable as it lets you do the filtering in the database.

This feels like a slippery slope into a denormalised mess though. Before you know it your whole client record is a document and you’re using Postgres as a NoSQL database

ggregoire
0 replies
2h14m

There are good use cases, but from my personal experience, I've seen people use JSONB mainly to avoid having too many columns when they inspect their tables: they split their data into two groups, the supposedly important fields that get dedicated columns and the rest that is thrown into a single JSONB column.

mongol
4 replies
6h16m

Unrelated question: what language integrates best with SQLite? I am using it with Go and cgo, but it is often advised to avoid cgo. Perhaps it doesn't matter so much, I can use it anyway, but it would be interesting to hear about other experiences.

hahn-kev
1 replies
6h3m

Define best

mongol
0 replies
5h59m

Little overhead, a "native" integration.

quesera
0 replies
1h6m

Any language with bindings will be fine.

The C integration API to Sqlite is very simple and small, so a) almost all languages will have bindings, and b) any bindings that exist will be reasonable.

doublerabbit
0 replies
3h14m

what languages integrates best with sqlite?

TCL.

kevin_thibedeau
2 replies
12h10m

Next step is to go full Ouroboros and have embedded SQLite DBs as records.

selcuka
1 replies
12h6m

Next step is to go full Ouroboros

That was the previous step. Binary BLOBs have been supported in SQLite for some time.

6510
0 replies
5h37m

Think of the children!

bvrmn
2 replies
10h34m

Despite it being an internal format, I see immediate external uses in applications. For example, batch insertions in Python: per-row insert calls have noticeable overhead, and JSONB could bring the performance back with a CTE:

    CREATE TABLE data(id, name, age);

    WITH ins AS (
        SELECT c1.value, c2.value, c3.value
        FROM json_each('["some", "uuid", "key"]') c1
        INNER JOIN json_each('["joe", "sam", "phil"]') c2 USING (id)
        INNER JOIN json_each('[10, 20, 30]') c3 USING (id)
    )
    INSERT INTO data (id, name, age)
    SELECT * FROM ins
Each json_each could accept a bind parameter with JSONB BLOB from an app.

matharmin
1 replies
9h46m

You don't need JSONB for this - doing this with plain JSON is simpler and already faster than individual inserts for most bindings in my experience.

I typically do bulk inserts using a single JSON argument like this:

    WITH ins AS (SELECT e.value ->> 'id', e.value ->> 'name', e.value ->> 'age' FROM json_each(?) e)
    INSERT INTO data (id, name, age)
    SELECT * FROM ins
The same approach can be used for bulk updates and deletes as well.
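For example, a bulk update with the same single-JSON-argument pattern might look like this (a sketch; UPDATE ... FROM needs SQLite 3.33+ and ->> needs 3.38+):

    WITH upd AS (
        SELECT e.value ->> 'id' AS id, e.value ->> 'name' AS name
        FROM json_each(?) e
    )
    UPDATE data SET name = upd.name
    FROM upd
    WHERE data.id = upd.id;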

bvrmn
0 replies
9h31m

I have quite wide records (over 50 fields) and ->> doesn't perform well with text keys. I did not try it with array indexing though:

    WITH ins AS (
        SELECT value ->> 0, value ->> 1, value ->> 2
        FROM json_each('[["some", "joe", 10], ["uuid", "sam", 20], ["key", "phil", 30]]')
    )
    INSERT INTO data (id, name, age)
    SELECT * FROM ins

evanjrowley
1 replies
12h12m

I'm familiar with MongoDB's BSON, but not JSONB. Here is an article I found that talks about the differences: https://blog.ferretdb.io/pjson-how-to-store-bson-in-jsonb/

masklinn
0 replies
7h58m

AFAIK jsonb is not a specific format, it’s just a generic term for “a json equivalent binary representation”. The sqlite blurb says jsonb is generally smaller than json, but IIRC from when postgres added jsonb postgres’ is generally slightly larger.

downrightmike
1 replies
11h51m

Is it an excellent and unchecked attack surface like JSON is in MSSQL? Postgres is also known for SQL injection. And if this is based off of that...

https://www.imperva.com/blog/abusing-json-based-sql/

maxbond
0 replies
11h28m

This blog post is about using these operators to bypass WAFs. You don't need JSON operators for that. You can substitute `1==1` with `2==2`, or `1!=1`, or `true`, or a million other approaches. There are an infinite number of such strings.

This is a problem with WAFs, not databases. Postgres and SQL Server both provide prepared statements as an alternative to string concatenation, which addresses SQL injection. (Though some people may be stuck with legacy or vendor-contolled systems that they can't fix, and so WAFs are their only option.)

array-species
1 replies
12h17m

Interested to know what the Deno JavaScript/TypeScript project thinks of this addition, given it already has a key-value store backed by the database and JSON is JavaScript friendly.

laurencerowe
0 replies
11h51m

While this will be good for SQLite queries that look into JSON data I’m not sure it really changes much for Deno. V8’s JSON parser is very highly optimised so it may be tricky to make a faster SQLite JSONB parser.

SigmundA
1 replies
1h15m

I would prefer relational databases to just have a compound hierarchical data type that can contain all the types they support. Json is so anemic on types.

This way it could efficiently store numbers, dates, uuids or raw binary etc., and it should really have some sort of key interning to efficiently store repeating key names.

Then just have functions to convert to/from json if that's what you want.

cryptonector
0 replies
6m

Json is so anemic on types.

So is SQLite3 though.

I do wish that SQLite3 could get a CREATE TYPE command so that one could get a measure of static typing. Under the covers there would still be only the types that SQLite3 has now, so a CREATE TYPE would have to specify which one of those types underlies the user-defined type. I think this should be doable.

OJFord
1 replies
5h38m

Anyone know what their release process is like, will this be in v3.45? (Downloads page has latest release as 3.44, and TFA says this is in pre-release snapshot.)

I don't use SQLite directly much, but I'd be keen to use this in Cloudflare's D1 & Fly.io. Having said that though, I'm not sure they publicise the sqlite version (or even that it isn't customised) - double-checking now, they currently only talk about importing SQLite3 dumps or compatible .sql files, not actually that it is SQLite.

So API changes like this actually break that promise, don't they? Even though you wouldn't normally think of an addition (of `jsonb_*` functions) as being a major change (and I know it isn't semver anyway), it is one for Cloudflare's promise of being able to import SQLite-compatible dumps/query files, which previously wouldn't have contained those functions but henceforth might.

jadbox
0 replies
4h12m

I came here to ask the same thing. Btw, Cloudflare D1 has a lot of limitations while it has been in beta. In my tests, almost all D1 reads/writes go to a single region. It is highly recommended to limit edge-function usage of D1 to no more than a single read query, or 1 read + 1 write, to reduce edge latency.

sjmiller609
0 replies
1h13m

I wish there was some way to compress JSON data across multiple rows; I often have very similar blobs in each row. Or some other way to reduce the amount of storage for a lot of similar blobs across many rows, for example 10kB of JSON where only one or two keys are different.

radarsat1
0 replies
9h32m

I have some cases where sometimes I want to store a vector of floats along with a data item. I usually don't need to match against the column, just store it. I know I could use BLOB but often I just use JSON for this kind of thing so that I don't have to deal with data types and conversion. It's wasteful and imprecise though due to the string conversion. Is JSONB a good middle ground option for these cases?

Edit: Sorry, just saw the comment below by o11c,

Sqlite's JSONB keeps numbers as strings

which means the answer to my question is basically "no".

nalgeon
0 replies
12h55m

You can try JSONB in the pre-release snapshot [1] or live in the playground [2].

[1]: https://sqlite.org/download.html

[2]: https://codapi.org/sqlite

mgaunard
0 replies
1h41m

why not decode it as a struct or list of whatever the native types in the database are, for arbitrarily nested objects?

lovasoa
0 replies
7h41m

As the maintainer of sql.js and SQLPage, I am very excited about this:

- json is used a lot by sql.js users to interact with JavaScript.

- to generate more sophisticated web pages in SQLPage, json is crucial

I can't wait for the next release!

euroderf
0 replies
8h57m

So what will JSONB look like in a standalone DB browser (DBeaver, etc.) ?

cryptonector
0 replies
43m

I'm surprised that TFA doesn't say whether the JSONB it refers to is the JSONB of PostgreSQL fame. I assume it must be, because SQLite3 adopts a lot of things from PostgreSQL, so D.R. Hipp and crew must be familiar with PG's JSONB and wouldn't create a different JSONB. Plus the PG JSONB is brilliant, so it would be wise to copy it or at least take inspiration from it.

There's no need to interop with PG's JSONB encoding, I think. If so, then SQLite3's could differ from PG's.

Anyways, it'd be nice if TFA was clearer on this point.

ape4
0 replies
3h36m

Does SQLite support the SQLite database format in tables?