Lots of confusion on what JSONB is.
To your application, using JSONB looks very similar to the JSON datatype. You still read and write JSON strings; your application will never see the raw JSONB content. The same SQL functions are available, with a different prefix (jsonb_). Very little changes from the application's point of view.
The difference is that the JSON datatype is stored to disk as JSON, whereas the JSONB is stored in a special binary format. With the JSON datatype, the JSON must be parsed in full to perform any operation against the column. With the JSONB datatype, operations can be performed directly against the on-disk format, skipping the parsing step entirely.
If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick. If you're querying or manipulating the data using SQL, JSONB will be the best pick.
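For a rough feel of the two styles, here's a minimal sketch (made-up table and column names; jsonb() requires a SQLite build that includes the JSONB work):

```sql
-- Hypothetical table; SQLite column affinity is loose, so the same column
-- can hold either JSON text or a JSONB blob.
CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB);

-- Blob-in/blob-out style: store the original JSON text untouched.
INSERT INTO docs (id, body) VALUES (1, '{"name": "alice", "age": 30}');
SELECT body FROM docs WHERE id = 1;

-- Query-heavy style: convert to JSONB on insert, query without reparsing text.
INSERT INTO docs (id, body) VALUES (2, jsonb('{"name": "bob", "age": 41}'));
SELECT json_extract(body, '$.age') FROM docs WHERE id = 2;
```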
Is there any downside to storing JSON-B even if you’re not planning to query it? For example, size on disk, read/write performance?
If the order of items in the JSON blob matters then JSONB probably wouldn't preserve the order.
JSON is unordered. Nothing in your code should assume otherwise.
"An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array."
You make it sound like it's one of the laws of physics.
The ON part of JSON doesn't know about objects. When serialized, object entries are just an array of key-value pairs with a weird syntax and a well-defined order. That's true of any serialization format, actually.
It's the JS part of JSON that imposes the constraint of non-duplicate keys with undefined order.
You are the engineer; you can decide how to use your tools depending on your use case. Unless, e.g., you need interop with the rest of the world, it's your JSON, (mis)treat it to your heart's content.
The text I quoted is from the RFC. json.org and ECMA-404 both agree. You are welcome to do whatever you want, but then it isn't JSON anymore.
It is very much still JSON, and your code can very much assume keys are ordered if your JSON tools respect it. ECMA-404 agrees:
If you work in environments that respect JSON key order (like browsers and, I think, also Python) then the unordered behavior of JSONB would be the exception, not the rule.
That does not follow.
JSON formatted data remains JSON no matter if you use it incorrectly.
This happens all the time in the real world - applications unknowingly rely on undefined (but generally true) behavior. If you e.g. need to integrate with a legacy application where you're not completely sure how it handles JSON, then it's likely better to use plain text JSON.
You may take the risk as well, but then good luck explaining that the devs who left the company 10 years ago are responsible for the breakage that happens after you've converted JSON to JSONB.
In some cases, insisting on ignoring the key order is too expensive a luxury, since it basically forces you to parse the whole document first and only then process it. In case you have huge documents, you have to stream-read, and this often implies relying on a particular key order (which isn't a problem, since those same huge documents will likely be stream-written in a particular order too).
Actually, since ES2015 the iteration order of object properties is fully defined: first integer keys in ascending order, then string keys in insertion order, finally symbols in insertion order. (Of course, symbols cannot be represented in JSON.)
And duplicate properties in JSON are guaranteed to be treated the same way they are in object literals: a duplicate overwrites the previous value but doesn't change the order of the key.
Concretely that means if you write:
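For instance (the exact keys and values here are made up, just to show the reordering):

```js
const obj = JSON.parse('{"a": 1, "b": 2, "1": 3, "a": 4}');

console.log(obj);              // { '1': 3, a: 4, b: 2 }
console.log(Object.keys(obj)); // [ '1', 'a', 'b' ]
```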
This is guaranteed to evaluate with '1' moved to the front and with 'a' before 'b', even though 'a' takes its value from the final entry in the JSON text.

Python made a similar change in version 3.6 (officially since 3.7), both with regard to insertion order and to later values overwriting earlier ones while the key keeps its original position. I think the only difference at this point is that Python doesn't move integer-like keys to the front, because unlike JavaScript, Python properly distinguishes between different key types.
That’s exactly the kind of difference between json and jsonb that you gotta keep in mind. Object properties are unordered, but a json string is very much ordered. It’s the same sequence of characters and lines each time, unless you parse it and dump it again. So if you want to preserve an unmodified original json string for some (e.g. cosmetic) reasons, you probably want json.
I would expect you are treading on dangerous ground if you assume a type called JSON is going to preserve the data byte for byte. It might currently, but I doubt that is in the API contract. You really want to use TEXT if that is your requirement.
I don't know what SQLite does, but in JS the order is actually defined. JS is not Java, object is not HashMap, if anything, it's closer to LinkedHashMap, but even that is not correct because there are numeric slots which always go first. https://tc39.es/ecma262/#sec-ordinaryownpropertykeys
> but in JS the order is actually defined
Defined yes, but still arbitrary and may not be consistent between JSON values of the same schema, as per the document you linked to:
> in ascending chronological order of property creation
Also, while JSON came from JS sort-of, it is, for better or worse (better than XML!) a standard apart from JS with its own definitions and used in many other contexts. JSON as specified does not have a prescribed order for properties, so it is not safe to assume one. JS may generally impose a particular order, but other things may not when [re]creating a JSON string from their internal format (JSONB in this case), so by assuming a particular order will be preserved you would be relying on a behaviour that is undefined in the context of JSON.
Yes, I know, the point was not to say that it's safe to depend on this universally, but rather why it's safe in JS and why it's not elsewhere -> other languages use hash maps simply because authors were either lazy or unaware of the original behaviour. (which sucks, in my opinion, but nobody can fix it now)
Assume you're writing an editor for JSON files. Don't think many users of that editor would be very happy if you change the order of the attributes in their json files, even though technically it's the same...
According to the JSON spec it's also legal to round all numbers to zero.
If you’re writing an editor, you will need to separate in progress text that user is editing from the parsed and validated data, and keep both.
I don't disagree, but people might still assume it. If you serialise a Map in Java, some Map implementations will maintain insertion order for example.
I think this is why JS and Python chose to make key ordering defined - the most popular implementations maintained insertion order anyway, so it was inevitable people would end up writing code relying on it
The same goes for maps in Go, which now explicitly randomizes map iteration with the range keyword to prevent developers from relying on a particular ordering. Neat trick.
That might get them to rely on the randomization, though :)
There’s processing to be done with JSONB on every read/write, which is wasted if you’re always reading/writing the full blob.
Which occurs with JSON as well (SQLite has neither a dedicated JSON type nor a dedicated JSONB type). The only actual cost would be the conversion between JSONB and JSON.
No, SQLite's "JSON" is just TEXT, there's no overhead with reading/writing a string.
That's what I said I think? "JSON" is a TEXT that is handled as a JSON string by `json_*` functions, while "JSONB" is a BLOB that is handled as an internal format by `jsonb_*` functions. You generally don't want JSONB in the application side though, so you do need a conversion for that.
Yes, but when you use the BLOB with the jsonb functions, your application's demands go from:
1. Read JSON from the TEXT column.
2. Parse the JSON into the Internal Binary Format.
3. Run the json_*() function on this format, which will serialize the Internal Binary Format to JSON as output.
To:
1. Read JSONB from the BLOB column.
2. Run the json_*() function on the Internal Binary Format, which will serialize it to JSON as output.
Because:
The json_* and jsonb_* functions all accept either JSON or JSONB as their input. The difference is that the jsonb_* functions also produce JSONB as output. So even in the above case, if your function output is just being used to feed back into another table as a BLOB, then you can use the jsonb_* version of the function and skip the serialization step entirely.
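For example, a minimal sketch (made-up tables) of moving a fragment from one JSONB column into another table without ever materializing JSON text:

```sql
CREATE TABLE profiles (id INTEGER PRIMARY KEY, doc BLOB);
CREATE TABLE addresses (profile_id INTEGER, addr BLOB);

INSERT INTO profiles (id, doc)
VALUES (1, jsonb('{"name": "alice", "address": {"city": "Oslo", "zip": "0150"}}'));

-- jsonb_extract() accepts the JSONB blob and returns JSONB, so the extracted
-- object goes straight into the other BLOB column with no serialization step.
INSERT INTO addresses (profile_id, addr)
SELECT id, jsonb_extract(doc, '$.address') FROM profiles;
```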
Oh, you are right! I later looked at the draft documentation and realized that `json(b)_` only determines the output type. That said, it is still true that you need one more function call `json(...)` to retrieve a textual form of JSON which you do often need instead of JSONB.
I don’t think that’s always correct (it is within this thread, which stated (way up) “if you’re always reading/writing the full blob”, but I think this discussion is more general by now). Firstly, it need not convert the full jsonb blob (example: jsonb_extract(field,'$.a.b.c.d.e'))
Secondly, it need not convert to text at all, for example when calling a jsonb_* function that returns jsonb.
(Technically, whether any implementation is that smart is an implementation detail, but the article claims this is
“a rewrite of the SQLite JSON functions that, depending on usage patterns, could be several times faster than the original JSON functions”
so it can’t be completely dumb.)
There’s no point using json_* functions if you’re always reading/writing the full blob.
Of course (and I never said that), but if you need to store a JSON value in the DB and use a JSONB-encoded BLOB as an optimization, you eventually read it back to a textual JSON. It's just like having a UNIX timestamp in your DB and converting back to a parsed date and time for application uses, except that applications may handle a UNIX timestamp directly and can't handle JSONB at all.
It doesn't validate that it's valid JSON?
Not specifically JSONB, but I do recall that with MongoDB's equivalent, BSON, the size of the binary equivalent tends to be larger in practice; I would expect JSONB to have a similar trade-off.
There'll also be a conversion cost if you ultimately want it back in JSON form.
From the post:
According to the linked announcement the data size is 5-10% smaller. So you are exchanging a small processing cost for smaller storage size. It will depend on your application but in many cases smaller disk reads and writes as well as more efficient cache utilization will make storing as JSONB a better choice even if you never manipulate it in SQL. Although the difference is likely small.
I haven’t delved into it, but you likely lose the ability to round-trip your JSON strings, just as with PostgreSQL’s jsonb type. Likely it:
- will remove/change whitespace in ‘foo’,
- does not guarantee to keep field order,
- will fail if a dictionary in the ‘foo’ string has duplicate fields,
- will fail if ‘foo’ doesn’t contain a valid JSON string.
I don't know if it's true of SQLite, but you missed the most important point, at least with PostgreSQL: you can build indexes directly against attributes inside JSONB, which really turns it into a true NoSQL/relational hybrid that lets you have your cake and eat it too for some design problems that don't fit neatly into either pure relational or pure NoSQL approaches.
What would that SQL look like in practice?
Here's an example I have running on a Pi with a temperature sensor. All data is received as JSON messages over MQTT, then stored as-is in a Postgres table. The view turns it into relational data.
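Roughly along these lines (table, view, and field names here are illustrative, not the actual ones):

```sql
-- Raw MQTT payloads land in a single jsonb column.
CREATE TABLE mqtt_messages (
    id       bigserial PRIMARY KEY,
    received timestamptz NOT NULL DEFAULT now(),
    payload  jsonb NOT NULL
);

-- The view exposes the interesting fields relationally.
CREATE VIEW readings AS
SELECT id,
       received,
       payload->>'sensor'                 AS sensor,
       (payload->>'temperature')::numeric AS temperature
FROM mqtt_messages;
```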
If I just need the messages between certain temperatures, I can speed this up by adding an index on the 'temperature' field in the JSON.
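That index would be an expression index on the extracted field, something like:

```sql
-- The cast in the index expression has to match the one used in queries.
CREATE INDEX mqtt_messages_temperature_idx
    ON mqtt_messages (((payload->>'temperature')::numeric));

-- A range query like this can then use the index:
SELECT * FROM mqtt_messages
WHERE (payload->>'temperature')::numeric BETWEEN 18 AND 22;
```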
You can do this with text JSON in SQLite as well, but JSONB could speed it up.
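In SQLite the same idea looks roughly like this (hypothetical names; json_extract() is deterministic, so it's allowed in an index expression):

```sql
CREATE TABLE messages (id INTEGER PRIMARY KEY, payload BLOB);

CREATE INDEX messages_temperature_idx
    ON messages (json_extract(payload, '$.temperature'));

-- The query has to use the same expression for the index to apply.
SELECT * FROM messages
WHERE json_extract(payload, '$.temperature') BETWEEN 18 AND 22;
```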
Indeed. Those who use Postgres are already familiar with the difference.
Rule of thumb: if you are not sure, or do not have time for the nuances, just use JSONB.
There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.
This may be desired - it makes two effectively-equal objects have the same representation at rest!
But if you are storing human-written JSON - say, configs from an internal interface, where one might collocate a “__foo_comments” key above “foo” - their layout will be lost, and this may lead to someone visually seeing their changes scrambled on save.
JSON processors are neither required nor expected to retain object key ordering from either the input or the object construction order.
Slight caveat. It seems that you should default to JSONB for all cases except for values directly returned in the query result, since (IIUC) you will get a JSONB blob back and will be responsible for parsing it on the client.
So use it for writes, and also for filters if applicable (although this seems like a niche use case, I can't actually think of an example). But if you are returning values, you probably want to use `json_*` functions, otherwise you will end up receiving the binary format.
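A sketch of that split, with made-up names:

```sql
-- Hypothetical table with a JSONB-encoded BLOB column.
CREATE TABLE events (id INTEGER PRIMARY KEY, data BLOB);

-- Writes: jsonb() / jsonb_* keep the stored value in the binary format.
INSERT INTO events (id, data) VALUES (1, jsonb('{"user": "alice", "seen": 0}'));
UPDATE events SET data = jsonb_set(data, '$.seen', 1) WHERE id = 1;

-- Values returned to the application: json_* so you get text back.
SELECT json_extract(data, '$.user') FROM events WHERE id = 1;  -- 'alice'
SELECT json(data) FROM events WHERE id = 1;                    -- JSON text
-- SELECT jsonb(data) ... would hand the client the raw binary blob instead.
```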
There is no JSON data type! If you are just storing JSON blobs, then the BLOB or TEXT data types will be the best picks.
Why not? I feel like a database should allow for a JSON data type that stores only valid JSON or throws an exception. It would also be nice to be able to access subfields of the JSON.
SELECT userid, username, some_json_field["some_subfield"][0] FROM users where ...
Not sure where to give feature suggestions so I'm just leaving this here for future devs to find
SQLite has a surprisingly small number of types: null, integer, real, text, and blob. That’s it.
https://www.sqlite.org/datatype3.html
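Both halves of that wish can be approximated with what's already there, though; a rough sketch using the hypothetical users table from above:

```sql
-- Validation on write via a CHECK constraint...
CREATE TABLE users (
    userid          INTEGER PRIMARY KEY,
    username        TEXT,
    some_json_field TEXT CHECK (json_valid(some_json_field))
);

-- ...and subfield access via json_extract() (or -> / ->> in newer versions).
SELECT userid,
       username,
       json_extract(some_json_field, '$.some_subfield[0]') AS first_subfield
FROM users
WHERE json_extract(some_json_field, '$.some_subfield[0]') IS NOT NULL;
```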
In context, what is clearly meant is, "If you're just reading and writing full JSON blobs, use `insert ... ($id, $json)`; if you're primarily querying the data, use `insert ... ($id, jsonb($json))`, which will convert the text into JSONB before storing it."
If my application never sees a difference from normal JSON, everything is compatible, and all there is are perf improvements, then why is there a new set of functions (jsonb_*) to interact with it?
It seems that the JSON type is even able to contain JSONB. So why even use these functions, if the normal ones don't care?
The difference is the storage format, which is important for a SQL database since you define it in the schema. Also the performance characteristics are different.
As someone mentioned below, the order of keys is undefined in JSON spec, but applications may rely on it anyway, and thus conversion to JSONB may lead to breakage.
There are some other minor advantages to having an exact representation of the original - e.g. hashing, signatures, and equality comparison are much simpler on the JSON string (you need a strict key order, which is again undefined by the spec, but happens in the real world anyway).
TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be. JSONB is a brilliant binary JSON format. What makes it brilliant is that arrays and objects (which when serialized are a sort of array) are encoded by having N-1 lengths of values then 1 offset to the Nth value, then N-1 lengths of values then... The idea is that lengths are going to be similar, while offsets never are, so using lengths makes JSONB more compressible, but offsets are needed to enable random access, so if you store mostly lengths and sometimes offsets you can get O(1)ish random access while still retaining compressibility.
Originally JSONB used only offsets, but that made it incompressible. That held up a PostgreSQL release for some time while they figured out what to do about it.
Link? My reading last time I looked at this was that the sqlite and postgres “jsonb” were different.
That's not exactly right, as the jsonb_* functions return JSONB if you choose to use them.
I don't know how the driver is written, but this is misleading: if SQLite provides an API to read the JSONB data with an in-memory copy, an app can surely benefit from skipping the JSON string parsing.