The page is the result of this exchange on Twitter:
https://twitter.com/gorilla0513/status/1784756577465200740
I was surprised to receive a reply from you, the author. Thank you :) Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?
https://twitter.com/DRichardHipp/status/1784783482788413491
It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.
There are three approaches:
1. interpreted code
2. compiled then interpreted bytecode
3. compiled machine code
The further up, the simpler.
The further down, the faster.
Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records, all of which happens in compiled C code. Bytecode dispatch uses less than 3% of the total CPU time, according to my measurements.
So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost of 3% or less. That's not very much, considering the size, complexity, and portability costs involved.
A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
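You can see these high-level opcodes for yourself by prefixing a query with EXPLAIN in the sqlite3 shell. The listing below is only illustrative; the exact opcodes, addresses, and operands vary with the SQLite version, the schema, and the build options:

    sqlite> CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT);
    sqlite> EXPLAIN SELECT name FROM users WHERE id = 5;
    addr  opcode         p1    p2    p3    p4    p5  comment
    ----  -------------  ----  ----  ----  ----  --  -------------------
    0     Init           0     7     0           0   Start at 7
    1     OpenRead       0     2     0     2     0   root=2 iDb=0; users
    2     Integer        5     1     0           0   r[1]=5
    3     SeekRowid      0     6     1           0   intkey=r[1]
    4     Column         0     1     2           0   r[2]=users.name
    5     ResultRow      2     1     0           0   output=r[2]
    6     Halt           0     0     0           0
    7     Transaction    0     0     1     0     1
    8     Goto           0     1     0           0

Opcodes like OpenRead (open a B-Tree cursor), SeekRowid (position the cursor), and Column (extract a field from the current record) are the high-level operations described above; each one does a substantial amount of work in compiled C code, so the dispatch overhead per opcode is small by comparison.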
Looks like SQLite bytecode is similar to the IBM RPG language from the "IBM i" platform, which is used directly, without translation from SQL :)
Edit: PRG->RPG.
I'm curious. Would you have a pointer to the documentation of PRG language?
Oooops, it is IBM RPG, not PRG! My bad!
There are some links in Wikipedia.
I never used it, but only read a big thread about SQL vs RPG on a Russian-speaking forum, where there was a ton of examples from a person who works with the IBM i platform at some big bank. The basic operations look like SQLite bytecodes: open a table, move the cursor after the record with key value "X", get some fields, update some field, plus loops, ifs, basic arithmetic, etc.
For those that aren’t familiar, IBM i is ye olde AS/400.
To be fair, RPG has been continuously updated and the version I’ve seen has Java integration.
It’s somewhat amusing that a single developer using Python can run circles around entire teams of RPG programmers. Technology marches on. :)
This logic doesn't seem sound? The fact that the application now spends 3% on bytecode dispatch doesn't tell us anything about how long it would instead spend on, e.g., interpreting SQL.
He’s saying that in the best-case scenario that 3% would go to 0, so the real-world gain would probably be even less.
Unfortunately you can’t do performance analysis this way, but I think the overall point, that it’s only a small fraction, probably still stands, as you’d expect the I/O work to dominate.
The reason you can’t do the analysis the way you explicitly stated (which, to be fair, is what was implied) is that when you lower the code to machine code you typically get rid of branches in the execution path. Since branches slow down execution by a disproportionate amount relative to how long they themselves take, it’s easy to get a >3% boost by getting rid of code paths that appear to leave only 3% of room.
What the author also failed to mention is that they’ve gone on many optimization hunts, eking out less than a percent here or there to get a cumulative boost, so 3% isn’t anything to sneeze at.
That being said, the broader point, which is valid, is that the maintenance cost isn’t worth it for the performance profile that SQLite targets, not to mention that JITs open up an attack vector for security exploits. So the cost vs benefit is squarely on the interpreted side for now.
Edit: Forgot to mention that performance tuning a JIT to work well is also hard - for small queries you’ll spend more time compiling the bytecode than you would just executing it. That’s why all the big JS engines take a tiered approach where each tier optimizes a bit more.
A while back I was musing about whether it was possible to come up with something resembling the instruction set of a CPU for an abstract relational engine. Is this basically what SQLite is doing with bytecodes?
No. The SQLite bytecode is much higher-level than the instruction set of a CPU.
It is possible to construct a worst case scenario for bytecode execution, for example very complex expressions in the WHERE and/or SELECT clauses that compute values, and a query plan that performs a full table scan over a table with say 100 million rows that is cached in RAM (or maybe use generate_series, whatever works best).
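A sketch of what such a query could look like, using the generate_series table-valued function mentioned above (assuming the build includes it; the expression is arbitrary, chosen only so that expression evaluation dominates):

    SELECT sum(value*value + value/3.0 - (value+7)*(value-7))
      FROM generate_series(1, 100000000);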
Computing the expressions should dominate execution time, right?
Then, to compare against the best possible case, we can write a custom C program that uses SQLite internals to perform the same task (full scan of the table, extract values from each row), does not use the bytecode VM, and computes the complex expressions in regular C code (e.g. a function that accepts floats and returns a float, or whatever).
Then comparing the two implementations will tell us how much faster SQLite could be if it had a "perfect" JIT.
Part of the benefit of compiling bytecode (or anything) is specializing code to the context (types, values, etc.) in which it appears. While I don't doubt your analysis, it could be the case that the compiled C code in question is full of branches that could be folded away when specialized to the context of the query, such as the structure of the rows or the types and values of columns.
Basically all of what you are saying about high-level bytecodes applies to dynamic languages, too. But they benefit greatly from specializing each bytecode given static and dynamic context, and from shortcutting dataflow through local variables.
That's right. "Bytecode" is a spectrum. SQLite's bytecode is higher-level than e.g. WASM, JVM, Python, etc. (Notably, because the original source code is higher-level.)
4. lots of small building blocks of static machine code, precompiled and shipped with the DB software binary, later iterated and looped through based on the query plan the optimizer came up with. Oracle does this with their columnar/vector/SIMD-processing In-Memory Database option (it's not like LLVM, as it doesn't compile/link/rearrange the existing binary building blocks; it just jumps and loops through the existing ones in the required order)
Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.
You might call those interpreter operations.
Isn't what you're describing just an interpreter, bytecode or otherwise? An interpreter walks through a data structure in order to determine which bits of precompiled code to execute in which order and with which arguments. In a bytecode interpreter the data structure is a sequential array, in a plain interpreter it's something more complicated like an AST.
Is this just the same as "interpret the query plan" or am I missing something?
That’s a really cool idea! Are there any writeups or articles about this in more detail?
You can also compile to bytecode (when building), and then compile that bytecode to the machine code (when running). That way, you can take advantage of the exact processor that is running your program.
This is the approach taken by .NET and Java, and I presume most other runtime environments that use bytecode.
There is also the option of native code generation from bytecode at install time as opposed to at runtime.
Aka missing out on all the delicious profiling information that approaches with later translation enjoy. There's no simple best answer.
I'm probably wrong, but I always thought that for platforms like .NET and the JVM, AOT delivers the same performance as JIT in most cases, since a lot of information is available before runtime, unlike JS, where the VM always needs to be ready to ditch the optimized bytecode and go back to the whiteboard.
For simple functions which are not called repeatedly you have to invert this list - interpreted code is fastest and compiling the code is slowest.
The compiled code would still be faster if you excluded the compilation time. It's just that the overhead of compiling it is sometimes higher than the benefit.
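Roughly, if t_compile is the one-time cost of compiling, t_exec the per-call cost of the compiled code, and t_interp the per-call cost of interpreting (names here are just for illustration), compiling only pays off once

    n_calls * t_interp  >  t_compile + n_calls * t_exec
    i.e.  n_calls  >  t_compile / (t_interp - t_exec)

so for code that only runs once or twice, the interpreter usually wins.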
I wonder: in actual business scenarios, isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?
There are situations where an analyst enters SQL into the computer interactively. But in those cases the overhead of compiling does not really matter, since this is done infrequently and there is only a single user asking for the SQL to be run.
There is also JIT bytecode.
Yes, the timing of that compilation of bytecode and machine code can be either AOT or JIT.
For example, Java compiles source to bytecode AOT, and the JVM compiles bytecode to machine code JIT.
APL interpreters are tree-walking, but they're backed by C procedures. Then GCC optimizes quite well and you get excellent performance! With stuff like vector instructions.
Getting on par with GCC/Clang with your own JIT is a pretty hefty undertaking.
You can also parse the code into an augmented syntax tree or code DOM and then directly interpret that. This approach eliminates the intermediate code and bytecode machine at the cost of slower interpretation. It's slower due to memory cache and branch prediction issues rather than algorithmic ones.
Optimizing runtimes (usually bytecode) can beat statically compiled code because they can profile the code over time, like the JVM does.
... which isn't going to close the cold startup execution gap, which all the benchmarks/lies/benchmarks will measure, but it is a legitimate thing.
I believe Intel CPUs actually sort of are #2. All your x86 is converted to micro-ops, sometimes optimized on the fly (I remember Intel discussing "micro-op fusion" a decade or so ago).
I am amazed that the author (D. Richard Hipp) made an effort to find and respond to a tweet that was neither (1) directed/"@tted" at him nor (2) written in his native language of English (the original tweet is in Japanese[1]).
[1] https://twitter.com/gorilla0513/status/1784623660193677762
He might have a Google Alert for sqlite.org.