Redis re-implemented with SQLite

akie
42 replies
23h41m

I would love to have a Redis alternative where I don't have to think about whether or not the dataset fits into memory.

yuppiepuppie
21 replies
23h33m

Curious, what's the use case?

qwertox
14 replies
23h29m

My thought as well, since some use cases don't require much memory.

akie
13 replies
23h19m

The use case is caching 20 million API responses that almost never change, each about 20 KB of JSON, for a high-traffic site.

Yes, I can pay for a 400 GB RAM instance of Redis, but it's expensive.

I can also cache it on disk, but then I need to think about cache expiration myself.

Or I can use something appropriate like a document database, but then I need additional code & additional configuration because we otherwise don't need that piece of infrastructure in our stack.

It would be a lot easier if I could just store it in Redis with the other (more reasonably sized) things that I need to cache.

yuliyp
1 replies
22h2m

You're trying to get redis to be what it isn't. Use a thing that has the properties you want: a document or relational database. If you insist on this then running a system that allows a ton of swap onto a reasonably fast disk might work, but is still gonna perform worse than a system that's designed for concurrently serving queries of wildly differing latencies.

minitech
0 replies
18h22m

Yes, that was the context: a Redis alternative to support the same operations but be what Redis isn’t.

danpat
1 replies
22h13m

Or shard it - divide your objects up based on some criteria (hash the name of the object, use the first N digits of the hash to assign to a shard), and distribute them across multiple redis instances. Yes, you then need to maintain some client code to pick the right redis instance to fetch from, but you can now pick the most $/memory efficient instance types to run redis, and you don't have to worry about introducing disk read latency and the edge cases that brings with it.
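
A minimal sketch of that shard-picking idea in Go (the instance addresses are hypothetical, and crc32 stands in for whatever hash you choose):

  package main

  import (
      "fmt"
      "hash/crc32"
  )

  // Hypothetical instance addresses; in practice these come from config.
  var shards = []string{"redis-a:6379", "redis-b:6379", "redis-c:6379"}

  // shardFor hashes the object name and maps it onto one of the instances.
  func shardFor(key string) string {
      h := crc32.ChecksumIEEE([]byte(key))
      return shards[h%uint32(len(shards))]
  }

  func main() {
      fmt.Println(shardFor("api:response:12345")) // same key, same shard
  }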

Edit: looks like redis has some built-in support for data sharding when used as a cluster (https://redis.io/docs/latest/commands/cluster-shards/) - I haven't used that, so not sure how easy it is to apply, and exactly what you'd have to change.

yuliyp
0 replies
22h5m

Sharding doesn't help here at all. They'd still need the same amount of RAM to house all the data in redis.

tayo42
0 replies
16h43m

Have you looked at Varnish for caching API responses? Varnish lets you back it with disk and relies on the page cache to keep frequently accessed items in memory.

If the reverse proxy approach doesn't work, I think memcached has two-level storage like that now, IIRC.

seddonm1
0 replies
21h49m

In other abuses of SQLite, I wrote a tool [0] that exposes blobs in SQLite via an Amazon S3 API. It doesn't do expiry (but that would be easy enough to add if S3 does it).

We were using it to manage millions of images for machine learning, as many tools support S3 and the ability to add custom metadata to objects is useful (harder with files). It is one SQLite database per bucket, but at the bucket level it is transactional.

0: https://github.com/seddonm1/s3ite

phamilton
0 replies
21h45m

A few things:

Redis Data Tiering - Redis Enterprise and AWS ElastiCache for Redis support data tiering (using SSD for 80% of the dataset and moving things in and out). On AWS, a cache.r6gd.4xlarge with 100 GB of memory can handle 500 GB of data.

Local Files

> I can also cache it on disk, but then I need to think about cache expiration myself.

Is the challenge that you need it shared among many machines? On a single machine you can put 20 million files in a directory hierarchy and let the fs cache keep things hot in memory as needed. Or use SQLite which will only load the pages needed for each query and also rely on the fs cache.
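
For the single-machine file approach, a sketch of the usual fan-out trick (the paths and the two-level split are illustrative assumptions):

  package main

  import (
      "crypto/sha256"
      "encoding/hex"
      "fmt"
      "path/filepath"
  )

  // cachePath spreads keys over 256*256 subdirectories so no single
  // directory holds millions of entries; the fs cache keeps hot files in RAM.
  func cachePath(root, key string) string {
      sum := sha256.Sum256([]byte(key))
      name := hex.EncodeToString(sum[:])
      return filepath.Join(root, name[:2], name[2:4], name)
  }

  func main() {
      fmt.Println(cachePath("/var/cache/api", "GET /v1/items/42"))
  }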

S3 - An interesting solution is one of the SQLite S3 VFSes. Those will query S3 fairly efficiently for specific data in a large dataset.

mannyv
0 replies
12h59m

If you're caching 20M API responses that almost never change, use S3. I mean heck, it's cheaper, can be CDN'd, and can be pipelined by the browser.

kiitos
0 replies
21h21m

Redis is an in-memory cache by definition. If you don't want to cache in-memory, then don't use Redis.

Nican
0 replies
23h9m

This looks like a good use case for ScyllaDB with Compression and TTL. It is pretty simple to set up a single-node instance.

If you'd rather have something in-process that writes to disk, to avoid extra infrastructure, I would also recommend RocksDB with Compression and TTL.

jitl
4 replies
23h23m

Redis drops data semi-randomly when under memory pressure.

If you use Redis for queue tasks (this is popular in Rails and Django/Python web services), that means that during an incident where your queue jobs are getting added faster than they’re removed, you’re going to lose jobs if the incident goes on long enough.

kiitos
1 replies
22h45m

Well, of course! Redis is not (and has never been) a database, it's a data structure server, at best described as a cache. If jobs are added faster than they're removed, this is straight queueing theory 101 -- ideally you'd reject jobs at add-time, but otherwise you have to drop them.

gnarbarian
0 replies
22h8m

Right. I think Redis hitting the disk would be a terrible tradeoff compared to making a new backend call. It probably wouldn't save you much time, and I imagine it would lead to very strange and unpredictable behavior on the front end, or when trying to debug latency or data issues downstream.

prisenco
0 replies
22h20m

Since Redis is an in-memory cache and already doesn't guarantee the data, would it make sense for Redka to set PRAGMA synchronous = OFF to boost performance to something closer to standard Redis?
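
For reference, a sketch of what that pragma looks like from Go (the driver choice is an assumption; Redka's actual setup may differ):

  package main

  import (
      "database/sql"
      "log"

      _ "github.com/mattn/go-sqlite3" // assumed driver
  )

  func main() {
      db, err := sql.Open("sqlite3", "redka.db")
      if err != nil {
          log.Fatal(err)
      }
      defer db.Close()

      // OFF skips fsync entirely: fastest, but a crash can lose recent
      // writes -- which may be acceptable for a cache.
      if _, err := db.Exec(`PRAGMA synchronous = OFF`); err != nil {
          log.Fatal(err)
      }
  }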

byroot
0 replies
23h16m

That depends on how the `maxmemory-policy` is configured, and queue systems based on Redis will tell you not to allow eviction: https://github.com/sidekiq/sidekiq/wiki/Using-Redis#memory (it even logs a warning if it detects that your Redis is misconfigured, IIRC).
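
For reference, the setting those queue systems ask for is a real redis.conf directive:

  # redis.conf: fail writes with an error instead of evicting keys
  maxmemory-policy noeviction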

noncoml
0 replies
23h11m

Using a hammer like a screwdriver

alerighi
9 replies
22h11m

At that point, why use Redis at all? You can use any DBMS you want, either relational or NoSQL. The advantage of Redis is that it is a memory cache; if you take the memory out of it, just use Postgres or whatever DBMS you are already using (I say Postgres because it has all the features of Redis).

dalyons
8 replies
21h18m

Postgres has nowhere near all the features of Redis. Go and have a look at the Redis commands documentation. They're not even really similar at all once you get past the basic GET/SET stuff.

pshc
3 replies
21h12m

I feel like one could implement most Redis commands as PL/pgSQL functions using native Postgres hstore and json. Could be an interesting translation layer.
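
A sketch of that idea from the Go side (the `kv` table and the driver are assumptions; the same logic could just as well live in a PL/pgSQL function):

  package kvpg

  import (
      "database/sql"

      _ "github.com/lib/pq" // assumed Postgres driver
  )

  type Store struct{ DB *sql.DB }

  // Set mimics Redis SET via an upsert.
  func (s *Store) Set(key, value string) error {
      _, err := s.DB.Exec(
          `INSERT INTO kv (key, value) VALUES ($1, $2)
           ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value`,
          key, value)
      return err
  }

  // Get mimics Redis GET; a missing key maps to "" like Redis' nil reply.
  func (s *Store) Get(key string) (string, error) {
      var v string
      err := s.DB.QueryRow(`SELECT value FROM kv WHERE key = $1`, key).Scan(&v)
      if err == sql.ErrNoRows {
          return "", nil
      }
      return v, err
  }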

out_of_protocol
2 replies
8h6m

Well, you can't get the O(1) part for most commands :) Redis can do a lot of stuff at near-constant (and superfast) speed. Databases become slower when your dataset gets big.

beepbooptheory
1 replies
7h12m

But if it's that big, how are you fitting it in your RAM with Redis anyway?

out_of_protocol
0 replies
5h37m

You can buy servers with 1 TB of RAM for a reasonable price. Also, you don't need to go that big; even moderate-sized tables are slow. Indexes are usually log(n) at best, and they eat more RAM anyway: you only get that log(n) if all the index data is in memory, and at that point a regular SQL DB is eating more RAM than Redis. Insert time also slows down, since you need to update the index as well (and maybe reorganize things).

from-nibly
3 replies
21h12m

Can you name an explicit thing that postgres does not do that redis does?

jitl
2 replies
20h52m

This is silly, Postgres doesn’t speak the Redis wire protocol. You will need a large army of connection proxies to get a Postgres database to handle the number of connections a single Redis shrugs off with no sweat.

Maybe you like this answer more: at the end of the day you can embed a bunch of Turing-complete programming languages in Postgres, and Postgres can store binary blobs, so Postgres can do literally anything. Can it do it performantly, and for low cost? Probably not. But if you put in enough time and money, I'm sure you can re-implement Redis on Postgres using a BLOB column alone.

Here's a simpler answer: a cuckoo filter is available out of the box in Redis; in 2 seconds of Googling I didn't find one for Postgres: https://redis.io/docs/latest/develop/data-types/probabilisti...

pgaddict
0 replies
8h43m

No, that's a completely different thing - an index access method, building a bloom filter on multiple columns of a single row. That means a query can then have an equality condition on any of those columns.

That being said, building a bloom/cuckoo filter as a data type would be quite trivial - a basic version might take an hour or two, the more advanced stuff (to support partial aggregates etc) might take a bit more. Ultimately, this is not that different from what postgresql-hll does.

jetbalsa
5 replies
23h35m

I've used SSDB[0] in the past for some stupidly large datasets (20 TB), and it worked really well in production.

[0] https://github.com/ideawu/ssdb

PlutoIsAPlanet
2 replies
23h30m

It's also worth checking out Kvrocks, which is a Redis interface on top of RocksDB that's part of the Apache project, and very well maintained.

mathfailure
1 replies
18h50m

And which is not in-memory at all.

PlutoIsAPlanet
0 replies
17h50m

It can cache in-memory using RocksDB's caching mechanisms.

akie
0 replies
22h49m

These are great recommendations, thanks!

Nican
1 replies
23h19m

As usual, there is a spectrum of data safety vs. performance. Redis is at the "very fast, but unsafe" side of the scale.

ScyllaDB for me is in the middle: a high-performance key-value store, but not really supporting transactions. FoundationDB is another one that I would consider.

tyre
0 replies
19h59m

Depends on the kind of safety you’re looking for. Redis is entirely safe from concurrency issues because it’s single-threaded. It supports an append-only file for persistence to disk.

sebazzz
0 replies
10h22m

Doesn't support Lua at the moment. Some connectors to Redis, like .NET's IDistributedCache, use Lua.

nalgeon
12 replies
23h12m

I'm a big fan of both Redis and SQLite, so I decided to combine the two. SQLite is specifically designed for many small queries[1], and it's probably as close as relational engines can get to Redis, so I think it might be a good fit.

[1]: https://sqlite.org/np1queryprob.html

sesm
4 replies
20h1m

What are the project goals? I assume it's a drop-in replacement for Redis that is supposed to be better in certain cases? If yes, then what cases do you have in mind?

nalgeon
1 replies
19h35m

The goal is to have a convenient API to work with common data structures, with an SQL backend and all the benefits it provides. Such as:

— Small memory footprint even for large datasets.

— ACID transactions.

— SQL interface for introspection and reporting.

sesm
0 replies
4h0m

So the goal is to have a Redis-like API but not actually be an in-memory data store and reduce memory consumption this way? For example, for a project/service that started with Redis, but then priorities shifted and small memory footprint became more important than performance? Did I get it right?

grobbyy
1 replies
17h30m

I can tell you my use case. I have a lot of systems which I'd like to use on a single computer AND at scale.

That's sometimes even just for development work.

A lot of these use a common API to a more complex distributed store, as well as to something simple like files on disk, in memory, or SQLite.

In most cases, it's one user at a time, so performance doesn't matter, but simplicity does.

It can also be for the project which has a 1 percent chance of going viral.

Etc. But I find relatively few cases between truly small scale and large scale.

sesm
0 replies
3h59m

Local dev drop-in replacement for Redis seems like a really good usecase, thanks!

justinclift
4 replies
15h21m

That's pretty cool. Reckon it would work with existing code that calls Redis over the wire for RQ?

  https://python-rq.org

This RQ stuff has been a pain with a recent project because only Python seems to use it, so once an RQ job has been submitted, only Python-based things can do anything with it. :(

If Redka works as a backend replacement, we could potentially have non-Python things check the SQLite database instead.

nalgeon
1 replies
13h49m

It works with redis-py (which python-rq uses), but I doubt it will be any good in this case. python-rq seems to use Lua scripting in Redis, which is not planned for 1.0. I'd rather not add it at all, but we'll see.

justinclift
0 replies
13h45m

No worries. :)

Spivak
1 replies
5h11m

Highly recommend Faktory (by the folks who brought you Sidekiq) as a language agnostic job server.

yawaramin
0 replies
16h59m

Any plans to support Redis' memory-based eviction policies, e.g. `volatile-lru`? To me a great benefit of Redis is its guarantee of stable memory usage.

b33j0r
0 replies
19h59m

I love this. It's the solution that makes sense for 90% of the times I have used Redis with Python.

I’ve made several versions of this, and to be honest, it ended up being so straightforward that I assumed it was a trivial solution.

This is pretty well-planned. This is 100% the way to go.

Heh. I took a detour into making my idea of “streams” also solve event sourcing in native python; dumb idea, if interesting. Mission creep probably killed my effort!

Nice work

surfingdino
8 replies
19h52m

Back when Foursquare made MongoDB famous, someone posted a PoC of a NoSQL DB implemented in MySQL. It did not seem to catch on, but it did make me think about how much performance we trade away so that we don't have to reinvent SQL every time we need a DB. I like experiments like this one; they sometimes lead to new projects.

coredog64
6 replies
19h10m

I believe DynamoDB uses MySQL under the hood as its storage engine. Or, if it doesn't currently, it did for several years.

ukuina
4 replies
15h8m

Please do not post replies that are trivial spelling or grammatical corrections; they add little to the discourse.

To parent: I think all datastores can boil down to a SQL interface eventually. Someday a SQL interface may dynamically choose a bespoke backend for your schema (or even sub-schema!).

efilife
2 replies
12h12m

Isn't this a forum about curiosity and learning?

lomase
1 replies
11h56m

Your comment adds nothing to the conversation, which is about the user's Redis reimplementation in SQL.

efilife
0 replies
11h19m

From what I could find, I didn't break any rule, whilst teaching a user something they didn't know. Win-win?

surfingdino
0 replies
11h22m

Indeed, they can. It's interesting that some NoSQL projects add a stripped-down SQL API eventually.

nasretdinov
7 replies
22h19m

By the way, I noticed you're using SetMaxOpenConns(1); however, in WAL mode (which you're using) SQLite does support writes that don't block reads, so you might benefit from allowing read concurrency (in theory).

nalgeon
6 replies
22h4m

Yeah, it's explained in the code[1]:

SQLite only allows one writer at a time, so concurrent writes will fail with a "database is locked" (SQLITE_BUSY) error.

There are two ways to enforce the single writer rule:

1. Use a mutex for write operations.

2. Set the maximum number of DB connections to 1.

Intuitively, the mutex approach seems better, because it does not limit the number of concurrent read operations. The benchmarks show the following results:

- GET: 2% better rps and 25% better p50 response time with mutex

- SET: 2% better rps and 60% worse p50 response time with mutex

Due to the significant p50 response time mutex penalty for SET, I've decided to use the max connections approach for now.

[1]: https://github.com/nalgeon/redka/blob/main/internal/sqlx/db....

nasretdinov
2 replies
22h0m

How about having two pools, one for writes only and the other one for reads? SQLite allows you to open the DB more than once per application, so you can have a read pool and a write pool with SetMaxOpenConns(1) for better performance. This of course also means that reads have to be handled separately from writes in the API layer too.
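
A minimal sketch of that split (the DSN parameters assume the mattn/go-sqlite3 driver):

  package main

  import (
      "database/sql"
      "log"

      _ "github.com/mattn/go-sqlite3" // assumed driver
  )

  func openPools(path string) (read, write *sql.DB, err error) {
      dsn := path + "?_journal_mode=WAL&_busy_timeout=5000"
      if read, err = sql.Open("sqlite3", dsn); err != nil {
          return nil, nil, err
      }
      if write, err = sql.Open("sqlite3", dsn); err != nil {
          read.Close()
          return nil, nil, err
      }
      write.SetMaxOpenConns(1) // single writer; readers stay concurrent
      return read, write, nil
  }

  func main() {
      r, w, err := openPools("redka.db")
      if err != nil {
          log.Fatal(err)
      }
      defer r.Close()
      defer w.Close()
  }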

nalgeon
1 replies
21h58m

Thought about it, but decided to start with the simpler, good-enough option. The goal here is not to beat Redis anyway.

nasretdinov
0 replies
21h52m

Well, I agree, that's a good starting point. You probably won't be able to beat Redis with SQLite anyway :) Although given that WAL mode allows for concurrent reads, it might give a large enough performance boost to match Redis in terms of QPS if the concurrency is high enough.

kiitos
0 replies
21h13m

> The benchmarks show the following results

Where are the benchmarks?

Sytten
0 replies
21h44m

This is really not true in WAL mode with synchronous NORMAL; it was only true with the default journal mode, and a lot of people are misusing SQLite because of that. You still have one writer at a time, but you won't get the SQLITE_BUSY error.

You can check the documentation [1]; only some rare edge cases return this error in WAL mode. We abuse our SQLite and I never saw it happen with a WAL DB.

[1] https://www.sqlite.org/wal.html#sometimes_queries_return_sql...

kiitos
7 replies
22h43m

The entire value proposition of Redis is that it operates out of memory, and therefore has memory-like performance. (edit: And doesn't provide the benefit of, and therefore pay the costs related to, ACID-like consistency guarantees.) If you move it to disk (edit: Or try to assert ACID-like consistency or transactional guarantees) there's little reason to use Redis any more.

j-pb
1 replies
21h56m

But it's still orders of magnitude slower than a hash map.

SJC_Hacker
0 replies
18h38m

If you want ordering, then a hash map doesn't help you.

PhilipRoman
1 replies
22h35m

> The entire value proposition of Redis is that it operates out of memory

Not familiar with Redis specifically, but I doubt this idea. You can run anything on top of a ramdisk (granted, you can save a few pointer additions and get rid of some safety checks if you know you're working with memory)

yuliyp
0 replies
21h59m

Sure you can run things off a ramdisk, but the way you lay out data to achieve high performance from disk vs from RAM is different (disk assumes that you read pages of data at once, and tries to avoid reading extra pages, while RAM assumes that you read cache lines of data at once).

egeozcan
0 replies
22h40m

You can also create an in-memory sqlite database though.

77pt77
0 replies
22h24m

You can run SQLite in memory by using the filename

:memory:
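
In Go that looks like this (the driver name is an assumption):

  package main

  import (
      "database/sql"
      "log"

      _ "github.com/mattn/go-sqlite3" // assumed driver
  )

  func main() {
      // ":memory:" creates a database that lives entirely in RAM and
      // vanishes when the connection closes. Caveat: with database/sql's
      // pool, each new connection gets its own private in-memory DB, so
      // cap the pool at one connection (or use a shared cache).
      db, err := sql.Open("sqlite3", ":memory:")
      if err != nil {
          log.Fatal(err)
      }
      defer db.Close()
      db.SetMaxOpenConns(1)
  }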

jhatemyjob
5 replies
23h28m

This is a great idea and I am glad it is BSD-licensed. Unfortunately, the execution is somewhat lacking. SQLite is best suited for embedded / client-side applications with minimal dependencies, yet the author of this project decided to use Go and make it a service.

nalgeon
2 replies
23h25m

Did I?

> Both in-process (Go API) and standalone (RESP) servers.

In-process means that the database is "embedded / clientside" in your terms.

jhatemyjob
1 replies
23h24m

It's a server.

orthecreedence
0 replies
13h20m

Do you usually embed redis?

merlinran
0 replies
23h13m

What would you use Redis or substitutes for in embedded / client-side applications? Seriously asking.

leetrout
0 replies
23h21m

> SQLite is best suited for embedded / client-side applications with minimal dependencies.

Often repeated, and certainly rooted in truth, but there was a healthy discussion on here the other day[0] where tptacek shared a link in a comment[1] to a related blog post about getting more scale out of using SQLite server-side.

0: https://news.ycombinator.com/item?id=39975596

1: https://kerkour.com/sqlite-for-servers

whoitwas
3 replies
18h56m

I use redis as a cache layer in front of the DB. I don't understand the concept.

SJC_Hacker
2 replies
18h47m

Redis is a KV store.

With this you can run a full SQL query.

I did something similar for my small baseball statistics website, switchfielder.com/baseball. On server startup, I copy the relevant tables into memory.

whoitwas
1 replies
18h35m

I thought the point of Redis was to avoid hitting DB. Why not just use SQL?

SJC_Hacker
0 replies
15h34m

Because you are guaranteeing everything is in memory, so it's very fast and you aren't relying on the DB's memory caching. Also, it is in-process, which (supposedly) has some benefits as well. Theoretically, DBs such as Postgres can be set up similarly, although they will be out of process and, in the case of Postgres, behind at least a UNIX domain socket. That may be a difference of ~ms; I'd love to see some benchmarks. Maybe I'll try that when I have time; it could make a decent Medium article.

For OLAP (i.e., no writes) workloads on small databases with complicated queries (joins, grouping, partitioning, ordering, etc.), this works very well. Yes, this is somewhat of an edge case, but I'd bet small shops could fit their entire DB into one of the bigger (several hundred GB) instances.

jitl
3 replies
21h31m

I’m not sure to what degree you want to follow the Redis no concurrency “everything serialized on one thread” model.

You can get substantially better performance out of sqlite by using the lower level https://github.com/crawshaw/sqlite, turning on WAL etc, using a connection per goroutine for reads, and sending batches of writes over a buffered channel / queue to a dedicated writer thread. That way you can turn off SQLite’s built in per-connection mutex but still be thread safe since each connection is only used on a single thread at a time.

For this use case, you will also probably save a lot of time if you use some large arena-style buffers (probably N per conn?) and copy incoming parameter bytes from the network request/socket into the buffer, or copy straight from SQLite out to the socket, instead of allocating and passing around a bunch of individual strings. Boxing those strings in interface{} (as done by the high-level sql stdlib) slows things down even more.

None of this is necessary to get usable perf, even decently good perf; I'm just sharing some tips from my experience trying to get the absolute maximum write throughput from SQLite in Go.
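
A minimal sketch of the dedicated-writer pattern described above, using database/sql for brevity rather than crawshaw's lower-level package (the `kv` schema and error handling are simplified assumptions):

  package kvwriter

  import (
      "database/sql"
      "log"
  )

  type setOp struct{ key, value string }

  // writeLoop is the only goroutine that touches the write connection.
  // It folds whatever has queued up into one transaction per wakeup.
  func writeLoop(db *sql.DB, ops <-chan setOp) {
      for op := range ops {
          batch := []setOp{op}
      drain:
          for {
              select {
              case next, ok := <-ops:
                  if !ok {
                      break drain
                  }
                  batch = append(batch, next)
              default:
                  break drain
              }
          }
          tx, err := db.Begin()
          if err != nil {
              log.Print(err)
              continue
          }
          for _, b := range batch {
              if _, err := tx.Exec(
                  `INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)`,
                  b.key, b.value); err != nil {
                  log.Print(err)
              }
          }
          if err := tx.Commit(); err != nil {
              log.Print(err)
          }
      }
  }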

SJC_Hacker
1 replies
18h49m

> You can get substantially better performance out of sqlite by using the lower level https://github.com/crawshaw/sqlite, turning on WAL etc, using a connection per goroutine for reads, and sending batches of writes over a buffered channel / queue to a dedicated writer thread. That way you can turn off SQLite's built in per-connection mutex but still be thread safe since each connection is only used on a single thread at a time.

Would this protect against a row update in the middle of a read? e.g. would a row at least be internally consistent at the time it was read?

lanstin
0 replies
18h2m

Yes, you would still get SQLite file locking; the thing you aren't using is SQLite's thread locking around its own structures (according to GP; I don't know myself). But be sure not to use an NFS mount for the files; the locking there is not that great.

nalgeon
0 replies
20h41m

Great tips, thank you! The thing is, getting maximum throughput is not the goal of the project (at least not at this stage). I'm using reasonable SQLite defaults (including WAL), but that's it for now.

xrd
1 replies
19h57m

Go plus SQLite is producing some terrific projects. I love PocketBase, and this looks great as well.

tehbeard
1 replies
23h10m

> reimplement the good parts of Redis

Seems to be missing streams, hyperloglog, and pub/sub though, so it's mostly just the KV part of the wire protocol with a different backend?

nalgeon
0 replies
23h8m

Can't fit everything in 1.0; I had to start with something. If the community is interested in the project, there will be more.

redskyluan
1 replies
19h53m

I would personally not recommend implementing a Redis protocol on top of SQLite, as I've seen too many failed cases like this. Users may perceive your product as a drop-in Redis replacement, and it may work fine in a PoC, but once it hits production, SQLite's performance and scalability will be severely challenged.

It's much better to use RocksDB as the underlying storage engine for such a solution. RocksDB can provide the performance and scalability required. If you need a distributed solution, I would suggest looking at TiKV or FoundationDB. These are both excellent distributed storage systems that can handle production workloads much better than a SQLite-based approach.

marcrosoft
0 replies
18h57m

This “SQLite isn’t good for production” mentality really needs to die. It’s a myth.

larodi
1 replies
22h37m

Potentially many things, like session management, queues, document graphs, etc., can be done right with simple facilities like tables. Tables represent sets, and set algebra seems very common in data representations. The thing is how the sets are combined, i.e. related. This is essentially an API to SQL in Redis clothes. Kudos to the author.

nalgeon
0 replies
21h53m

Thank you! I also think that the relational model can get you pretty far if you don't need to squeeze every last bit of performance out of the program. And the added benefit of using a battle-tested SQL engine is far fewer storage-related bugs.

TimTheTinker
1 replies
18h41m

I'm waiting for someone to implement the Redis API by swapping out the state machine in TigerBeetle (which was built modularly such that the state machine can be swapped out).

https://tigerbeetle.com/

himujjal
0 replies
16h3m

This is state of the art. It will take time before TigerBeetle themselves release their state machine as a library.

They have done an incredible job, implementing things I didn't know existed about software/hardware.

sudhirj
0 replies
4h17m

Did this for DynamoDB over the pandemic. Helped me learn both the Redis API and DynamoDB. https://github.com/dbProjectRED/redimo.go

Want to do the same thing with Postgres as well.

The Redis API and data structures are really nice and have been tuned to be useful in a wide variety of situations. Implementing these APIs on different concurrency, compute, and storage models is a nice way to learn them and keep options open.

poidos
0 replies
22h52m

Wonderful idea and execution!

mrinterweb
0 replies
19h26m

A lot of Redis's fantastic performance evaporates when network IO is added to the equation. Of course, you can host your own Redis, but as soon as people start using a SaaS-hosted Redis service, performance usually takes a huge hit. If this makes running your own Redis-compatible key/value store in your own cluster easier for people, that's a win.

moomoo11
0 replies
12h36m

How often do you use any advanced redis features?

lukevp
0 replies
19h13m

Does this or Garnet have more coverage of Redis commands? I'm looking to embed a Redis-compatible subset of functionality in a program for local debugging purposes, and there will be an API in between that can protect against Redis commands that lack coverage.

c4pt0r
0 replies
18h54m

Not really wanting to comment on the usefulness of this program (it's useful), but just on the quality of the code: it's very good!

andrewfromx
0 replies
14h20m

Nice! Now just add cluster support and get AWS to offer ElastiRedka?

TheChaplain
0 replies
21h55m

Hmm, am I the only one who is not worried?

Although I don't really see anything in the license change that would prevent me from using it at both home and business, Redis seems "complete" functionality-wise, so using a pre-license-change version can't hurt even long-term, I think.

PeterZaitsev
0 replies
3h57m

I think what would be interesting is integrating with dqlite or another replicated solution. Performance is not the only requirement.