Garnet – A new remote cache-store from Microsoft Research

west0n
46 replies
14h30m

From the benchmark performance charts (https://microsoft.github.io/garnet/docs/benchmarking/results...), the throughput of the GET command exceeds that of Dragonfly by more than tenfold. While the 50th-percentile latency is slightly higher than Dragonfly's, the 99th percentile is slightly lower. Both the throughput and latency of Garnet and Dragonfly are far better than Redis's, which suggests Redis may need significant performance optimization.

whimsicalism
21 replies
14h23m

surprised to see a garbage collected language project (C# for Garnet) beat redis/dragonfly

blackoil
10 replies
13h13m

The investment in optimizing the CLR or JVM can be huge, since they impact millions of applications, while each piece of C/C++ code has to be hand-optimized.

Also, the limited number of people who can write optimal C code in a given time, versus C#, works in managed code's favor.

rfoo
5 replies
12h25m

Plus, it's easier to have unmanaged code in C# than in Java.

larodi
3 replies
11h36m

Is it possible to have unmanaged code in Java at all… or do you mean linking libs and exposing them through a Java API in the language?

rfoo
2 replies
10h56m

Ah, I meant Java code manipulating off-heap memory. For Java I don't think there's a way to mix managed and unmanaged code the way you can with C# or C++/CLI on the CLR.

pjmlp
0 replies
9h4m

Panama has helped in that direction, though it isn't like C#; for feature parity we will eventually need Valhalla, which is taking its time since they want to keep existing JARs working.

Ignoring the type systems of the Eiffel, Objective-C, Oberon and Modula-3 lineage, even though they were inspirations for Java, has proven in hindsight to be a bad decision.

neonsunset
0 replies
10h49m

In C# you usually don't want to mix in C++. Manipulating off-heap memory is indeed easy: nowadays pointers to object interiors, the stack, and unmanaged memory can all be represented as `ref T`, where T is byte, int, etc.

These are then wrapped in `Span<T>` and `ReadOnlySpan<T>`. This way a span can be a slice of memory with any origin:

    using System.Runtime.InteropServices; // for NativeMemory

    var fromStack1 = (stackalloc byte[32]);                 // stack memory
    var fromStack2 = (Span<byte>)[0x20, 0x20, 0x20, 0x20];  // collection expression, also on the stack
    var fromHeap = new byte[32].AsSpan();                   // GC heap
    unsafe
    {
        var ptr = NativeMemory.Alloc(32);                   // unmanaged (malloc-style) memory
        var fromMalloc = new Span<byte>(ptr, 32);
        // ... use fromMalloc ...
        NativeMemory.Free(ptr);                             // don't forget to free :)
    }

pjmlp
0 replies
9h13m

True, I still look forward to having Valhalla, eventually.

theLiminator
3 replies
11h2m

Couldn't you just replace the CLR and JVM with LLVM and GCC, then replace C with assembly, and arrive at the same conclusion?

blackoil
1 replies
10h20m

Now commenting beyond my expertise: I understand the two ecosystems have developed orthogonally. C#/Java assume code to be higher level and "dumb", with the optimizations living in the library/CLR/JVM, while C/C++ are developed with code at a lower level and the developer making sure the code is optimized.

Then again, I could be talking out of my ass; I haven't done enough C/C++ coding in over a decade.

balls187
0 replies
4h35m

You can write poorly optimized C#.

And the Intel C++ compiler has provided significant performance gains for decades via optimization, without needing to optimize by hand.

Dumble
0 replies
10h43m

I'm not the person you are replying to, but I believe the pattern of course holds if you keep shifting it down. I.e., using a faster CPU will speed up all programs running on it, while each (already optimized) ASIC has to be optimized further individually.

latchkey
4 replies
13h37m

I think it is rare, but it isn't impossible. I remember in the 2000s watching someone demo a Java/JVM application beating the pants off a C++ application. If I remember correctly, the JIT was doing an optimization that you would have had to write assembly to match.

thayne
1 replies
12h29m

There are at least two reasons I can think of why a JIT-compiled language with a GC can outperform C/C++:

1. Memory management with a GC often has higher throughput. With the downside that you can have high latency when a garbage collection occurs.

2. The JIT compilation can potentially do a better job of optimizing, because it has information about how the code has been run so far.

It is possible for C or C++ (or Rust) to get the first by using alternative memory management strategies, and the second by hand-optimizing and/or using profile-guided optimization.
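
A minimal sketch of that first point in C# (my example, not from the thread): renting from .NET's ArrayPool instead of allocating per request is the managed-world cousin of the alternative memory management strategies mentioned above, and keeps steady-state GC work near zero.

    using System.Buffers;

    // Rent a reusable buffer instead of allocating a fresh array per request.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    try
    {
        // ... fill and process the buffer ...
    }
    finally
    {
        // Return it so the next caller reuses the same memory.
        ArrayPool<byte>.Shared.Return(buffer);
    }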

pjmlp
0 replies
9h13m

3. The runtime also supports C++, and one can emit the same MSIL code sequence that the C++ compiler would generate.

Which is why WASM isn't that great a novelty for greybeards.

nurettin
1 replies
13h5m

There were coding-competition benchmarks at the time pitting javac against the underdeveloped gcc 2.95 of the day. The trick with the Java programs was to allocate a region of memory only once and reuse it whenever more was needed, simulating a stack. The programs were then benchmarked as hot start and cold start, and the hot-start timings were used to bench against the C++ programs. If the algorithm used was better, this sometimes resulted in head-to-head or better performance.

latchkey
0 replies
13h1m

I remember that one too, but this one was not just the hot memory trick, which seems relatively easy. It was entirely JIT based, where the JIT happened to pick a better path than gcc.

Thaxll
1 replies
4h33m

Because it's not the language that matters here; the performance comes from the architecture of the storage, among other things.

I've seen Go code best Rust code. Really, when comparing languages you should look at the same implementation; if the designs are completely different, it's not really comparable.

surajrmal
0 replies
3h50m

While true, GC tends to affect tail latencies, since it can increase variance. Obviously not always the case, but without sufficient planning it's likely. Languages with fewer stop-the-world pauses tend to trade them for throughput; however, if the application is not CPU bound it doesn't matter. The same can happen in a non-GC language where there's a similar lack of allocation planning, but there it tends to be considered more carefully, because those languages really put allocations in your face.

pjmlp
0 replies
9h15m

Not all garbage-collected languages are made equal; some, like C# and .NET, provide all the performance knobs needed for C++-like coding.

People only have to learn how to use them, instead of placing all garbage-collected languages into the same basket.

In this specific case, MSIL and .NET were designed to support C++ as well, and languages like C# and F# have ways to access those features; even if some feature isn't exposed at the language grammar level, you can emit the same MSIL that C++/CLI would generate.
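
To make "performance knobs" concrete, here is a small illustration of two of them (my own sketch, not Garnet code): a runtime GC latency mode and a JIT inlining hint.

    using System.Runtime;
    using System.Runtime.CompilerServices;

    static class Knobs
    {
        // Trade some GC throughput for shorter pauses at runtime.
        public static void PreferLowPause() =>
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

        // Ask the JIT to inline a hot helper at its call sites.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Square(int x) => x * x;
    }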

orthoxerox
0 replies
9h8m

There's a series of blog posts by Oren Eini showing that C# can be sufficiently fast even when you don't really optimize anything. Beating Redis in every benchmark is a whole other level, of course.

https://ayende.com/blog/197412-B/high-performance-net-buildi...

fulafel
0 replies
12h20m

Seems it wins by using more sophisticated data structures & algorithms, instead of "tight code".

hipadev23
14 replies
13h0m

    > Redis may require a significant performance optimization.

Redis is single-threaded; it’s simple and effective. I’m not sure it needs optimization, and we have 3 alternatives here.

Garnet however is the first alternative to actually outperform Redis at both low and high levels of concurrency, which is remarkable. I can’t wait to try it out.

gigatexal
11 replies
10h6m

Redis for 99% of the intended use-cases and companies will be just fine. It has always been rock solid when we used it. It was the opposite of DNS: it was never a problem.

moritonal
10 replies
9h20m

Could you expand on "It was the opposite of DNS: it was never a problem."? Feel like I'm missing some interesting history here.

mbreese
2 replies
7h42m

It’s just a common saying: “the problem is always DNS”. When you have a weird networking issue, the problem always seems to be DNS: some misconfigured DNS entry, DHCP giving you the wrong server address, etc…

And then the problem is compounded by the fact that, ironically, DNS works so well that we don’t think of it as a primary point of failure. So inevitably, when there is a DNS problem, it’s the last thing we check. This just reinforces the idea that the problem is always DNS… because in those long, hard-to-troubleshoot instances… the problem was DNS.

robertlagrant
1 replies
4h51m

DNS, BGP, or branch prediction. Pick three.

zimpenfish
0 replies
3h54m

Stealing from the greats, "DNS, BGP, off-by-one errors, or branch prediction. Pick three."

akvadrako
2 replies
8h21m

It's weird since DNS is one of the most rock solid systems we have, with redundancy at every level.

1. Clients query a list of servers (IPs) and handle failover when you don't quickly get a reply.

2. Most of those servers at the root and TLD level are actually anycasted from multiple locations globally, so you connect to the closest instance.

3. Those instances are often clusters of physical servers. The big ones have fully redundant networking, so any router or switch failing doesn't take it down. Some run different DNS server software on each physical server, so even software bugs won't take down the whole system.

sumtechguy
1 replies
5h51m

In my years of doing this sort of thing, I've only had it really be DNS once. The issues I usually see blamed on DNS come up when DNS is 'abused' to do things like load balancing with a very small TTL. Then you need goofy things like 'sticky sessions' and whatnot to work around that.

byteknight
0 replies
2h50m

Assuming you didn't manage connectivity for very many systems, or for very long. Otherwise I find it hard to believe it happened only "once".

Terretta
1 replies
6h1m

The meme this is referencing should be "It's always DNS config."

jameshart
0 replies
1h54m

Mostly. But it can also be DNS request volume, DNS cache expiry, DNS response times, DNS connection contention, DNS security, DNS error handling, or DNS client configuration.

All variants of ‘DNS is working perfectly, just your expectations of how it will work in your situation are not completely correct’.

stephenr
0 replies
6h37m

People who don't understand networking like to blame DNS for every problem they experience I guess?

klohto
0 replies
9h9m

isitdns.com

LeonidBugaev
1 replies
9h26m

Plus in production, under high load, a Redis cluster is way more common, which kind of solves the single-threaded concern.

CuriouslyC
0 replies
5h56m

I've always found redis cluster to just bring problems with it.

west0n
8 replies
14h23m

What surprises me the most is that this project is developed in C#, while Dragonfly is developed in C++, and Redis is in C.

amir734jj
7 replies
12h50m

If you look at the store code, you see a lot of "unsafe" C# code (i.e., pointer manipulation).

zigzag312
6 replies
10h17m

Shows how flexible C# is.

You can trade high-level expressiveness for low-level control where needed and you don't have to deal with any FFI to do it.
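
As an illustration of that low-level control (a sketch of the general pattern, not code from Garnet), you can pin a managed array and walk it through a raw pointer with no FFI involved; this requires enabling unsafe blocks in the project:

    // Pin a managed array and sum it through a raw pointer.
    static unsafe int SumPinned(int[] values)
    {
        int sum = 0;
        fixed (int* p = values) // pin so the GC can't move the array
        {
            for (int i = 0; i < values.Length; i++)
                sum += p[i];
        }
        return sum;
    }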

CharlieDigital
5 replies
3h7m

C# and .NET are both truly underrated in the wider community -- I think -- because of some early snafus: the late adoption of an OSS model, being Windows-centric in the .NET Framework days, and depending too much on Visual Studio for a very long time.

Nowadays, I think it's probably the most natural language and platform for teams that need to move on from TypeScript rather than Go or Rust given the similar constructs and idioms.

mdaniel
2 replies
1h49m

For a while there I had high hopes that one could pull in Java libraries via JVM interop on top of the Common Language Runtime, but it doesn't seem to have caught on, and I'm not in that ecosystem enough to know why. But yes, for the many reasons you cited, the library ecosystem is nowhere near that of the JVM, and thus I haven't once considered .NET for a new project, regardless of how much I love C# the language. Well, that, and my experience with the observability and introspection components of the JVM is second to none.

CharlieDigital
1 replies
37m

    > the library ecosystem
I don't really see many gaps. There tend to be fewer libraries, but the libraries available generally feel more complete and well thought out, because users cluster around the known ones.

Many of the first party libraries are really, really good. EF Core is a prime example of possibly one of the best ORMs on the market right now in terms of productivity, ergonomics, and performance.

mdaniel
0 replies
30m

    > I don't really see many gaps

Isn't that the "works on my machine" of this discussion? What's the Apache Tika for .NET, then? I don't mean PDF parsing, I don't mean .docx parsing; I mean a framework for interacting with all their supported types <https://tika.apache.org/2.9.1/formats.html> through one surface area.

ametrau
1 replies
1h6m

It’s great, but you’re locked into Visual Studio if you want the most out of it (they put 95% of their tooling effort there). Do you want to use Visual Studio?

CharlieDigital
0 replies
39m

I work in .NET on the daily on an M1 MBP using primarily VS Code and occasionally Rider.

There's no need for VS for most (any?) .NET workloads these days. That's why I wrote that it was an early misstep. Nowadays it's easy to do .NET dev on any platform. In fact, we ship our production runtime to AWS t4g (Arm64) instances.

throwaway38375
5 replies
7h38m

Neat project, but I will be sticking with Redis.

I trust Redis to not do something weird with their licensing or pricing in the future.

Plus Redis has billions of production hours under its belt.

It's easier to install and understand.

trelane
0 replies
4h35m

Maybe. Patents are certainly a question, because of the license. Microsoft has definitely been aggressive about patents.

sciurus
0 replies
7h9m

The licensing has been through a number of changes in the last few years, depending on which distribution and modules you use.

https://redis.com/legal/licenses/

oaiey
0 replies
7h32m

100% on that. I think Garnet (and similarly YARP) are useful if you build your own tweaked version of them. Otherwise, off-the-shelf or PaaS things are more useful.

Also, this is Microsoft Research. This is code sharing, not a product.

The really interesting part is what Azure will do with it.

Someone
0 replies
6h57m

This is a project from Microsoft Research, so I would worry a lot less about licensing and pricing than about a lack of updates (whether features, maintenance, or security ones).

wokwokwok
4 replies
13h25m

After the aborted/abandoned attempt to port Redis to Windows [0], this feels like a second try at the same thing, but first-party.

Of course, as a research project it doesn't have the same stability, support, etc. as Redis, but I could easily imagine this rolling into a real product if it's popular.

...and with an MIT license, if nothing else, the code is a fun read. :)

[0] - https://github.com/microsoftarchive/redis?tab=readme-ov-file...

mordae
3 replies
11h2m

MIT license, but with CLA.

It's bizarre. What more rights could they possibly want on top of MIT to warrant a CLA?

orthoxerox
1 replies
9h17m

It's probably a blanket requirement from the MSFT legal team so they can just release everything as "(c) Microsoft" instead of "(c) Microsoft and 999 more contributors, one of which has actually forgotten to put their name in AUTHORS.txt, and thus we're technically violating the terms of the license".

withinboredom
0 replies
8h22m

Yep, the CLA only applies if you want to merge your changes into the project.

easton
0 replies
9h13m

IIRC, it’s to allow them to relicense contributions in the event they decide to move away from MIT?

lxe
3 replies
14h43m

A drop-in Redis replacement with rather impressive latency and throughput bench figures. I wonder what it's like to operate in a non-Azure stack in the real world.

nurettin
2 replies
13h20m

Are you sure it is drop-in? I don't see any indication of xstream support.

StepWeiwu
1 replies
12h4m

The title does say that it can work with existing Redis clients, but it's unclear if they mean full compatibility.

jiggawatts
3 replies
12h7m

I just wish they had something like this embedded into Azure App Service, so it wouldn't be necessary to use a remote service for caching.

For reference, something commonly used with IIS for ASP.NET apps was to have an out-of-process "session state" store, so that if the web app process restarted, users wouldn't lose their sessions and have to log in from scratch. Sure, you can put this somewhere central like SQL Server, but then every web page request sits there waiting for the session state to load before it does any processing at all. Session state is also typically locked in some way, which has all sorts of performance issues.

The typical current solution is to use Redis for both caching and session state, and this works... okay-ish. Throughput is high, sure, but Redis is a separate resource in Azure and is stupidly expensive. I really don't want to pay Oracle DB prices for something this simple. It's also a bit of a hassle to wire up.

In this article they talk about 300-microsecond response times, but that's irrelevant in any zone-redundant design, because all Azure load balancers use random zone selection. So you'll have a web server picked in a random zone, which then contacts a cache server in a random zone. That server in turn may not have your key and has to contact yet another random zone to get your cache data! Your traffic ping-pongs between data centres. This introduces about 1-3 ms of delay, up to 10x higher than the advertised numbers for Garnet.

The ideal scenario would be something like what Microsoft Service Fabric does: it has a "reliable collections"[1] service that runs locally on each host node and replicates to two other nodes. A web app can always read its cached values from the same physical host. The latency can be single-digit microseconds in some cases, which is thousands of times faster than any naively load balanced external service, no matter how well optimised.

I don't want 30% faster than Redis. I want 3,000x faster.

[1] https://learn.microsoft.com/en-us/azure/service-fabric/servi...
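
Short of reliable collections, a first-tier in-process cache in front of the remote store avoids the cross-zone hop on the common path. A minimal sketch using Microsoft.Extensions.Caching.Memory (the names, sizes, and 30-second TTL are my own assumptions):

    using Microsoft.Extensions.Caching.Memory;

    // In-process first tier; the remote store is only consulted on a miss.
    var local = new MemoryCache(new MemoryCacheOptions { SizeLimit = 10_000 });

    async Task<string> GetCachedAsync(string key, Func<Task<string>> fetchRemote)
    {
        if (local.TryGetValue(key, out string? hit) && hit is not null)
            return hit; // served from this host, no network hop

        var value = await fetchRemote(); // cross-zone hop only on a miss
        local.Set(key, value, new MemoryCacheEntryOptions
        {
            Size = 1, // required because SizeLimit is set
            AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30),
        });
        return value;
    }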

withinboredom
2 replies
8h18m

This is (one reason) why I love k8s with Cilium. I can set up a service to always go to the local instance (or any other routing topology). It is great for any kind of DNS or application cache.

zinclozenge
1 replies
2h50m

Can you point to any resources to read up on this?

legulere
2 replies
11h48m

    > Garnet’s storage layer, called Tsavorite, was forked from our prior open-source project FASTER

It would be interesting to know why it was forked, why the changes can't be incorporated, and whether FASTER continues to be developed.

withinboredom
0 replies
8h16m

“Our” may be the key word here. In other words, it may be political; they wanted the freedom to make changes to their original project without having to deal with PRs to the current maintainers.

compressedgas
0 replies
10h59m

I tried to compare the source trees. They are nearly identical except for replacing FASTER with Tsavorite, which makes a direct comparison much harder, as they renamed directories and files.

Weryj
2 replies
8h17m

Orleans will love being bundled with this

jeremycarter
1 replies
7h35m

Each Orleans node could have a Garnet node, I can see the Aspire configuration now

deskamess
0 replies
5h7m

I am just happy there is a native Redis-like store available on Windows. I believe there is another one from RavenDB, but this one is 'more' official!

Is it possible to use Aspire locally, or is it just a cloud-only 'framework'?

ksec
1 replies
2h18m

Judging from comments here I guess no one uses memcached anymore ?

dormando
0 replies
1h26m

:'(

cpressland
1 replies
9h57m

This looks really good. I hope ultimately this replaces the “Azure Cache for Redis” resource. It’s slow, it’s a fork of Redis made to run on Windows, and it takes nearly an hour to create an instance of it.

robertlagrant
0 replies
4h50m

I don't know why they wouldn't just run Redis on Linux.

caleblloyd
1 replies
3h3m

    > Garnet being multi-threaded, `MSET` is not atomic. For an atomic version of `MSET`, you would need to express it as a transaction (stored procedure).

I am having trouble understanding this. Why wouldn't they wrap that in a transaction internally for you and make the command atomic? What other atomicity "gotchas" are there?

danbruc
0 replies
2h33m

Because it would mean a performance penalty for everyone who does not need MSET to be atomic without being able to opt out of that transaction. On the other hand, if you want to be a drop-in replacement for Redis, then this is an issue as Redis guarantees atomicity. Maybe you could have a configuration option that lets you select between compatibility and performance, at least if being a drop-in replacement is a design goal.
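
For reference, the client-side opt-in described above looks something like this with StackExchange.Redis (my sketch; it assumes the server accepts RESP MULTI/EXEC for the keys involved):

    using StackExchange.Redis;

    var mux = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
    var db = mux.GetDatabase();

    // Queue both writes and commit them as a single MULTI/EXEC block.
    var tran = db.CreateTransaction();
    _ = tran.StringSetAsync("key1", "a");
    _ = tran.StringSetAsync("key2", "b");
    bool committed = await tran.ExecuteAsync();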

zokier
0 replies
11h4m

So what's the catch? Where does this not perform well? It would be neat to see benchmarks on smaller instance types too; 72 vCPUs is quite a chunky boi.

oaiey
0 replies
9h7m

It is interesting to see how Microsoft and the .NET team are building some very impressive hack-your-own-infrastructure projects. YARP is a reverse proxy/API gateway/whatever you need. Now Garnet for memory caches.

Seems they have tons of internal need and are willing to share.

giancarlostoro
0 replies
14h32m

Definitely impressive. Microsoft Research comes out with great projects from time to time; it must be fun getting paid to do R&D. I wish big companies did more R&D-style projects that benefit the industry in general. I sure hope a good company takes over HashiCorp if they're on the market to be bought.

KyleSanderson
0 replies
11h2m

    > Garnet’s storage layer, called Tsavorite, was forked from OSS FASTER, and includes strong database features such as thread scalability, tiered storage support (memory, SSD, and cloud storage), fast non-blocking checkpointing, recovery, operation logging for durability, multi-key transaction support, and better memory management and reuse.

https://www.microsoft.com/en-us/research/blog/introducing-ga...

DonnyV
0 replies
3h50m

I'm looking forward to seeing where they are using this in production.

"After thousands of unit tests and a couple of years working with first-party teams at Microsoft deploying Garnet in production (more on this in future blog posts!), we felt it was time to release it publicly" https://microsoft.github.io/garnet/blog