Garnet – A new remote cache-store from Microsoft Research

west0n
46 replies
14h30m

From the benchmark performance charts (https://microsoft.github.io/garnet/docs/benchmarking/results...), the throughput of the GET command exceeds that of Dragonfly by more than tenfold. While the 50th-percentile latency is slightly higher than Dragonfly's, the 99th percentile is slightly lower. Both the throughput and latency of Garnet and Dragonfly are far better than Redis's, which suggests Redis may need significant performance optimization.

whimsicalism
21 replies
14h23m

surprised to see a garbage collected language project (C# for Garnet) beat redis/dragonfly

blackoil
10 replies
13h13m

The investment in optimizing the CLR or JVM can be huge, since they impact millions of applications, while each piece of C/C++ code has to be hand-optimized.

Also, the limited number of people who can write optimal C code in a given time, versus C#, works in managed code's favor.

rfoo
5 replies
12h25m

Plus, it's easier to have unmanaged code in C# than in Java.

larodi
3 replies
11h36m

Is it possible to have unmanaged code in Java at all… or do you mean linking libs and exposing them through a Java API in the language?

rfoo
2 replies
10h56m

Ah, I meant Java code manipulating off-heap memory. For Java I don't think there's a way to mix managed and unmanaged code the way you can with C# or C++/CLI on the CLR.

pjmlp
0 replies
9h4m

Panama has helped in that direction, though it isn't like C#; for feature parity we will eventually need Valhalla, which is taking its time since they want to keep existing JARs working.

Ignoring the type systems of the Eiffel, Objective-C, Oberon and Modula-3 lineage, even though they were inspirations for Java, has proven in hindsight to be a bad decision.

neonsunset
0 replies
10h49m

In C# you usually don't want to mix in C++. Manipulating off-heap memory is indeed easy: nowadays pointers to object interiors, the stack, and unmanaged memory can all be represented as `ref T`, where T is byte, int, etc.

These are then wrapped in `Span<T>` and `ReadOnlySpan<T>`. This way a span can be a slice of memory with any origin:

    using System.Runtime.InteropServices; // for NativeMemory

    var fromStack1 = (stackalloc byte[32]);                 // stack memory
    var fromStack2 = (Span<byte>)[0x20, 0x20, 0x20, 0x20];  // collection expression, also on the stack
    var fromHeap = new byte[32].AsSpan();                   // GC heap
    unsafe
    {
        var ptr = NativeMemory.Alloc(32);                   // unmanaged (malloc-style) memory
        var fromMalloc = new Span<byte>(ptr, 32);
        // ... use fromMalloc ...
        NativeMemory.Free(ptr);                             // don't forget to free :)
    }

pjmlp
0 replies
9h13m

True, I still look forward to having Valhalla, eventually.

theLiminator
3 replies
11h2m

Couldn't you just replace the CLR and JVM with LLVM and GCC, then replace C with assembly, and arrive at the same conclusion?

blackoil
1 replies
10h20m

Now commenting beyond my expertise: I understand the two ecosystems have developed orthogonally. C#/Java assume code to be higher level and "dumb", with the optimizations living in the library/CLR/JVM, while C/C++ are developed with code at a lower level and the developer making sure the code is optimized.

Then again, I could be talking out of my ass; I haven't done enough C/C++ coding in over a decade.

balls187
0 replies
4h35m

You can write poorly optimized C#.

And the Intel C++ compiler has provided significant performance gains for decades via optimization, without needing to optimize by hand.

Dumble
0 replies
10h43m

I'm not the person you are replying to, but I believe the pattern of course holds if you keep shifting it down. I.e., using a faster CPU will speed up all programs running on it, while each (already optimized) ASIC has to be optimized further individually.

latchkey
4 replies
13h37m

I think it is rare, but it isn't impossible. I remember in the 2000s watching someone demo a Java/JVM application beating the pants off a C++ application. If I remember correctly, the JIT was doing an optimization that you would have had to write assembly to match.

thayne
1 replies
12h29m

There are at least two reasons I can think of why a JIT-compiled language with a GC can outperform C/C++:

1. Memory management with a GC often has higher throughput. With the downside that you can have high latency when a garbage collection occurs.

2. The JIT compilation can potentially do a better job of optimizing, because it has information about how the code has been run so far.

It is possible for C or C++ (or Rust) to get the first by using alternative memory management strategies, and the second by hand-optimizing and/or using profile-guided optimization.
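
A minimal sketch of that first point in C# (my example, not from the thread): renting from .NET's ArrayPool instead of allocating per request is the managed-world cousin of the alternative memory management strategies mentioned above, and keeps steady-state GC work near zero.

    using System.Buffers;

    // Rent a reusable buffer instead of allocating a fresh array per request.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    try
    {
        // ... fill and process the buffer ...
    }
    finally
    {
        // Return it so the next caller reuses the same memory.
        ArrayPool<byte>.Shared.Return(buffer);
    }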

pjmlp
0 replies
9h13m

3. The runtime also supports C++, and one can emit the same MSIL code sequence that the C++ compiler would generate.

Which is why WASM isn't that great a novelty for greybeards.

nurettin
1 replies
13h5m

There were coding-competition benchmarks at the time pitting javac against the underdeveloped gcc 2.95 of the day. The trick with the Java programs was to allocate a region of memory only once and reuse it whenever more was needed, simulating a stack. The programs were then benchmarked as hot start and cold start, and the hot-start timings were used to bench against the C++ programs. If the algorithm used was better, this sometimes resulted in head-to-head or better performance.

latchkey
0 replies
13h1m

I remember that one too, but this one was not just the hot memory trick, which seems relatively easy. It was entirely JIT based, where the JIT happened to pick a better path than gcc.

Thaxll
1 replies
4h33m

Because it's not the language that matters here; the performance comes from the architecture of the storage, among other things.

I've seen Go code best Rust code. Really, when comparing languages you should look at the same implementation; if the designs are completely different, it's not really comparable.

surajrmal
0 replies
3h50m

While true, GC tends to affect tail latencies, since it can increase variance. Obviously not always the case, but without sufficient planning it's likely. Languages with fewer stop-the-world pauses tend to trade them for throughput; however, if the application is not CPU bound it doesn't matter. The same can happen in a non-GC language where there's a similar lack of allocation planning, but there it tends to be considered more carefully, because those languages really put allocations in your face.

pjmlp
0 replies
9h15m

Not all garbage-collected languages are made equal; some, like C# and .NET, provide all the performance knobs needed for C++-like coding.

People only have to learn how to use them, instead of placing all garbage-collected languages into the same basket.

In this specific case, MSIL and .NET were designed to support C++ as well, and languages like C# and F# have ways to access those features; even if some feature isn't exposed at the language grammar level, you can emit the same MSIL that C++/CLI would generate.
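
To make "performance knobs" concrete, here is a small illustration of two of them (my own sketch, not Garnet code): a runtime GC latency mode and a JIT inlining hint.

    using System.Runtime;
    using System.Runtime.CompilerServices;

    static class Knobs
    {
        // Trade some GC throughput for shorter pauses at runtime.
        public static void PreferLowPause() =>
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

        // Ask the JIT to inline a hot helper at its call sites.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Square(int x) => x * x;
    }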

orthoxerox
0 replies
9h8m

There's a series of blog posts by Oren Eini showing that C# can be sufficiently fast even when you don't really optimize anything. Beating Redis in every benchmark is a whole other level, of course.

https://ayende.com/blog/197412-B/high-performance-net-buildi...

fulafel
0 replies
12h20m

Seems it wins by using more sophisticated data structures & algorithms, instead of "tight code".

hipadev23
14 replies
13h0m

    > Redis may require a significant performance optimization.

Redis is single-threaded; it’s simple and effective. I’m not sure it needs optimization, and we have 3 alternatives here.

Garnet however is the first alternative to actually outperform Redis at both low and high levels of concurrency, which is remarkable. I can’t wait to try it out.

gigatexal
11 replies
10h6m

Redis for 99% of the intended use-cases and companies will be just fine. It has always been rock solid when we used it. It was the opposite of DNS: it was never a problem.

moritonal
10 replies
9h20m

Could you expand on "It was the opposite of DNS: it was never a problem."? Feel like I'm missing some interesting history here.

mbreese
2 replies
7h42m

It’s just a common saying: “the problem is always DNS”. When you have a weird networking issue, the problem always seems to be DNS: some misconfigured DNS entry, DHCP giving you the wrong server address, etc…

And then the problem is compounded by the fact that, ironically, DNS works so well that we don’t think of it as a primary point of failure. So inevitably, when there is a DNS problem, it’s the last thing we check. This just reinforces the idea that the problem is always DNS… because in those long, hard-to-troubleshoot instances… the problem was DNS.

robertlagrant
1 replies
4h51m

DNS, BGP, or branch prediction. Pick three.

zimpenfish
0 replies
3h54m

Stealing from the greats, "DNS, BGP, off-by-one errors, or branch prediction. Pick three."

akvadrako
2 replies
8h21m

It's weird since DNS is one of the most rock solid systems we have, with redundancy at every level.

1. Clients query a list of servers (IPs) and handle failover when you don't quickly get a reply.

2. Most of those servers at the root and TLD level are actually anycasted from multiple locations globally, so you connect to the closest instance.

3. Those instances are often clusters of physical servers. The big ones have fully redundant networking, so any router or switch failing doesn't take it down. Some run different DNS server software on each physical server, so even software bugs won't take down the whole system.

sumtechguy
1 replies
5h51m

In my years of doing this sort of thing, I've only had it really be DNS once. The issues I usually see blamed on DNS come up when DNS is 'abused' to do things like load balancing with a very small TTL. Then you need goofy things like 'sticky sessions' and whatnot to work around that.

byteknight
0 replies
2h50m

Assuming you didn't manage connectivity for very many systems, or for very long. Otherwise I find it hard to believe it happened only "once".

Terretta
1 replies
6h1m

The meme this is referencing should be "It's always DNS config."

jameshart
0 replies
1h54m

Mostly. But it can also be DNS request volume, DNS cache expiry, DNS response times, DNS connection contention, DNS security, DNS error handling, or DNS client configuration.

All variants of ‘DNS is working perfectly, just your expectations of how it will work in your situation are not completely correct’.

stephenr
0 replies
6h37m

People who don't understand networking like to blame DNS for every problem they experience I guess?

klohto
0 replies
9h9m

isitdns.com

LeonidBugaev
1 replies
9h26m

Plus in production, under high load, a Redis cluster is way more common, which kind of solves the single-threaded concern.

CuriouslyC
0 replies
5h56m

I've always found redis cluster to just bring problems with it.

west0n
8 replies
14h23m

What surprises me the most is that this project is developed in C#, while Dragonfly is developed in C++, and Redis is in C.

amir734jj
7 replies
12h50m

If you look at the store code, you see a lot of "unsafe" C# code (i.e., pointer manipulation).

zigzag312
6 replies
10h17m

Shows how flexible C# is.

You can trade high-level expressiveness for low-level control where needed and you don't have to deal with any FFI to do it.
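
As an illustration of that low-level control (a sketch of the general pattern, not code from Garnet), you can pin a managed array and walk it through a raw pointer with no FFI involved; this requires enabling unsafe blocks in the project:

    // Pin a managed array and sum it through a raw pointer.
    static unsafe int SumPinned(int[] values)
    {
        int sum = 0;
        fixed (int* p = values) // pin so the GC can't move the array
        {
            for (int i = 0; i < values.Length; i++)
                sum += p[i];
        }
        return sum;
    }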

CharlieDigital
5 replies
3h7m

C# and .NET are both truly underrated in the wider community -- I think -- because of some early snafus: the late adoption of an OSS model, being Windows-centric in the .NET Framework days, and depending too much on Visual Studio for a very long time.

Nowadays, I think it's probably the most natural language and platform for teams that need to move on from TypeScript rather than Go or Rust given the similar constructs and idioms.

mdaniel
2 replies
1h49m

For a while there I had high hopes that one could pull in Java libraries via JVM interop on top of the Common Language Runtime, but it doesn't seem to have caught on, and I'm not in that ecosystem enough to know why. But yes, for the many reasons you cited, the library ecosystem is nowhere near that of the JVM, and thus I haven't once considered .NET for a new project, regardless of how much I love C# the language. Well, that, and my experience with the observability and introspection components of the JVM is second to none.

CharlieDigital
1 replies
37m

    > the library ecosystem
I don't really see many gaps. There tend to be fewer libraries, but the libraries available generally feel more complete and well thought out, because users cluster around the known ones.

Many of the first party libraries are really, really good. EF Core is a prime example of possibly one of the best ORMs on the market right now in terms of productivity, ergonomics, and performance.

mdaniel
0 replies
30m

    > I don't really see many gaps

Isn't that the "works on my machine" of this discussion? What's the Apache Tika for .NET, then? I don't mean PDF parsing, I don't mean .docx parsing; I mean a framework for interacting with all their supported types <https://tika.apache.org/2.9.1/formats.html> through one surface area.

ametrau
1 replies
1h6m

It’s great, but you’re locked into Visual Studio if you want the most out of it (they put 95% of their tooling effort there). Do you want to use Visual Studio?

CharlieDigital
0 replies
39m

I work in .NET on the daily on an M1 MBP using primarily VS Code and occasionally Rider.

There's no need for VS for most (any?) .NET workloads these days. That's why I wrote that it was an early misstep. Nowadays it's easy to do .NET dev on any platform. In fact, we ship our production runtime to AWS t4g (Arm64) instances.

throwaway38375
5 replies
7h38m

Neat project, but I will be sticking with Redis.

I trust Redis to not do something weird with their licensing or pricing in the future.

Plus Redis has billions of production hours under its belt.

It's easier to install and understand.

trelane
0 replies
4h35m

Maybe. Patents are certainly a question, because of the license. Microsoft has definitely been aggressive about patents.

sciurus
0 replies
7h9m

The licensing has been through a number of changes in the last few years, depending on which distribution and modules you use.

https://redis.com/legal/licenses/

oaiey
0 replies
7h32m

100% on that. I think Garnet (and similarly YARP) are useful if you build your own tweaked version of them. Otherwise, off-the-shelf or PaaS things are more useful.

Also, this is Microsoft Research. This is code sharing, not a product.

The really interesting part is what Azure will do with it.

Someone
0 replies
6h57m

This is a project from Microsoft Research, so I would worry a lot less about licensing and pricing than about a lack of updates (whether features, maintenance, or security ones).

wokwokwok
4 replies
13h25m

After the aborted/abandoned attempt to port Redis to Windows [0], this feels like a second try at the same thing, but first-party.

Of course, as a research project it doesn't have the same stability, support, etc. as Redis, but I could easily imagine this rolling into a real product if it's popular.

...and with an MIT license, if nothing else, the code is a fun read. :)

[0] - https://github.com/microsoftarchive/redis?tab=readme-ov-file...

mordae
3 replies
11h2m

MIT license, but with CLA.

It's bizarre. What more rights could they possibly want on top of MIT to warrant a CLA?

orthoxerox
1 replies
9h17m

It's probably a blanket requirement from the MSFT legal team so they can just release everything as "(c) Microsoft" instead of "(c) Microsoft and 999 more contributors, one of which has actually forgotten to put their name in AUTHORS.txt, and thus we're technically violating the terms of the license".

withinboredom
0 replies
8h22m

Yep, the CLA only applies if you want to merge your changes into the project.

easton
0 replies
9h13m

IIRC, it’s to allow them to relicense contributions in the event they decide to move away from MIT?

lxe
3 replies
14h43m

A drop-in Redis replacement with rather impressive latency and throughput bench figures. I wonder what it's like to operate in a non-Azure stack in the real world.

nurettin
2 replies
13h20m

Are you sure it is drop-in? I don't see any indication of xstream support.

StepWeiwu
1 replies
12h4m

The title does say that it can work with existing Redis clients, but it's unclear if they mean full compatibility.

jiggawatts
3 replies
12h7m

I just wish they had something like this embedded into Azure App Service, so it wouldn't be necessary to use a remote service for caching.

For reference, something commonly used with IIS for ASP.NET apps was to have an out-of-process "session state" store, so that if the web app process restarted, users wouldn't lose their sessions and have to log in from scratch. Sure, you can put this somewhere central like SQL Server, but then every web page request sits there waiting for the session state to load before it does any processing at all. Session state is also typically locked in some way, which has all sorts of performance issues.

The typical current solution is to use Redis for both caching and session state, and this works... okay-ish. Throughput is high, sure, but Redis is a separate resource in Azure and is stupidly expensive. I really don't want to pay Oracle DB prices for something this simple. It's also a bit of a hassle to wire up.

In this article they talk about 300-microsecond response times, but that's irrelevant in any zone-redundant design, because all Azure load balancers use random zone selection. So you'll have a web server picked in a random zone, which then contacts a cache server in a random zone. That server in turn may not have your key and has to contact yet another random zone to get your cache data! Your traffic ping-pongs between data centres. This introduces about 1-3 ms of delay, up to 10x higher than the advertised numbers for Garnet.

The ideal scenario would be something like what Microsoft Service Fabric does: it has a "reliable collections"[1] service that runs locally on each host node and replicates to two other nodes. A web app can always read its cached values from the same physical host. The latency can be single-digit microseconds in some cases, which is thousands of times faster than any naively load balanced external service, no matter how well optimised.

I don't want 30% faster than Redis. I want 3,000x faster.

[1] https://learn.microsoft.com/en-us/azure/service-fabric/servi...
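
Short of reliable collections, a first-tier in-process cache in front of the remote store avoids the cross-zone hop on the common path. A minimal sketch using Microsoft.Extensions.Caching.Memory (the names, sizes, and 30-second TTL are my own assumptions):

    using Microsoft.Extensions.Caching.Memory;

    // In-process first tier; the remote store is only consulted on a miss.
    var local = new MemoryCache(new MemoryCacheOptions { SizeLimit = 10_000 });

    async Task<string> GetCachedAsync(string key, Func<Task<string>> fetchRemote)
    {
        if (local.TryGetValue(key, out string? hit) && hit is not null)
            return hit; // served from this host, no network hop

        var value = await fetchRemote(); // cross-zone hop only on a miss
        local.Set(key, value, new MemoryCacheEntryOptions
        {
            Size = 1, // required because SizeLimit is set
            AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30),
        });
        return value;
    }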

withinboredom
2 replies
8h18m

This is (one reason) why I love k8s with Cilium. I can set up a service to always go to the local instance (or any other routing topology). It is great for any kind of DNS or application cache.

zinclozenge
1 replies
2h50m

Can you point to any resources to read up on this?

legulere
2 replies
11h48m

    > Garnet’s storage layer, called Tsavorite, was forked from our prior open-source project FASTER

It would be interesting to know why it was forked, why the changes can't be incorporated, and whether FASTER continues to be developed.

withinboredom
0 replies
8h16m

“Our” may be the key word here. In other words, it may be political; they wanted the freedom to make changes to their original project without having to deal with PRs to the current maintainers.

compressedgas
0 replies
10h59m

I tried to compare the source trees. They are nearly identical except for replacing FASTER with Tsavorite, which makes a direct comparison much harder, as they renamed directories and files.

Weryj
2 replies
8h17m

Orleans will love being bundled with this

jeremycarter
1 replies
7h35m

Each Orleans node could have a Garnet node, I can see the Aspire configuration now

deskamess
0 replies
5h7m

I am just happy there is a native Redis-like store available on Windows. I believe there is another one from RavenDB, but this one is 'more' official!

Is it possible to use Aspire locally, or is it just a cloud-only 'framework'?

ksec
1 replies
2h18m

Judging from comments here I guess no one uses memcached anymore ?

dormando
0 replies
1h26m

:'(

cpressland
1 replies
9h57m

This looks really good. I hope ultimately this replaces the “Azure Cache for Redis” resource. It’s slow, it’s a fork of Redis made to run on Windows, and it takes nearly an hour to create an instance of it.

robertlagrant
0 replies
4h50m

I don't know why they wouldn't just run Redis on Linux.

caleblloyd
1 replies
3h3m

    > Garnet being multi-threaded, `MSET` is not atomic. For an atomic version of `MSET`, you would need to express it as a transaction (stored procedure).

I am having trouble understanding this. Why wouldn't they wrap that in a transaction internally for you and make the command atomic? What other atomicity "gotchas" are there?

danbruc
0 replies
2h33m

Because it would mean a performance penalty for everyone who does not need MSET to be atomic without being able to opt out of that transaction. On the other hand, if you want to be a drop-in replacement for Redis, then this is an issue as Redis guarantees atomicity. Maybe you could have a configuration option that lets you select between compatibility and performance, at least if being a drop-in replacement is a design goal.
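
For reference, the client-side opt-in described above looks something like this with StackExchange.Redis (my sketch; it assumes the server accepts RESP MULTI/EXEC for the keys involved):

    using StackExchange.Redis;

    var mux = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
    var db = mux.GetDatabase();

    // Queue both writes and commit them as a single MULTI/EXEC block.
    var tran = db.CreateTransaction();
    _ = tran.StringSetAsync("key1", "a");
    _ = tran.StringSetAsync("key2", "b");
    bool committed = await tran.ExecuteAsync();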

zokier
0 replies
11h4m

So what's the catch? Where does this not perform well? It would be neat to see benchmarks on smaller instance types too; 72 vCPUs is quite a chunky boi.

oaiey
0 replies
9h7m

It is interesting to see how Microsoft and the .NET team are building some very impressive hack-your-own-infrastructure projects. YARP is a reverse proxy/API gateway/whatever you need. Now Garnet for memory caches.

Seems they have tons of internal need and are willing to share.

giancarlostoro
0 replies
14h32m

Definitely impressive. Microsoft Research comes out with great projects from time to time; it must be fun getting paid to do R&D. I wish big companies did more R&D-style projects that benefit the industry in general. I sure hope a good company takes over HashiCorp if they're on the market to be bought.

KyleSanderson
0 replies
11h2m

    > Garnet’s storage layer, called Tsavorite, was forked from OSS FASTER, and includes strong database features such as thread scalability, tiered storage support (memory, SSD, and cloud storage), fast non-blocking checkpointing, recovery, operation logging for durability, multi-key transaction support, and better memory management and reuse.

https://www.microsoft.com/en-us/research/blog/introducing-ga...

DonnyV
0 replies
3h50m

I'm looking forward to seeing where they are using this in production.

"After thousands of unit tests and a couple of years working with first-party teams at Microsoft deploying Garnet in production (more on this in future blog posts!), we felt it was time to release it publicly" https://microsoft.github.io/garnet/blog