
GoFetch: New side-channel attack using data memory-dependent prefetchers

jerf
43 replies
3d1h

As long as we're getting efficiency cores and such, maybe we need some "crypto cores" added to modern architectures: cores that make promises specifically relevant to constant-time algorithms like these, and that promise not to prefetch, branch predict, etc. Sort of like the Itanium, but confined to a "crypto processor". Given how many features these cores wouldn't have, they shouldn't take much silicon, in principle.

This is the sort of thing that would metaphorically drive me to drink if I were implementing crypto code. It's an uphill battle at the best of times, but even if I finally get it all right, there are dozens of processor features, both current and future, ready to blow my code up at any time.

FiloSottile
33 replies
3d1h

Speaking as a cryptography implementer, yes, these drive us up the wall.

However, crypto coprocessors would be a tremendously disruptive solution: we'd need to build mountains of scaffolding to allow switching to and off these cores, and to share memory with them, etc.

Even more critically, you can't just move the RSA multiplication to those cores and call it a day. The key is probably parsed from somewhere, right? Does the parser need to run on a crypto core? What if it comes over the network? And if you even manage to protect all the keys, what if a CPU side channel leaks the message you encrypted? Are you ok with it just because it's not a key? The only reason we don't see these attacks against non-crypto code is that finding targets is very application specific, while in crypto libraries everyone can agree leaking a key is bad.
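
To make that concrete, here's a minimal Go sketch (assuming nothing beyond the standard library): the naive comparison below returns at the first mismatching byte, so its timing tells an attacker how many leading bytes of a guess are right, which is why we reach for crypto/subtle even when the data being compared isn't a key.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// leakyEqual returns at the first mismatch, so its running time
// reveals how many leading bytes of the guess are correct.
func leakyEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	tag := []byte("expected-mac-tag")
	guess := []byte("expected-mac-xxx")

	// Constant-time version: touches every byte regardless of content.
	ok := subtle.ConstantTimeCompare(tag, guess) == 1

	fmt.Println(leakyEqual(tag, guess), ok) // false false
}
```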

No, processor designers "just" need to stop violating assumptions, or at least talk to us before doing it.

bee_rider
26 replies
3d

I don't think the security community is also going to become experts in chip design; these are two full skill sets, each already very difficult to obtain.

We must stop running untrustworthy code on modern full-performance chips.

The feedback loop that powers everything is: faster chips allow better engineering and science, creating faster chips. We’re not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That’s just a stupid thing to do, there’s no way to make it safe, and there never will be.

I mean, we're talking about prefetching. If there were a way to give RAM cache-like latencies, why wouldn't the hardware folks already have done it?

titzer
12 replies
2d23h

I almost gave you an upvote until your third paragraph, but now I have to give a hard disagree. We're running more untrusted code than ever, and we absolutely should trust it less than ever and have hardware and software designed with security in mind. Security should be priority #1 from here on out. We are absolutely awash in performance and memory capacity but keep getting surprised by bad security outcomes because it's been second fiddle for too long.

Software is now critical infrastructure in modern society, akin to the power grid and telephone lines. Neglecting security is a strategic vulnerability, and addressing it must happen at all levels of the software and hardware stack. By risk I mean someone trying to crash an enemy's entire society by bricking all of its computers and sending it back to the dark ages in milliseconds. I fundamentally don't understand the mindset of people who want to take that kind of risk for a 10% boost in their games' FPS[1].

Part of that is paying back the debt that decades of cutting corners have left us.

In reality, the vast majority of the 1000x increase in performance and memory capacity over the past four decades has come from shrinking transistors and increasing clockspeeds and memory density--the 1 or 5 or 10% gains from turning off bounds checks or prefetching aren't the lion's share. And for the record, turning off bounds checks is monumentally stupid, and people should be jailed for it.

[1] I'm exaggerating to make a point here. What we trade for a little desktop or server performance is an enormous, pervasive risk. Not just melting down in a cyberwar, but the constant barrage of intrusion and leaks that costs the economy billions upon billions of dollars per year. We're paying for security, just at the wrong end.

bee_rider
7 replies
2d22h

> I fundamentally don't understand the mindset of people who want to take that kind of risk for a 10% boost in their games' FPS[1]

Me either. But lots of engineers are out there writing single-threaded Matlab and Python codes with lots of data dependencies, just hoping the system manages to do a good job (for those operations that can't be offloaded to BLAS). So I'm glad gamer dollars subsidize the development of fast single-threaded chips that handle branchy codes well.

> In reality, the vast majority of the 1000x increase in performance and memory capacity over the past four decades has come from shrinking transistors and increasing clockspeeds and memory density

I disagree: modern designs include deep pipelines, lots of speculation, and complex caches because that's the only way to spend that higher transistor budget to reach higher clocks and to compensate for the fact that memory latencies haven't kept up.

> Part of that is paying back the debt that decades of cutting corners have left us.

It will be tough, but yeah, server and mainframe users need to roll back the decision to repurpose consumer-focused chips like the x86 and ARM families. RISC-V is looking good, though, and seems open enough that maybe they can pick and choose which features they take.

> I almost gave you an upvote until your third paragraph, but now I have to give a hard disagree.

I'm not too worried about votes on this post; this site has lots of web devs and cloud users, and pointing out that the ecosystem they rely on is impossible to secure is destined to collect downvotes-to-disagree.

saagarjha
6 replies
2d21h

How is RISC-V going to solve anything here?

snvzz
4 replies
2d10h

It is significantly less complex without compromising anything. That means a larger portion of a chip's design effort can be put elsewhere, such as into preventing side-channel attacks.

saagarjha
2 replies
2d10h

I don't really see how the design of RISC-V avoids the need to have a DMP

snvzz
1 replies
2d7h

> I don't really see how the design of RISC-V avoids the need to have a DMP

Because it does not. I also do not see where, if anywhere, such a claim was made.

saagarjha
0 replies
2d7h

Perhaps you should explain where this design effort for preventing side-channel attacks is spent, then?

epcoa
0 replies
2d3h

Anything specific?

bee_rider
0 replies
2d21h

It isn’t a sure thing. Just, since it is a more open ecosystem, maybe the designers of chips that need to be able to safely run untrusted code can still borrow some features from the general population.

I think it is basically impossible to run untrusted code safely or to build sand-proof sandboxes, but I worried the rest of my post was already too pessimistic.

saagarjha
2 replies
2d21h

Turning off bounds checks is like a 5% performance penalty. Turning off prefetching is like using a computer from twenty years ago.

crest
1 replies
2d16h

Turning off prefetching while running crypto code would be a performance gain compared to what it takes to implement the algorithms safely via even more expensive and fragile software mitigations. Just give me the option of configuring parts of the caches (at least data, instruction, and TLB) as scratchpad, and a "run without timing side-channels, pretty please" bit with a clearly defined API contract, accessible by default to unprivileged userspace code. Lots of cryptographic algorithms have such small working sets that they would profit from a constant-time-accessible scratchpad in the L1d cache, if they got to use data-dependent addresses into it again.
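
To sketch the shape of the contract being asked for, here is a hypothetical Go API. Nothing below exists on any platform today; the names and semantics are invented purely to illustrate the request.

```go
// Package timingsafe is a hypothetical sketch, not a real API.
package timingsafe

import "errors"

var ErrUnsupported = errors.New("timingsafe: hardware makes no constant-time promises")

// Scratchpad models a chunk of L1d configured as directly addressed
// memory: loads and stores to it would be defined as constant-time,
// even with data-dependent addresses.
type Scratchpad []byte

// AcquireScratchpad would ask the kernel for n bytes of cache
// configured as scratchpad, failing if the hardware can't promise it.
func AcquireScratchpad(n int) (Scratchpad, error) {
	return nil, ErrUnsupported // stub: no real mechanism exists
}

// Run would execute f with a "no timing side-channels" mode bit set
// (no data-dependent prefetching, no data-dependent instruction
// timing), with the kernel preserving the bit across context switches
// and the whole thing usable from unprivileged userspace.
func Run(f func()) error {
	return ErrUnsupported // stub: the contract, not an implementation
}
```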

olliej
0 replies
2d1h

Happily there are mechanisms to do just that, specifically for the purpose of implementing cryptography (I commented at the top level and don't want to keep spamming the URL).

aseipp
0 replies
2d20h

I agree that hardware/software codesign is critical to solving things like this, but features like prefetching, speculation, and prediction are absolutely critical to modern pipelines and, broadly speaking, are what enable what we think of as "modern computer performance." This has been true for over 20 years now. In terms of "overhead" it's not in the same ballpark -- or even the same sport, frankly -- as something like bounds checking or even garbage collection. Hell, if the difference were within even one order of magnitude, they'd have done it already.

FiloSottile
10 replies
3d

> download random programs onto their computers and run them at random

To be clear, that includes what we're all doing by downloading and running JavaScript to read HN.

Maybe I can say "don't run adversarial code on my same CPU" and only care about over-the-network CPU side-channels (of which there are still some), because I write Go crypto, but it doesn't sound like something my colleagues writing browser code can do.

arp242
6 replies
2d23h

Is this exploitable through JavaScript?

In general, from what I've seen, most of these JS-based CPU exploits didn't strike me as all that practical in real-world conditions. I mean, it is a problem, but not really all that worrying.

cryptonector
5 replies
2d19h

> Is this exploitable through JavaScript?

Why wouldn't it be?

olliej
2 replies
2d1h

Because JS/HTML provides APIs to perform cryptography (I can't recall whether the cryptography specs are part of ES or HTML/DOM). If you try to implement constant-time cryptography in JS you will run into a world of hurt, because the entire concept of "fast JS" depends on heavy speculation, with lots of exciting variations in the timing of even "primitive" operations.

cryptonector
1 replies
1d22h

No, the attack would be implemented in JS, not the victim code (though, that too, but that's not what's interesting here).

olliej
0 replies
1d18h

Ah, you're concerned about someone using JS to execute the side-channel portion of the attack, not the bit creating the side channel :)

samatman
1 replies
2d13h

How is JavaScript going to run a chosen-input attack against one of your cores for an hour?

cryptonector
0 replies
1d22h

If you leave a tab open that's running that JS...

bee_rider
1 replies
2d23h

Unfortunately somebody has tricked users into leaving JavaScript on for every site; it is a really bad situation.

anonymous-panda
0 replies
2d23h

Security and utility are often at odds. The safest possible computer is one buried far underground, without any cables, in a Faraday cage. Not very useful.

> We're not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That's just a stupid thing to do, there's no way to make it safe, and there never will be.

Setting aside JavaScript, you can see this today with public clouds, which have largely displaced private ones. These run untrusted code on shared computers; fundamentally that's what they're doing, because that's what you need for economies of scale, durability, availability, etc. So figuring out a way to run untrusted code on another machine safely is fundamentally a desirable goal. That's why people are trying to do homomorphic encryption: so that the "safely" part can go both ways, and neither the HW owner nor the "untrusted" SW needs to trust the other to execute said code.

csande17
0 replies
2d22h

Speak for yourself; I've got JavaScript disabled on news.ycombinator.com and it works just fine.

tadfisher
1 replies
2d23h

> The feedback loop that powers everything is: faster chips allow better engineering and science, creating faster chips. We're not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That's just a stupid thing to do, there's no way to make it safe, and there never will be.

Note that in the vast majority of cases, crypto-related code isn't what we spend compute cycles on. If there were a straightforward, cross-architecture mechanism to say "run this code on a single physical core with no branch prediction, no shared caches, and in-order execution", the real-world performance impact would be minimal, but the security benefits would be huge.
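
The only part of that wish expressible today is the single-physical-core piece; here is a minimal Linux sketch in Go (assuming golang.org/x/sys/unix; there is no portable knob for disabling branch prediction or shared caches from userspace).

```go
package main

import (
	"fmt"
	"runtime"

	"golang.org/x/sys/unix"
)

func main() {
	// Keep this goroutine on a single OS thread, then pin that
	// thread to CPU 0. Pairing this with a kernel isolcpus= boot
	// option would keep other tenants off the core entirely.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	var set unix.CPUSet
	set.Zero()
	set.Set(0)
	if err := unix.SchedSetaffinity(0, &set); err != nil {
		fmt.Println("affinity:", err)
		return
	}

	// ... run the sensitive code here ...
}
```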

bee_rider
0 replies
2d22h

I'm in favor of adding to chips some horrible in-order, no-speculation, no-prefetching, five-stage-pipeline Architecture 101 core which can be completely verified and made bulletproof.

But the presence of this bulletproof core would not solve the problem of running bad code on modern hardware, unless all untrusted code is run on it.

olliej
2 replies
2d1h

> No, processor designers "just" need to stop violating assumptions, or at least talk to us before doing it.

No, you don't get to say processor designers need to stop violating your assumptions. You need to stop making assumptions about behaviour if that behaviour matters (for cryptographic or other reasons). Your assumptions being faulty is not a valid justification, because by that logic no one could ever have added any caches or predictors at any point, as that would be "violating your assumptions". Also, let's be real here: even if "not violating your assumptions" were a reasonable position to take, it is not reasonable in any way to assume that modern processors (<30 years old) do not cache, predict, buffer, or speculate anything.

If you care about constant-time behaviour you should either write your code such that it is timing-agnostic, or read the platform documentation rather than making assumptions. The Apple documentation tells you how to actually get constant-time behaviour.

FiloSottile
1 replies
2d1h

> you should either write your code such that it is timing-agnostic, or read the platform documentation rather than making assumptions

Have you even read the paper? Especially the part where the attack applies to everyone’s previous idea of “timing agnostic” code, and the part where Apple does not respect the (new) DIT flag on M1/M2?

olliej
0 replies
1d22h

No, the paper targets "constant time" operations, not timing-agnostic ones.

The paper even mentions that blinding works, and that to me is the canonical "separate the timing and power use of the operation from the key material" solution. The paper's complaint about this approach is that it would be specific to these prefetchers, but this type of prefetcher is increasingly prevalent across CPUs and architectures, so it seems unlikely to remain Apple-specific for long. The paper even mentions that new Intel processors have these prefetchers, and so necessarily provide functionality to disable them there too. This is all before we get to the numerous prior articles showing that key extraction via side channels is already possible against these constant-time algorithms (a la last month's (I think?) "get the secrets from the power LED" paper). The solution is to use either specialized hardware (as done for AES) or timing-agnostic code.

Trying to create side-channel-free code by clever construction, based on a simple model of how all hardware behaves in terms of power and performance, is just going to move the side channels, not remove them. If it's a real attack vector you are really concerned about, you should probably just do best effort and monitor for repeated key reuse or the like, and then start blinding at some threshold.
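
For reference, a minimal sketch of the blinding idea for RSA in Go, with toy textbook parameters (illustrative only, not production code): the private exponentiation never sees the attacker-chosen ciphertext directly.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// blindedDecrypt computes c^d mod n without exponentiating the
// attacker-chosen c directly: the exponentiation sees c*r^e for a
// fresh random r, decorrelating its memory and timing behavior
// from the input.
func blindedDecrypt(c, d, e, n *big.Int) (*big.Int, error) {
	r, err := rand.Int(rand.Reader, n) // fresh blinding factor
	if err != nil {
		return nil, err
	}
	rInv := new(big.Int).ModInverse(r, n)
	if rInv == nil {
		return blindedDecrypt(c, d, e, n) // r not invertible; retry
	}
	blinded := new(big.Int).Exp(r, e, n)
	blinded.Mul(blinded, c).Mod(blinded, n) // c' = c * r^e mod n
	m := new(big.Int).Exp(blinded, d, n)    // m' = (c')^d = m*r mod n
	m.Mul(m, rInv).Mod(m, n)                // m  = m' * r^-1 mod n
	return m, nil
}

func main() {
	// Textbook toy key (p=61, q=53): n=3233, e=17, d=2753.
	n, e, d := big.NewInt(3233), big.NewInt(17), big.NewInt(2753)
	c := new(big.Int).Exp(big.NewInt(65), e, n) // "encrypt" m=65
	m, _ := blindedDecrypt(c, d, e, n)
	fmt.Println(m) // 65
}
```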

saagarjha
0 replies
2d21h

Processor designers are very unlikely to do that for you, because everyone not working on constant time crypto gives them a whole lot of money to keep doing this. The best you might get is a mode where the set of assumptions they violate is reduced.

heresie-dabord
0 replies
1d3h

> processor designers "just" need to stop violating assumptions

"Security" rarely (almost never) seems to be part of any commercially-significant spec.

Almost as if by design...

eximius
0 replies
1d21h

Wouldn't that "just" let someone see whether a key is present (and whatever information that implies), while dramatically helping to prevent secret-key extraction?

lxgr
0 replies
2d3h

The Secure Enclave is not a general-purpose, user-programmable processor. It only runs Apple-signed code, and access is only exposed via the Keychain APIs, which only support a very limited set of cryptographic operations.

Presumably latency for any operation is also many orders of magnitude higher than in-thread crypto, so that just doesn't work for many applications.

john_alan
0 replies
2d7h

If you look at the CryptoKit API docs, the Secure Enclave essentially only supports P-256. Which is maybe why they didn't include ECC crypto in the examples.

bee_rider
1 replies
3d

One option would be for people to stop downloading viruses and then running them.

SpaghettiCthulu
0 replies
21h53m

Except when these vulnerabilities are exploitable from JavaScript in your web browser.

sargun
0 replies
3d

I think what's more likely is "mode switching", in which you can disable these components of the CPU for a certain section of executing code (the abstraction would probably be at the thread level).

gabrielhidasy
0 replies
3d1h

Many modern architectures have crypto extensions, usually to accelerate a few common algorithms. Maybe it would be good to add a few crypto-primitive instructions to enable new algorithms?
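
For a sense of what the existing extensions already buy you: in Go, for example, crypto/aes transparently uses the CPU's AES instructions (AES-NI on x86, the ARMv8 Cryptography Extensions on arm64) when they're available, which makes it both fast and constant-time by construction. A minimal sketch:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"log"
)

func main() {
	key := make([]byte, 32) // AES-256
	nonce := make([]byte, 12)
	if _, err := rand.Read(key); err != nil {
		log.Fatal(err)
	}
	if _, err := rand.Read(nonce); err != nil {
		log.Fatal(err)
	}

	// On CPUs with AES instructions this never touches a lookup
	// table, so there is no data-dependent memory access to leak.
	block, err := aes.NewCipher(key)
	if err != nil {
		log.Fatal(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%x\n", gcm.Seal(nil, nonce, []byte("hello"), nil))
}
```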

a-dub
0 replies
2d9h

See the DIT and DOIT flags referenced in the paper and in the FAQ question about mitigations. Newer CPUs apparently provide facilities to do just that.

Joel_Mckay
0 replies
3d

Encrypted-bus MMUs have existed since the 1990s.

However, the trend to consumer-grade hardware for cost-optimized cloud architecture ate the CPU market.

Thus, the only real choice now is consumer CPUs even in scaled applications.

martinky24
12 replies
3d2h

Why does every attack need its own branding, marketing page, etc...? Genuine question.

sapiogram
3 replies
3d2h

Well, names are useful for the same reason people's names are useful. The rest just kinda happens naturally, I think.

martinky24
1 replies
3d1h

Name makes enough sense. "Branding, marketing page, etc..." was my question.

"Happens naturally" isn't really an answer.

ziddoap
0 replies
2d20h

Is your position that any write-up about an attack must be plain text only, and must not use its own URL?

I truly cannot understand why this is brought up so often. You aren't paying for it, it doesn't hurt you in any way, it detracts nothing from the findings (in fact, it makes the findings easier to discuss), etc. There is no downside I can think of.

Can you share what the downsides of a picture of a puppy and a $5 domain are? Sorry, "branding" and "marketing page"?

Or at least, maybe you can share what you think would be a more preferable way?

yborg
0 replies
3d1h

Yes, it saves time vs. starting a discussion on "that crypto cache sidechannel attack that one team in China found".

saagarjha
2 replies
2d21h

Why does the comments of every such attack need a question about why it has its own branding, marketing page, etc…? Genuine question.

(Seriously, this comes up every time, just do a search for it if you actually want to figure out why.)

0xedd
1 replies
2d5h

Because it makes it feel like you need some marketing department if you want to publish your work. Rather than give _only_ the work merit, we give too much merit to its colorful presentation. That shouldn't be the case.

howinteresting
0 replies
1d19h

Good communication has always been a part of making sure your work is influential.

xena
0 replies
3d2h

So people talk about it

modeless
0 replies
3d1h

Science isn't just about discovering information. Dissemination is critical. Communicating ideas is just as important as discovering them and promotion is part of effective communication. It's natural and healthy for researchers to promote their ideas.

fruktmix
0 replies
3d1h

It's science these days. They need funding, and one way to get it is to get people to recognize the importance of their work.

FiloSottile
0 replies
3d1h

Names are critical to enable discussion.

The "marketing" page is where documentation is. Summaries that don't require reading a whole academic papers are a good thing, and they are the place where all the different links are collected. Same reason software has READMEs.

Logos... are cute and take 10-60 minutes? If you spend months on some research might as well take the satisfaction of giving it a cute logo, why not.

12_throw_away
0 replies
2d

Dunno, but I'm glad they do it. In other fields of research, researchers often purposely hold off on naming something, so that the community kind of has no choice but to name it after the authors themselves.

E.g., in my field, they would have called Spectre "the Horn-Genkin-Hamburg vulnerability" or something. Which of those is hard-to-remember jargon, and which is catchy and evocative?

theobservor
8 replies
3d1h

The end result of these side channel attacks would be to have CPUs that perform no optimizations at all and all opcodes would run in the same number of cycles in all situations. But that will never happen. No one wants a slow CPU.

As long as these effects cannot be exploited remotely, it's not a concern. Of course, multi-tenant cloud-based virtualization would be a no-go.

bee_rider
5 replies
2d23h

We need to drop all the untrusted code onto some horrible in-order, no-speculative-execution, no-prefetching, five-stage-pipeline core from an Architecture 101 class.

graemep
2 replies
2d22h

It might be preferable.

We have ridiculously fast hardware. In many use cases (client machines in particular) we usually do not really need it. I would gladly drop features for security.

kbolino
0 replies
2d3h

If you account for all of the CPU "features" that can be exploited, you're looking at probably 80% of what makes it "ridiculously fast". If you also account for all of the ways in which the entire modern hardware ecosystem can be exploited, you're probably looking at a gross performance loss of over 90% to remove these "holes".

An overclocked 486 PC that can only run a single program at a time and isn't continuously connected to a network might be very secure, but replacing every modern computer with something like it will not be even remotely feasible. In most situations, it would be better to have some risk tolerance, and couple modern hardware with mitigations, disposability, and supply-chain security instead.

bee_rider
0 replies
2d22h

It will also be good because users will become more annoyed when people try to sneak full programs into their websites, hopefully resulting in a generally less bloated internet.

wmf
1 replies
2d19h

If untrusted code includes JavaScript that would make Web apps ridiculously slow. (I know what you're thinking...)

bee_rider
0 replies
2d18h

Oh no, a totally unexpected side effect, less complex webpages.

lenerdenator
0 replies
2d4h

> multi-tenant cloud-based virtualization

And that's why I'm not as worried about this as I was about the same vulnerability in Intel chips a few years ago.

There are a few cloud service providers that will rent you clock cycles on a rack-mounted Mac Mini, but not many, and even then they're for highly-specific workloads or build tasks. I suppose that's a problem for people paying far out the butt for that kind of service, but the vast majority of Apple Silicon devices are never, ever going to host cloud services.

_factor
0 replies
2d23h

This is why high core counts and isolation matter. Isolate the code to a specific core. Assuming everything is working as intended, an exploit won’t compromise other tenants.

saagarjha
4 replies
2d21h

> Can the DMP be disabled?

> Yes, but only on some processors. We observe that the DIT bit set on m3 CPUs effectively disables the DMP. This is not the case for the m1 and m2.

Surely there is a chicken bit somewhere to do this?

john_alan
3 replies
2d7h

I've often wondered how these bits get set.

Like, can you do it from Swift? Or do you need assembly?

saagarjha
2 replies
2d7h

It's probably an MSR accessible from the kernel?

lxgr
1 replies
2d3h

It seems to be userspace accessible: https://developer.apple.com/documentation/xcode/writing-arm6...

The kernel would have to be aware of it in order to be able to restore its state across context switches though, unless it's part of a set of registers that is automatically persisted. But given that Apple is publicly documenting this flag, I suppose it is.

Here's an interesting conversation by the Go developers from as early as 2021 being suspicious of DIT: https://github.com/golang/go/issues/49702
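
For what it's worth, that discussion eventually produced an API: newer Go (1.24 and later) exposes this as crypto/subtle.WithDataIndependentTiming, which sets PSTATE.DIT around a callback on arm64. A minimal sketch:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

func main() {
	var digest [32]byte
	// On arm64, the DIT bit is set for the duration of the callback
	// and restored afterwards; on other architectures this is a no-op.
	// Per the GoFetch FAQ, on M3 DIT also disables the DMP.
	subtle.WithDataIndependentTiming(func() {
		digest = sha256.Sum256([]byte("secret material"))
	})
	fmt.Printf("%x\n", digest[:4])
}
```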

saagarjha
0 replies
1d16h

No, that’s something else. I’m talking about the thing that disables DMP, which would not be part of the standard architecture.

woadwarrior01
2 replies
3d2h

Reminded me of the Augury attack[1] from 2022, which also exploits the DMP prefetcher on Apple Silicon CPUs.

[1]: https://www.prefetchers.info

loeg
0 replies
3d1h

Yes, they specifically mention that in the article and FAQ.

Findecanor
0 replies
3d1h

BTW, three of the authors of GoFetch were also behind Augury.

0xedd
2 replies
2d5h

Why does Apple have so many hardware backd... innocent bugs?

olliej
1 replies
2d1h

why do we even need caches?

why do we need prefetchers?

But in answer to your bullshit backdoor conspiracy theory (JFC, processors have caches and timing variance because people want fast CPUs; you cannot have constant-time and fast, and Apple is not the only company with prefetchers), here's some Apple-provided documentation on how to disable the hardware backd... enable constant-time operations specifically for the purpose of cryptography, almost as if it's designed into the hardware. So weird. https://developer.apple.com/documentation/xcode/writing-arm6...

howinteresting
0 replies
1d19h

The M1 and M2 don't have that bit.

xiconfjs
1 replies
3d1h

From the paper: "OpenSSL reported that local side-channel attacks (...) fall outside of their threat model. The Go Crypto team considers this attack to be low severity".

slowmovintarget
0 replies
1d1h

So malware scanning and virus scanners just became relevant for Macs and iPads.

(Compromise must be running on the same hardware.)

john_alan
0 replies
2d22h

On reading, it seems a lib like libsodium can simply set the disable bit prior to sensitive cryptographic operations on M3 and above.

Also looks like they need to predetermine aspects of the key.

Very cool but I don’t think it looks particularly practical.

d-z-m
0 replies
2d15h

What's the attack vector here? Access to an encryption oracle and co-location on the target machine?

Shtirlic
0 replies
2d21h

Is it naive to ask whether implementing this mitigation would impact performance and memory interaction speed?