
Gaining kernel code execution on an MTE-enabled Pixel 8

Retr0id
14 replies
1d16h

This is great research and a great write-up, but I'm a little (pleasantly) surprised to see it on GitHub's blog.

Does anyone know what their "business reason" for doing research like this is? (not that a business reason should be needed, but like I said, I'm a bit surprised to see it here)

mmsc
7 replies
1d14h

Unlike other departments, security teams often don’t have anything to do so this research is a good use of free time.

zq
2 replies
1d14h

What is this comment? The GitHub Security Lab focuses solely on security research and publishes some of the best research in the industry.

Man Yue Mo is a security researcher who finds some of the most complex and impactful bugs in the industry, like crbug.com/40065473.

mrb
1 replies
1d12h

Seeing mmsc's post history, especially computer security related comments, I presume he was just being sarcastic :)

mmsc
0 replies
1d11h

Indeed.

Although, it wouldn’t be abnormal for a security team to have free time, and dedicate it to researching an emerging technology whether it directly contributes to the business goals or not. Of course I’m not talking about a security team that is reading log files from their SIEM while sitting in a SOC.

ramon156
1 replies
1d10h

Sadly people didn't see the sarcasm in this comment

ambichook
0 replies
16h6m

I understand people disliking tone indicators, especially when they can ruin a joke, but they are really wonderful things that can prevent misunderstandings like this online.

ssklash
0 replies
1d14h

How many security teams have you been on? Definitely ones with less work than I've been on...

frosting1337
0 replies
1d14h

Wow, that's just absolutely incorrect. Ignoring that tons of security teams are actually stupidly busy, this person's specific role at GitHub is security research. GitHub have security products for code security, which he ties into.

richardwhiuk
0 replies
1d8h

It's a little surprising that this hasn't been shifted into MSRC, but GitHub operates very independently inside Microsoft.

mdriley
0 replies
1d12h

Man Yue Mo worked at Semmle (https://blog.sonatype.com/steps-to-responsible-disclosure) before it was acquired by GitHub (https://github.blog/2019-09-18-github-welcomes-semmle/). That research function has carried on as the GitHub Security Lab.

Semmle built CodeQL, now offered by GitHub (https://docs.github.com/en/code-security/code-scanning/intro...), which GitHub and Microsoft (see https://www.microsoft.com/en-us/security/blog/2023/11/02/ann...) want to associate with "deep security insight".

So they continue to fund this kind of novel security research, for which security practitioners across industry are grateful.

fragmede
0 replies
1d16h

They got bought by Microsoft and so have the resources to sponsor research, including this kind. There's a GitHub app, and the security of that app is not outside their purview. If an attacker manages to install a lurky app on your phone, they could do stuff as you. If you're someone with GitHub clout, that could be really damaging, so it's in their interest to find such vulnerabilities.

devsda
0 replies
1d16h

They have hosted Actions runners for Arm too. So they may have an interest in checking and verifying the security capabilities of Arm hardware with MTE for sandboxing.

bmacho
0 replies
1d11h

> Does anyone know what their "business reason" for doing research like this is? (not that a business reason should be needed, but like I said, I'm a bit surprised to see it here)

I think it's basically basic research [0]. To a first approximation, GitHub as a product doesn't really need Android security experts. But employing them has some potential long-term benefits.

[0]: https://en.wikipedia.org/wiki/Basic_research

transpute
11 replies
1d14h

Probabilistic Arm MTE memory safety is a stepping stone to deterministic CHERI hardware, https://saaramar.github.io/memory_safety_blogpost_2022/ & https://news.ycombinator.com/item?id=39668053

  The right kind of mitigations targets the 1st order primitive; the root cause of the bug.

  Hardware solutions: CHERI (Morello, CheriIoT), MTE
  Software mitigations: kalloc_type+dataPAC, AUTOSLAB, Firebloom, GuardedMemcpy, CastGuard, attack surface reduction
  Safe programming languages: Rust, Swift

  MTE/CHERI play pretty nicely - they help ensure that whatever bugs we have in these areas are killed at their root cause… MSR, MSRC and Azure Silicon pushed for… scaling CHERI down to RISC-V32E, the smallest core RISC-V specification.

Microsoft Research open-sourced a hardware/software stack for CHERI in IoT devices, https://msrc.microsoft.com/blog/2023/02/first-steps-in-cheri...

  CHERI-based microcontroller that aims to… get very strong security guarantees if we are willing to co-design the instruction set architecture (ISA), the application binary interface (ABI), isolation model, and the core parts of the software stack… our microcontroller achieves the following security properties:

  Deterministic mitigation for spatial safety (using CHERI-ISA capabilities).
  Deterministic mitigation for heap and cross-compartment stack temporal safety (using a load barrier, zeroing, revocation, and a one-bit information flow control scheme).
  Fine-grained compartmentalization (using additional CHERI-ISA features and a tiny monitor).

David Chisnall, U of Cambridge, https://lobste.rs/s/gnjx2n/c_can_be_memory_safe#c_9ohzku via https://eclypsium.com/blog/a-faster-path-to-memory-safety-ch...

> There are around 13 billion lines of open source C and C++, which end up in various TCBs. This number gets even bigger when you include proprietary code… if we all stopped writing C/C++ code now and every software engineer focused on rewriting legacy code in safe languages (and on the assumption that everything can be written in safe languages) then it would take 5-10 to replace everything and we’d likely see a lot of logic bugs because we’d be replacing old well-tested code with new code that would need different algorithms and data structures to fit with allowable idioms in safe languages.

> If we didn’t do the rewriting thing and just stopped writing code in C/C++, then at normal code replacement rates, our TCBs would be entirely safe in around 50 years. If we don’t all agree to stop writing C/C++, it’s at least 100 years.

> In contrast, if the major CPU vendors shipped CHERI CPUs in five years, most machines (and all high-value ones) would have memory safety within 15 years of today, without needing programmers to change their behaviour.

pjmlp
5 replies
1d10h

There is also Solaris SPARC ADI, which most folks keep forgetting, unfortunately, because of Oracle and the state of Solaris SPARC.

fanf2
4 replies
1d6h

That’s comparable to MTE, and much weaker than CHERI.

pjmlp
3 replies
1d6h

CHERI is great, but until it becomes a widespread product rather than just the Arm Morello test board or the current RISC-V prototypes, anything else in production is better than nothing.

yjftsjthsd-h
2 replies
1d4h

Does SPARC count as being in production anymore?

pjmlp
0 replies
1d2h

It definitely counts; it is available to anyone who still wants to buy one.

pigeons
0 replies
1d3h

It's certainly available, though.

hedora
1 replies
1d3h

Since this was an off-CPU hardware bug, I don't see how CHERI would help.

Anyway, the last time I looked into it, CHERI wasn't sound: It was still possible to write memory bugs on top of it. Have they fixed that yet?

sweetjuly
0 replies
14h58m

Yes and no. CHERI provides bounds safety but not lifetime safety. If you use capability enhanced garbage collection you can have both, but obviously bolting garbage collection on top of everything you're already doing with manual management (reference counting, etc.) in your existing C/C++ codebase is going to be the worst of both worlds.

Lifetime safety is a much harder problem to solve. Despite CHERI providing "more robust" bounds safety, the fact that you get decent lifetime safety for essentially free from MTE is a huge plus. The two technologies aren't incompatible so in theory you could bolt the two together to get MTE lifetime safety and CHERI bounds safety, but that would likely waste a ton of memory.
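
To make the "probabilistic" lifetime safety mentioned in this thread concrete, here is a minimal toy sketch in Python. It is not how MTE is implemented. It assumes 4-bit tags (as in Arm MTE) and an allocator that picks a uniformly random tag on every allocation, which real allocators may not do. `TaggedSlot` and `stale_pointer_hit_rate` are hypothetical names invented for this illustration.

```python
import random

TAG_BITS = 4               # Arm MTE tags are 4 bits wide
NUM_TAGS = 1 << TAG_BITS   # 16 possible tag values


class TagMismatch(Exception):
    """Raised when a pointer's tag differs from the memory's tag."""


class TaggedSlot:
    """Hypothetical toy model of one heap slot protected by memory tagging."""

    def __init__(self, rng):
        self.rng = rng
        self.tag = 0
        self.value = None

    def alloc(self, value):
        # Assumption: the allocator picks a uniformly random tag each time.
        self.tag = self.rng.randrange(NUM_TAGS)
        self.value = value
        return self.tag  # the "pointer" is modelled as just its tag

    def load(self, pointer_tag):
        # The access succeeds only when pointer tag and memory tag match.
        if pointer_tag != self.tag:
            raise TagMismatch("stale or forged pointer")
        return self.value


def stale_pointer_hit_rate(trials, seed=0):
    """Fraction of use-after-free accesses that slip past the tag check."""
    rng = random.Random(seed)
    slot = TaggedSlot(rng)
    hits = 0
    for _ in range(trials):
        stale = slot.alloc("victim")   # pointer kept around after the free
        slot.alloc("attacker")         # slot is freed and reused with a new tag
        try:
            slot.load(stale)
            hits += 1                  # tags collided, the bug goes undetected
        except TagMismatch:
            pass                       # tag check caught the stale access
    return hits / trials


if __name__ == "__main__":
    print(stale_pointer_hit_rate(20000))
```

Under these assumptions the printed rate should land near 1/16: a stale pointer is caught most of the time but not always, which is the gap a deterministic scheme aims to close.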

zdragnar
0 replies
1d3h

> then it would take 5-10 to replace everything

If we're talking years, that seems wildly optimistic. I imagine the bike-shedding alone would take that long.

rightbyte
0 replies
1d7h

The root problem seems to be that the user is executing malicious code that abuses some MMU hash collision.

The exploit can probably be written in most languages, including Rust.

gchadwick
0 replies
1d8h

For anyone interested in CHERI for embedded/IoT and other similar use cases, lowRISC (whom I work for) is building a couple of FPGA-based evaluation platforms for CHERIoT (the Microsoft-created CHERI variant referred to above): https://www.sunburst-project.org/

The first is the Sonata system: https://github.com/lowRISC/sonata-system. This comprises a dedicated PCB with an FPGA along with various peripherals and headers. The PCB design is done and will be available through Mouser (plus it's open source, including the board layout, so you can assemble your own if you like). We're currently working on the RTL for the FPGA. When it's complete, you'll have a full CHERIoT-based, microcontroller-like system with documentation and tooling.

Additionally we're building the Symphony system, which combines Sonata with the OpenTitan Earl Grey root of trust: https://github.com/lowRISC/symphony-system

saagarjha
7 replies
1d5h

The big thing here is that the GPU has historically been a pain point for Android, because it has extreme access to the AP in ways that basically sidestep any mitigation that you put in its way. Any bugs in the driver's mapping code (and there have been many) end up giving very powerful primitives, and this fact has repeatedly been used in in-the-wild exploits. Unfortunately, I don't think much is going to change here until this gets rearchitected.

monocasa
4 replies
1d2h

IMO, what needs to happen is that half-assed mobile GPUs need to stop including their own MMU, and use a standard IO-MMU.

my123
3 replies
8h14m

A number of GPUs use a standard Arm SMMU instead of an IOMMU already.

The problem with those GPUs in general is driver issues, the hardware is fine.

matheusmoreira
1 replies
5h43m

Is the GPU driver closed source?

monocasa
0 replies
1h33m

It is in this case.

monocasa
0 replies
1h44m

> A number of GPUs use a standard Arm SMMU instead of an IOMMU already.

Yes, I'm talking about using cores like an ARM SMMU (which is an IO-MMU). Perhaps some GPUs do, but many (most?) don't, including the Mali-G710 in this article that's currently shipping in the Pixel 8.

> The problem with those GPUs in general is driver issues, the hardware is fine.

Exactly. I want them to stop writing bespoke kernel code manually fiddling with some custom page table format that gives physical memory read/write primitives when they get it wrong.

pxeger1
1 replies
1d3h

What does AP mean here?

mkopec
0 replies
1d3h

Application Processor, i.e. the main processor

chx
7 replies
1d5h

I am surprised no one has yet introduced a CPU and phone with little if any GPU and called it a business phone. The obvious advantages include security, cost, and power consumption.

pvg
6 replies
1d5h

the obvious disadvantage is no high-dpi touchscreen so you're back to a Blackberry or Palm Treo, things that were sold as business phones.

chx
5 replies
1d4h

And that requires a powerful GPU? I thought a much much simpler 2D accelerator in the style of the S3 911 of yesteryear would be enough.

TillE
4 replies
21h2m

Swipe up from the bottom of your iPhone. Oops, you're suddenly doing 3D transformations.

There are dozens of UI effects which rely on the GPU, and there's just no such thing as a 2D GPU these days, it makes no sense unless you're building a retro console or something.

littlestymaar
2 replies
18h46m

There's quite a step between “you can't have fancy UI animations” and “you're back to BlackBerry” though…

pvg
1 replies
14h52m

Things like inertial scrolling are not 'fancy UI animations', they're core components of a touch ui. Take out the touch UI and you're back to something like a nicer Treo.

littlestymaar
0 replies
10h59m

Anyone who has an eInk device (where such animations are impossible due to the refresh rate of the screen) can tell you that it's still fully usable and has nothing to do with getting back to BlackBerry or Treo.

It looks less nice and is limited in some ways, but for business needs it does the job perfectly.

Dylan16807
0 replies
20h48m

> Swipe up from the bottom of your iPhone. Oops, you're suddenly doing 3D transformations.

So don't do that exact effect? This is a pretty weak objection.

> there's just no such thing as a 2D GPU these days, it makes no sense

This might be stronger but I'm not an expert on pixel pushing.

sylware
2 replies
1d7h

hardware is _that_ bad?? holy...

fanf2
0 replies
1d6h

This is a bug in the driver that runs on the CPU.

ahartmetz
0 replies
1d4h

GPU hardware is crawling with bugs. Hardware is only re-spun for things that cannot be worked around in the driver at an acceptable cost. That approach is possible because GPUs do not allow relatively direct hardware access like CPUs do.

Dudhbbh3343
2 replies
1d14h

Would this affect GrapheneOS installs as well prior to the March update?

simcop2387
0 replies
1d6h

Given that this is related to a hardware-ish problem (maybe firmware inside it?) in the GPU, I'd bet it even affects it after the March update, which was related to the Bluetooth stack.

EDIT: Ignore me, I was confusing that with the recent blog post they had about finding an issue with MTE applying to all system apps too. Looks like GrapheneOS should have this as of their 2024030600 release because it brings in the "full 2024-03-05 security patch level"

devit
0 replies
1d4h

One of the main goals of GrapheneOS is to release security updates as soon as possible, so if it's patched upstream GrapheneOS almost surely includes the patch.

Sometimes they even adopt pre-release AOSP security patch levels or backport security fixes from unreleased AOSP or kernel sources.

menaerus
0 replies
1d7h

> What is interesting about this vulnerability is that it is a logic bug in the memory management unit of the Arm Mali GPU and it is capable of bypassing Memory Tagging Extension (MTE)

The rest of the article appears to describe the bug as actually being caused by a race condition, with the use-after-free simply a consequence of it.