HN comments for: Gaining kernel code execution on an MTE-enabled Pixel 8

Retr0id

14 replies

1d16h

2024-03-19 02:00:51 UTC

This is great research and a great write-up, but I'm a little (pleasantly) surprised to see it on GitHub's blog.

Does anyone know what their "business reason" for doing research like this is? (not that a business reason should be needed, but like I said, I'm a bit surprised to see it here)

mmsc

7 replies

1d14h

2024-03-19 03:27:37 UTC

Unlike other departments, security teams often don’t have anything to do so this research is a good use of free time.

2 replies

1d14h

2024-03-19 04:12:48 UTC

What is this comment? GitHub's security research lab focuses solely on security research and publishes some of the best research in the industry.

Man Yue Mo is a security researcher who finds some of the most complex and impactful bugs in the industry like crbug.com/40065473

mrb

1 replies

1d12h

2024-03-19 05:29:12 UTC

Seeing mmsc's post history, especially computer security related comments, I presume he was just being sarcastic :)

mmsc

0 replies

1d11h

2024-03-19 06:50:10 UTC

Indeed.

Although, it wouldn’t be abnormal for a security team to have free time, and dedicate it to researching an emerging technology whether it directly contributes to the business goals or not. Of course I’m not talking about a security team that is reading log files from their SIEM while sitting in a SOC.

ramon156

1 replies

1d10h

2024-03-19 08:12:06 UTC

Sadly people didn't see the sarcasm in this comment

ambichook

0 replies

16h6m

2024-03-20 02:20:12 UTC

i understand people disliking using tone indicators, especially when they can ruin a joke, but they are really wonderful things that can prevent misunderstandings like this online

ssklash

0 replies

1d14h

2024-03-19 03:52:27 UTC

How many security teams have you been on? Definitely ones with less work than I've been on...

frosting1337

0 replies

1d14h

2024-03-19 04:03:13 UTC

Wow, that's just absolutely incorrect. Ignoring that tons of security teams are actually stupidly busy, this person's specific role at GitHub is security research. GitHub have security products for code security, which he ties into.

infima

1 replies

1d15h

2024-03-19 03:03:48 UTC

This work comes from GitHub's Security Lab https://securitylab.github.com/

richardwhiuk

0 replies

1d8h

2024-03-19 09:45:56 UTC

A little surprising that hasn't been shifted into MSRC, but GitHub operates very independently inside Microsoft.

mdriley

0 replies

1d12h

2024-03-19 05:42:11 UTC

Man Yue Mo worked at Semmle (https://blog.sonatype.com/steps-to-responsible-disclosure) before it was acquired by GitHub (https://github.blog/2019-09-18-github-welcomes-semmle/). That research function has carried on as the GitHub Security Lab.

Semmle built CodeQL, now offered by GitHub (https://docs.github.com/en/code-security/code-scanning/intro...), which GitHub and Microsoft (see https://www.microsoft.com/en-us/security/blog/2023/11/02/ann...) want to associate with "deep security insight".

So they continue to fund this kind of novel security research, for which security practitioners across industry are grateful.

fragmede

0 replies

1d16h

2024-03-19 02:05:28 UTC

They got bought by Microsoft and so have the resources to sponsor research, including of this kind. There’s a GitHub app, and the security of that app is not outside their purview. If an attacker manages to install a lurky app on your phone, they could do stuff as you. If you're someone with GitHub clout, that could be really damaging, so it's in their interests to find such vulnerabilities.

devsda

0 replies

1d16h

2024-03-19 02:22:21 UTC

They have hosted action runners for arm too. So, they may have an interest in checking and verifying the security capabilities of arm hardware with MTE for sandboxing.

bmacho

0 replies

1d11h

2024-03-19 06:59:18 UTC

> Does anyone know what their "business reason" for doing research like this is? (not that a business reason should be needed, but like I said, I'm a bit surprised to see it here)

I think it's basically basic research [0]. To a first-order approximation, GitHub, as a product, doesn't really need Android security experts. But employing them has some potential long-term benefits.

[0]: https://en.wikipedia.org/wiki/Basic_research

transpute

11 replies

1d14h

2024-03-19 04:17:48 UTC

Probabilistic Arm MTE memory safety is a stepping stone to deterministic CHERI hardware, https://saaramar.github.io/memory_safety_blogpost_2022/ & https://news.ycombinator.com/item?id=39668053

  The right kind of mitigations targets the 1st order primitive; the root cause of the bug.

  Hardware solutions: CHERI (Morello, CheriIoT), MTE
  Software mitigations: kalloc_type+dataPAC, AUTOSLAB, Firebloom, GuardedMemcpy, CastGuard, attack surface reduction
  Safe programming languages: Rust, Swift

  MTE/CHERI play pretty nicely - they help ensure that whatever bugs we have in these areas are killed at their root cause… MSR, MSRC and Azure Silicon pushed for… scaling CHERI down to RISC-V32E, the smallest core RISC-V specification.

Microsoft Research open-sourced a hardware/software stack for CHERI in IoT devices, https://msrc.microsoft.com/blog/2023/02/first-steps-in-cheri...

  CHERI-based microcontroller that aims to… get very strong security guarantees if we are willing to co-design the instruction set architecture (ISA), the application binary interface (ABI), isolation model, and the core parts of the software stack… our microcontroller achieves the following security properties:

  Deterministic mitigation for spatial safety (using CHERI-ISA capabilities).
  Deterministic mitigation for heap and cross-compartment stack temporal safety (using a load barrier, zeroing, revocation, and a one-bit information flow control scheme).
  Fine-grained compartmentalization (using additional CHERI-ISA features and a tiny monitor).

David Chisnall, U of Cambridge, https://lobste.rs/s/gnjx2n/c_can_be_memory_safe#c_9ohzku via https://eclypsium.com/blog/a-faster-path-to-memory-safety-ch...

> There are around 13 billion lines of open source C and C++, which end up in various TCBs. This number gets even bigger when you include proprietary code… if we all stopped writing C/C++ code now and every software engineer focused on rewriting legacy code in safe languages (and on the assumption that everything can be written in safe languages) then it would take 5-10 to replace everything and we’d likely see a lot of logic bugs because we’d be replacing old well-tested code with new code that would need different algorithms and data structures to fit with allowable idioms in safe languages.

> If we didn’t do the rewriting thing and just stopped writing code in C/C++, then at normal code replacement rates, our TCBs would be entirely safe in around 50 years. If we don’t all agree to stop writing C/C++, it’s at least 100 years.

> In contrast, if the major CPU vendors shipped CHERI CPUs in five years, most machines (and all high-value ones) would have memory safety within 15 years of today, without needing programmers to change their behaviour.

pjmlp

5 replies

1d10h

2024-03-19 07:41:01 UTC

There is also Solaris SPARC ADI, which most folks keep forgetting, because of Oracle and the state of Solaris SPARC, unfortunately.

fanf2

4 replies

1d6h

2024-03-19 11:47:49 UTC

That’s comparable to MTE, and much weaker than CHERI.

pjmlp

3 replies

1d6h

2024-03-19 12:06:24 UTC

CHERI is great, but until it becomes a widespread product and not ARM Morello test board, or current RISC-V prototype, anything else in production is better than nothing.

yjftsjthsd-h

2 replies

1d4h

2024-03-19 13:51:57 UTC

Does SPARC count as being in production anymore?

pjmlp

0 replies

1d2h

2024-03-19 15:54:50 UTC

It definitely counts, it is available for anyone that still wants to buy one.

pigeons

0 replies

1d3h

2024-03-19 14:28:06 UTC

It's certainly available though.

hedora

1 replies

1d3h

2024-03-19 14:31:19 UTC

Since this was an off-CPU hardware bug, I don't see how CHERI would help.

Anyway, the last time I looked into it, CHERI wasn't sound: It was still possible to write memory bugs on top of it. Have they fixed that yet?

sweetjuly

0 replies

14h58m

2024-03-20 03:28:11 UTC

Yes and no. CHERI provides bounds safety but not lifetime safety. If you use capability enhanced garbage collection you can have both, but obviously bolting garbage collection on top of everything you're already doing with manual management (reference counting, etc.) in your existing C/C++ codebase is going to be the worst of both worlds.

Lifetime safety is a much harder problem to solve. Despite CHERI providing "more robust" bounds safety, the fact that you get decent lifetime safety for essentially free from MTE is a huge plus. The two technologies aren't incompatible so in theory you could bolt the two together to get MTE lifetime safety and CHERI bounds safety, but that would likely waste a ton of memory.
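To make the "probabilistic" lifetime safety of MTE concrete, here is a minimal Python sketch. It is a toy model, not how any real allocator or kernel works: `ToySlot`, `stale_access_detected` and `detection_rate` are hypothetical names, and drawing tags uniformly at random is a simplifying assumption (real allocators may pick tags more carefully). The only hardware fact it relies on is that MTE tags are 4 bits wide.

```python
import random

TAG_BITS = 4                 # MTE tags are 4 bits wide
NUM_TAGS = 1 << TAG_BITS     # so there are 16 possible tag values


class TagFault(Exception):
    """Raised when a pointer's tag does not match the memory's tag."""


class ToySlot:
    """Hypothetical one-slot heap that models only the tag check."""

    def __init__(self, rng):
        self.rng = rng
        self.mem_tag = 0     # tag currently attached to the slot's memory

    def malloc(self):
        # Assumption: tags are drawn uniformly at random on each allocation.
        self.mem_tag = self.rng.randrange(NUM_TAGS)
        return self.mem_tag  # the "pointer" is represented by its tag alone

    def free(self):
        # Retag on free so that stale pointers usually stop matching.
        self.mem_tag = self.rng.randrange(NUM_TAGS)

    def load(self, ptr_tag):
        if ptr_tag != self.mem_tag:
            raise TagFault("pointer tag does not match memory tag")
        return "ok"


def stale_access_detected(rng):
    """Allocate, free, let the slot be reused, then use the stale pointer."""
    slot = ToySlot(rng)
    stale = slot.malloc()
    slot.free()
    slot.malloc()            # the slot is handed out again with a fresh tag
    try:
        slot.load(stale)
    except TagFault:
        return True
    return False


def detection_rate(trials=20000, seed=0):
    """Fraction of use-after-free attempts caught in this toy model."""
    rng = random.Random(seed)
    hits = sum(stale_access_detected(rng) for _ in range(trials))
    return hits / trials
```

Under these assumptions a stale pointer slips through only when the new tag happens to equal the old one, so `detection_rate()` should land near 15/16. That is the sense in which MTE is probabilistic rather than deterministic.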

zdragnar

0 replies

1d3h

2024-03-19 14:47:23 UTC

> then it would take 5-10 to replace everything

If we're talking years, that seems wildly optimistic. I imagine the bike-shedding alone would take that long.

rightbyte

0 replies

1d7h

2024-03-19 11:14:29 UTC

The root problem seems to be that the user is executing malicious code and abusing some MMU hash collision.

The exploit can probably be written in most languages, including Rust.

gchadwick

0 replies

1d8h

2024-03-19 09:55:43 UTC

For anyone interested in CHERI for embedded/IoT and other similar use cases, lowRISC (whom I work for) are building a couple of FPGA-based evaluation platforms for CHERIoT (the Microsoft-created CHERI variant referred to above): https://www.sunburst-project.org/

The first is the Sonata system: https://github.com/lowRISC/sonata-system. This comprises a dedicated PCB with an FPGA along with various peripherals and headers. The PCB design is done and will be available through Mouser (plus it's open source including the board layout so you can assemble your own if you like). We're currently working on the RTL for the FPGA. When complete you'll have a complete CHERIoT based microcontroller like system with documentation and tooling.

Additionally we're building the Symphony system, which combines Sonata with the OpenTitan Earl Grey root of trust: https://github.com/lowRISC/symphony-system

saagarjha

7 replies

1d5h

2024-03-19 13:18:05 UTC

The big thing here is that the GPU has historically been a pain point for Android, because it has extreme access to the AP in ways that basically sidestep any mitigation that you put in its way. Any bugs in the driver's mapping code (and there have been many) end up giving very powerful primitives, and this fact has repeatedly been used in in-the-wild exploits. Unfortunately, I don't think much is going to change here until this gets rearchitected.

monocasa

4 replies

1d2h

2024-03-19 15:36:36 UTC

IMO, what needs to happen is that half-assed mobile GPUs need to stop including their own MMU, and use a standard IO-MMU.

my123

3 replies

8h14m

2024-03-20 10:12:24 UTC

A number of GPUs use a standard Arm SMMU instead of an IOMMU already.

The problem with those GPUs in general is driver issues, the hardware is fine.

matheusmoreira

1 replies

5h43m

2024-03-20 12:43:17 UTC

Is the GPU driver closed source?

monocasa

0 replies

1h33m

2024-03-20 16:53:13 UTC

It is in this case.

monocasa

0 replies

1h44m

2024-03-20 16:42:32 UTC

> A number of GPUs use a standard Arm SMMU instead of an IOMMU already.

Yes, I'm talking about using cores like an ARM SMMU (which is an IO-MMU). Perhaps some GPUs do, but many (most?) don't including the Mali-G710 in this article that's currently shipping in the Pixel 8.

> The problem with those GPUs in general is driver issues, the hardware is fine.

Exactly. I want them to stop writing bespoke kernel code manually fiddling with some custom page table format that gives physical memory read/write primitives when they get it wrong.
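As a rough illustration of why a mistake in driver-managed GPU page tables turns into a physical memory write primitive, here is a toy Python model. Everything in it is hypothetical (`ToyPhysicalMemory`, `ToyGpuMmu`, `buggy_release`, `demo`); it is not the Mali driver's logic or the bug from the article, only a sketch of the general failure mode where a page is freed while a device mapping to it survives.

```python
PAGE_SIZE = 4096


class ToyPhysicalMemory:
    """Hypothetical pool of physical page frames with a LIFO free list."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.free_pages = list(range(num_pages))

    def alloc_page(self):
        return self.free_pages.pop()

    def free_page(self, pfn):
        self.pages[pfn][:] = bytes(PAGE_SIZE)   # zero the frame on free
        self.free_pages.append(pfn)


class ToyGpuMmu:
    """Hypothetical device page table: GPU virtual page -> physical frame."""

    def __init__(self, phys):
        self.phys = phys
        self.table = {}

    def map(self, gpu_vpn, pfn):
        self.table[gpu_vpn] = pfn

    def write(self, gpu_addr, data):
        # The GPU translates through its own table; no CPU-side checks
        # (such as MTE tag checks) are modelled on this path.
        vpn, offset = divmod(gpu_addr, PAGE_SIZE)
        pfn = self.table[vpn]
        self.phys.pages[pfn][offset:offset + len(data)] = data


def buggy_release(phys, mmu, gpu_vpn):
    # Bug (deliberate, for illustration): the frame is returned to the
    # allocator but the GPU mapping that points at it is left in place.
    phys.free_page(mmu.table[gpu_vpn])


def demo():
    phys = ToyPhysicalMemory(4)
    mmu = ToyGpuMmu(phys)

    pfn = phys.alloc_page()
    mmu.map(0, pfn)                  # user buffer mapped for the GPU
    buggy_release(phys, mmu, 0)      # frame freed, stale mapping remains

    kernel_pfn = phys.alloc_page()   # the same frame is reused elsewhere
    mmu.write(0, b"pwned")           # GPU write lands in the reused frame
    return kernel_pfn == pfn, bytes(phys.pages[kernel_pfn][:5])
```

In this model `demo()` returns `(True, b"pwned")`: the frame handed to the second owner is the one the stale GPU mapping still points at, so the write goes straight into it.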

pxeger1

1 replies

1d3h

2024-03-19 15:00:04 UTC

What does AP mean here?

mkopec

0 replies

1d3h

2024-03-19 15:21:11 UTC

Application Processor, i.e. the main processor

chx

7 replies

1d5h

2024-03-19 12:50:06 UTC

I am surprised no one introduced yet a CPU and phone which has little if any GPU and called it a business phone. The obvious advantages include security, cost, power consumption.

pvg

6 replies

1d5h

2024-03-19 12:53:17 UTC

the obvious disadvantage is no high-dpi touchscreen so you're back to a Blackberry or Palm Treo, things that were sold as business phones.

chx

5 replies

1d4h

2024-03-19 13:55:42 UTC

And that requires a powerful GPU? I thought a much much simpler 2D accelerator in the style of the S3 911 of yesteryear would be enough.

TillE

4 replies

21h2m

2024-03-19 21:24:08 UTC

Swipe up from the bottom of your iPhone. Oops, you're suddenly doing 3D transformations.

There are dozens of UI effects which rely on the GPU, and there's just no such thing as a 2D GPU these days, it makes no sense unless you're building a retro console or something.

littlestymaar

2 replies

18h46m

2024-03-19 23:40:33 UTC

There's quite a step between “you can't have fancy UI animations” and “you're back to BlackBerry” though…

pvg

1 replies

14h52m

2024-03-20 03:34:29 UTC

Things like inertial scrolling are not 'fancy UI animations', they're core components of a touch ui. Take out the touch UI and you're back to something like a nicer Treo.

littlestymaar

0 replies

10h59m

2024-03-20 07:26:34 UTC

Anyone who has an eInk device (where such animations are impossible due to the refresh rate of the screen) can tell you that it's still fully usable and has nothing to do with getting back to BlackBerry or Treo.

It looks less nice and is limited in some ways, but for business needs it does the job perfectly.

Dylan16807

0 replies

20h48m

2024-03-19 21:37:50 UTC

> Swipe up from the bottom of your iPhone. Oops, you're suddenly doing 3D transformations.

So don't do that exact effect? This is a pretty weak objection.

> there's just no such thing as a 2D GPU these days, it makes no sense

This might be stronger but I'm not an expert on pixel pushing.

sylware

2 replies

1d7h

2024-03-19 10:47:54 UTC

hardware is _that_ bad?? holy...

fanf2

0 replies

1d6h

2024-03-19 11:45:50 UTC

This is a bug in the driver that runs on the CPU.

ahartmetz

0 replies

1d4h

2024-03-19 14:13:13 UTC

GPU hardware is crawling with bugs. Hardware is only re-spun for things that cannot be worked around in the driver at an acceptable cost. That approach is possible because GPUs do not allow relatively direct hardware access like CPUs do.

Dudhbbh3343

2 replies

1d14h

2024-03-19 04:19:27 UTC

Would this affect GrapheneOS installs as well prior to the March update?

simcop2387

0 replies

1d6h

2024-03-19 11:45:29 UTC

given that this is related to a hardware-ish problem (maybe firmware inside it?) in the GPU I'd bet it even affects it after the March update which was related to the Bluetooth stack.

EDIT: Ignore me, I was confusing that with the recent blog post they had about finding an issue with MTE applying to all system apps too. Looks like GrapheneOS should have this as of their 2024030600 release because it brings in the "full 2024-03-05 security patch level"

devit

0 replies

1d4h

2024-03-19 14:10:52 UTC

One of the main goals of GrapheneOS is to release security updates as soon as possible, so if it's patched upstream GrapheneOS almost surely includes the patch.

Sometimes they even adopt pre-release AOSP security patch levels or backport security fixes from unreleased AOSP or kernel sources.

menaerus

0 replies

1d7h

2024-03-19 11:20:22 UTC

> What is interesting about this vulnerability is that it is a logic bug in the memory management unit of the Arm Mali GPU and it is capable of bypassing Memory Tagging Extension (MTE)

The rest of the article appears to be describing that a bug is actually caused by a race condition and use-after-free is simply a consequence of it.