A memory-safe Linux kernel would be a fairly incredible thing. If you could snap your fingers and have it, the wins would be huge.
Consider that right now a Docker container can't be relied upon to contain arbitrary malware, exactly because the Linux kernel has so many security issues and they're exposed to containers. The reason a VM like Firecracker is so much safer is that it removes the kernel as the primary security boundary.
Imagine if containers were actually VM-level safe: the performance and operational simplicity of a container with the security of a VM.
I'm not saying this is practical; at this point the C version of Linux is here to stay for quite a while and I think, if anything, Fuchsia is the most likely successor (and is unlikely to give us the memory safety that a Rust kernel would). But damn, if Linux had been built with safety in mind, security would be a lot simpler. Being able to trust the kernel would be so nice.
edit: OK OK. Yeesh. I meant this to be a hypothetical, I got annoyed at so many of the replies, and this has spiraled. I'm signing off.
I apologize if I was rude! Not a fun start to the morning.
Memory safety isn't why containers are considered insufficient as a security boundary. It's the exposure of essentially the entire Linux feature surface, and the ability to easily interact with the host and other containers, that makes them unsafe by themselves. What you're saying about VMs vs containers makes no sense to me. VMs are used to sandbox containers. You still need to sandbox containers if your kernel is written in Rust.
Even just considering Linux security itself: there are so, so many ways OS security can break besides a slight (you’re going to have to use unsafe a whole lot) increase in memory safety
The culture around memory safe languages is a positive improvement for the programmer zeitgeist. Man, though, the overreach all the way to "always safe forever" needs to be checked.
Can you expound on this some? I am not fully grasping your point. Are you saying "building safe by default" is a bad thing, or that assuming "safe forever" is a bad thing? Or are you saying something entirely different?
I said "the idea of using memory safe languages is great!" And "using memory safe languages does not eliminate attack surface". (It's pre coffee here so I appreciate your probe)
I meant that it's over-reach to say it's completely trustworthy just bc it's written in a GC/borrow checked language.
The premise of my post was "imagine a memory safe kernel". I repeatedly use the word "imagine".
The disagreement is that you wrote "imagine a memory safe kernel" but appear to have meant "imagine a kernel with zero vulnerabilities of any kind", and those things are not equivalent.
I expect it's likely more of "memory safety in a language doesn't make it _safe_, it makes it less vulnerable". It removes _some_ issues, in the same way that a language with static types removes some ways a program can be wrong, it doesn't make it correct.
The problem is the word "safe," which is inherently ambiguous. Safe from what? A better term would be "correct," because at least that implies there is some spec to which the developer expects the program to conform (assuming the spec itself is "correct" and devoid of design flaws).
Just the other day they were full kum ba yah over a holiday-time-released feature that's likely going to greatly increase the likelihood of building race conditions and deadlocks.
https://news.ycombinator.com/item?id=38721039
This is such a bizarre take on the stabilization of async that I can't even tell if you are being serious or just hate Rust.
I know it's the curmudgeon NIMBY in me, and I love Rust and use it daily, but it's starting to feel like the worst of the JS crowd scribbled in grandpa's ANSI C books just to make him mad.
I am super happy with most features, but demand is demand and people demand runtimes. As a chimera it perfectly represents the modern world.
To be clear you're talking about async fn and impl trait in return position in traits? If so, how does that impact the likelihood one way or the other of race conditions or deadlocks?
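For reference, this is roughly what those two features look like; a minimal sketch of my own, with made-up names:

    use std::future::Future;

    trait KvStore {
        // async fn directly in a trait (newly stabilized):
        async fn get(&self, key: &str) -> Option<String>;

        // impl Trait in return position in a trait (also newly stabilized):
        fn get_deferred(&self, key: &str) -> impl Future<Output = Option<String>>;
    }

    fn main() {}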
How do you figure?
Is it just because it makes async possible?
JS and Rust are memory safe languages with a culture of pulling in hundreds if not thousands of dependencies. So unfortunately, in terms of culture, at least those languages are not Pareto improvements.
Also a lot of "in Rust" rewrites include a substantial amount of unsafe FFI calls to the original C libraries...
Serious question: how much additional safety do you get over best practices and tooling in modern C++?
It's not possible for me to say.
Clearly you can only do worse in Rust than you would with perfect C. But what does perfect C even look like?
The question is: what is the expected loss (time, bugs, exploits that lead to crashes or injury or death or financial catastrophe) with Rust vs other languages.
Unfortunately that's not the conversation we have. We instead have absolutism regarding managed memory, which does account for about half of the known security flaws that have been discovered and patched. Removing half of all bugs in one fell swoop sounds amazing.
It's not that we can't remove those bugs other ways. Maybe modern C++ can cut bugs in half at compile time too. But Rust seems nicer to play with and doesn't require much more than what it has out of the box. Also it's shiny and new.
Given that Rust is making its way into platforms that C++ struggled to get into, it's potentially moot. I sincerely doubt Linux will accept C++, but we're on the cusp of Rust in Linux.
If you find this amazing, perhaps you should take a look at seL4, which has formal proofs of correctness, going all the way down to the generated assembly code still satisfying the requirements.
It also has a much better overall architecture, the best currently available: A third generation microkernel multiserver system.
It provides a protected (with proof of isolation) RTOS with hard real-time guarantees, proofs of worst-case timing, and mixed-criticality support. No other system can currently make such claims.
I wish L4 had taken off for general purpose computing. That and Plan9 are things I'd really like to try out but I don't have space to fit operating systems in amongst the other projects. They both strike me as having the Unix nature, either "everything is messages in userspace processes" or "everything is a file."
I don't think I've ever seen an argument for why "everything is a file" is a desirable thing. It seems like a kitchen where "everything is a bowl". Fine when you want to eat cereal, mix a cake, do some handwashing up, store some fruit; tolerable when you want to bake a cake in the oven. Intolerable when you've got a carrot in a bowl-shaped chopping board and you've got to peel and cut it using a bowl.
Why in principle should everything we do on computers behave and be shaped like a file?
The file is inconsequential; it could be any other universal abstraction, e.g. HTTP POST.
It's just something that every program running on a computer knows how to do, so why bother with special APIs you have to link against if you can just write to a file? (Note you can still develop those layers if you wish, but you can also write a device driver in sh if you wish, because why not?)
Bad abstractions are notoriously problematic, and no abstraction is fit for every purpose.
So everything-is-a-file is a bit like REST - restrict the verbs, and move the complexity that causes into the addressing scheme and the clients.
I don't think the metaphor applies, because kitchen utensils' forms are dictated by their purpose, but software interfaces are abstractions.
A fairer analogy would be if everything in the kitchen was bowl-shaped, but you could do bowl-like actions and get non-bowl behavior. Drop the carrot in the peelbowl and it is peeled, drop the carrot in the knifebowl and it is diced, drop the cubes in the stovebowl and they are cooked. Every manipulation is placing things in bowls. Every bowl is the same shape which means you can store them however is intuitive to you (instead of by shape).
A file system is a tree of named objects. These objects are seamlessly part of the OS and served by a program or kernel driver called a file server which can then be shared over a network. Security is then handled by file permissions so authentication is native through the system and not bolted on. It fits together very well and removes so much pointless code and mechanisms.
A great example is an old uni demo where a system was built to control X10 outlets and switches (early home automation gear). Each device was a file in a directory tree that represents a building, with subdirectories for floors and rooms - e.g. 'cat /mnt/admin-building/2fl/rm201/lights' would return 'on' or 'off' (maybe it's a dimmer and it's 0-255, or an r,g,b value, or w/e; sky's the limit, just put the logic in the fs). To change the state of the lights you just 'echo off > /mnt/admin-building/2fl/rm201/lights'.
Now you can make a script that shuts all the lights off in the building by walking the directories, looking for "lights", and writing 'off' to those files. Maybe it's a stage, all your lights are on DMX, and you like the current settings, so you 'tar -c /mnt/auditorium/stage/lighting|gzip >student_orientation_lighting_preset.tar.gz' and later do the reverse, writing all the archived settings back to their respective files. You could even serve those files over SMB to a Windows machine and turn lights on and off using Notepad or whatever. And the file data doesn't have to be text; it could be binary too. It's just that for some things, like the state of lights or temperature, values can easily be stored and retrieved as human-readable text.
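To make the script idea concrete, here's a rough sketch (in Rust, since that's the thread topic, though a few lines of sh would do just as well; the mount path is the made-up one from the example above):

    use std::{fs, io, path::Path};

    // Walk the building tree and write "off" to every file named "lights",
    // the equivalent of `echo off > .../lights` for each device.
    fn lights_off(dir: &Path) -> io::Result<()> {
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_dir() {
                lights_off(&path)?; // recurse into floors and rooms
            } else if path.file_name().map_or(false, |name| name == "lights") {
                fs::write(&path, "off")?;
            }
        }
        Ok(())
    }

    fn main() -> io::Result<()> {
        lights_off(Path::new("/mnt/admin-building"))
    }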
That is the beauty and power of 9P: it removes protocol barriers and hands you named objects, seamlessly integrated into your OS, which you can read and write using regular everyday tools. It's a shame so many people can't grasp it.
Eh, seL4 has a suite of tools that turn their pile of C and ASM into an obscure intermediate language that has some formally verifiable properties. IMO this is just shifting the compiler problem somewhere else, into a dark corner where no one is looking.
I highly doubt that it will ever have a practical use beyond teaching kids in the classroom that formal verification is fun, and maybe nerd-sniping some defense weirdos to win some obscene DOD contracts.
Some day I would love to read a report where some criminal got somewhere they shouldn't, and the fact that they landed on an seL4 system stopped them in their tracks. If something like that exists, let me know, but until then I'm putting my chips on technologies that are well known to be battle tested in the field. Maestro seems a lot more promising in that regard.
Uh, perhaps take a look at the seL4 foundation's members[0], who are using it in the wild in very serious scenarios.
You can learn more about them as well as ongoing development work in seL4 Summit[1].
0. https://sel4.systems/Foundation/Membership/home.pml
1. https://sel4.systems/Foundation/Summit/home.pml
seL4 actually has an end-to-end proof, which proves that the final compiled binary matches up with the formal specification. There are not many places that bugs can be shifted---probably the largest one at this point is the CPU itself.
See here[0] one of Gernot Heiser's comments (part of the seL4 foundation) talking about how "there are seL4-based devices in regular use in several defence forces. And it's being built in to various products, including civilian, eg critical infrastructure protection".
There is also an interesting case study[1][2] where seL4 was shown to prevent malicious access to a drone. Using seL4 doesn't necessarily make an entire system safe but for high security applications you have to build from the ground up and having a formally proven kernel is the first step in doing that.
I have been fortunate enough to play a small role in developing some stuff to be used with seL4 and it's obvious that the team are passionate about what they've got and I wish them the best of luck
0 - https://news.ycombinator.com/item?id=25552222
1 - https://www.youtube.com/watch?v=TH0tDGk19_c
2 - http://loonwerks.com/publications/pdf/Steal-This-Drone-READM...
Ok, but can I run a desktop on it? Not knocking seL4, it's damn amazing, but it's not exactly a Linux killer.
Genode runs on sel4 and has a desktop gui.
I think it's possible to run Genode[1] as a desktop on top of seL4 (Genode supports different kernels). However, I'm struggling to find a tutorial to get that up and running.
[1] https://en.wikipedia.org/wiki/Genode
"to not contain"?
Edit to contain (ahem!) the downvotes: I was genuinely confused by the ambiguous use of "contain", but comments below cleared that up.
They're using 'contain' to mean 'keep isolated'. If you put some malware in a docker container, you can't rely on docker to keep the rest of your system safe.
Does the fact that Docker runs as root have something to do with it?
Yes, but even rootless containers rely on user namespaces, which are a recurring source of privilege escalation vulnerabilities in Linux.
The issue of root vs rootless is unrelated to escaping the container. User namespaces lead to privescs because attackers who can enter a namespace and become root within that namespace gain access to kernel functionality that is far less hardened (because upstream has never considered root->kernel a privesc and, of course, most people focus on unprivileged user -> kernel privescs). The daemon running as root doesn't change anything there.
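To illustrate (a rough sketch, assuming the libc crate and a kernel with unprivileged user namespaces enabled): any unprivileged process can do roughly this, and afterwards it looks like root to a large, less-hardened chunk of the kernel.

    use std::{fs, io};

    fn main() -> io::Result<()> {
        let uid = unsafe { libc::getuid() };
        let gid = unsafe { libc::getgid() };

        // Create a new user namespace; no privileges needed on most distros.
        if unsafe { libc::unshare(libc::CLONE_NEWUSER) } != 0 {
            return Err(io::Error::last_os_error());
        }

        // Map our real uid/gid to 0 inside the namespace
        // (setgroups must be set to "deny" before writing gid_map).
        fs::write("/proc/self/setgroups", "deny")?;
        fs::write("/proc/self/uid_map", format!("0 {uid} 1"))?;
        fs::write("/proc/self/gid_map", format!("0 {gid} 1"))?;

        // We now appear as root within this namespace, which unlocks kernel
        // code paths (mounts, netfilter, ...) that were never hardened
        // against unprivileged callers.
        println!("uid inside namespace: {}", unsafe { libc::getuid() });
        Ok(())
    }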
No, it's because the malware would have direct access to the (privileged) Linux Kernel via system calls.
Got it, thanks.
a docker image can’t be relied on to not contain malware and a docker container can’t be relied on to contain malware.
I largely agree, but this seems quite unfair to Linux.
For its time, it was built with safety in mind; we can't hold it to a standard that wasn't prevalent until ~20 years later.
I don't think it's that unfair, but I don't want to get into a whole thing about it, people get really upset about criticisms of the Linux kernel in my experience and I'm not looking to start my morning off with that conversation.
We can agree that C was definitely the language to be doing these things in and I don't blame Linus for choosing it.
My point wasn't to shit on Linux for its decisions, it was to think about a hypothetical world where safety was built in from the start.
Apollo Computer is notable for having had a UNIX-compatible OS written in a Pascal dialect, as was classic Mac OS (which migrated to C++ later on).
Clarification: the Apollo/Domain workstation series by Apollo Computer is meant here, not the Apollo space missions, just to be sure.
The Pascal-based operating system is Aegis (later DomainOS), which - with UNIX - is a joint precursor of HP-UX: https://en.wikipedia.org/wiki/Domain/OS#AEGIS .
Don't worry, in 30 years people will write the same thing about using Rust, assuming that Rust will still be in use 30 years from now.
Yeah, how naive we were building operating systems without linear and dependent types. Savages.
why not ada? Sure rust didn't exist when linux was first being built, but ada did and had a number of memory safety features. (not the same as rust's, but still better than C)
*30 years...
Yes, we're that old.
If you don't run Docker as root, it's fairly OK for normal software. Kernel memory safety is not the main issue with container escapes. Even with memory safety, you can have logical bugs that result in privilege escalation scenarios. Is Docker itself written in Rust?
Memory safety is not a magic bullet, and the Linux kernel isn't exactly trivial to exploit these days either, although in my humble opinion it's still not as hardened as Windows (if you don't count stuff like win32k.sys font parsing as kernel space, since NT is a hybrid after all).
I think it was, given the resources available in 1993. But if Torvalds had caved in and allowed a microkernel or NT-like hybrid design instead of a hard-core monolithic Unix, it would have been a game changer. In 1995, Ada was well accepted and mainstream; it was memory safe, and even Rust devs learned a lot from it. It just wasn't fun for devs to use (on purpose, so devs were forced to do tedious stuff to prevent even non-memory bugs). But since Linux is developed by volunteers, they used what attracts the most volunteers.
The main benefit of Rust is not its safety but its popularity. Ada has been running on missiles, missile defense, subways, aircraft, etc. for a long time, and it even has a formally verified subset (SPARK).
In my opinion, even today Ada is a better technical fit for the kernel than Rust, because it is time-tested and version-stable, and it would open up the possibility of easily formally verifying parts of the kernel.
Given how widely used Linux is, it would require a massive backing fund to pay devs to write in something not so fun, like Ada, though.
If I remember correctly, Ada was much slower compared to C. Stuff like bounds checks on arrays has a cost.
Runtime checks can be disabled in Ada. They’re useful for debug builds though!
But that eliminates the purpose of Ada. Rust has a better type system to deal with this.
I thought both Ada and Rust have good compile-time checks for memory safety that eliminate the need for run-time checks?
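My rough understanding of the Rust side, as a sketch: plain indexing still does a run-time bounds check (and panics on failure); you avoid the check either by making the failure case explicit or by using iterators the compiler can reason about.

    fn third(xs: &[u32]) -> u32 {
        // Run-time bounds check: panics if xs has fewer than three elements.
        xs[2]
    }

    fn third_checked(xs: &[u32]) -> Option<u32> {
        // The check is still there, but the failure case is a value, not a panic.
        xs.get(2).copied()
    }

    fn sum(xs: &[u32]) -> u32 {
        // Iterator style: no per-element bounds check is needed,
        // because the iterator cannot step outside the slice.
        xs.iter().sum()
    }

    fn main() {
        let data = [10u32, 20, 30];
        println!("{} {:?} {}", third(&data), third_checked(&data), sum(&data));
    }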
I disagree, I think it is the primary issue. Logical bugs are far less common.
It's not that hard, though of course exploitation hasn't been trivial since the 90s. We did it at least a few times at my company: https://web.archive.org/web/20221130205026/graplsecurity.com...
Chompie certainly worked hard (and is one of, if not the, most talented exploit devs I've met), but we're talking about a single exploit developer developing highly reliable exploits in a matter of weeks.
A single talented developer taking weeks sounds about right; that's what I meant by difficult. But you also have vulns that never get a CVE issued or an exploit developed because of kernel-specific hardening.
As for container escapes, there are tools like deepce:
https://github.com/stealthcopter/deepce
I can't honestly say I've heard of real-life container escapes by attackers or pentesters using kernel exploits, although I am sure it happens, and there are people who won't update the host's kernel to patch them.
Container vulnerabilities are rarely related to memory bugs. Most vulnerabilities in container deployments are due to logical bugs, misconfiguration, etc. C-level memory stuff is absolutely NOT the reason why virtualization is safer, and not something Rust would greatly improve. On the opposite end of the spectrum, you have hardware vulnerabilities that Rust also wouldn't help you with.
Rust is a good language and I like using it, but there's a lot of magical thinking around the word "safe". Rust's definition of what "safe" means is fairly narrow, and while the things it fixes are big wins, the majority of CVEs I've seen in my career are not things that Rust would have prevented.
The easiest way to escape a container is through exploitation of the Linux kernel via a memory safety issue.
Yes, it is. The point of a VM is that you can remove the kernel as a trust boundary, since memory safety issues leave the kernel unable to enforce that boundary.
There's no magical thinking on my part. I'm quite familiar with exploitation of the Linux kernel, container security, and VM security.
I don't know what your point is here. Do you spend a lot of time in your career thinking about hardening your containers against kernel CVEs?
Yes, I literally led a team of people at a FAANG doing this.
You're saying the easiest way to escape a container is a vulnerability normally priced over 1 million USD. I'm saying the easiest way is through one of the million side channels.
OK, I apologize if I was coming off as glib or condescending. I will take your input into consideration.
I'm not looking to argue, I was just annoyed that I was getting so many of the same comments. It's too early for all of this negativity.
If you want to discuss this via an avenue that is not HN I would be open to it, I'm not looking to make enemies here, I'd rather have an earnest conversation with a colleague rather than jumping down their throats because they caught me in the middle of an annoying conversation.
Same, re-reading my replies I realize I phrased things in a stand-offish way. Sorry about that.
Thanks for being willing to take a step back. I think possibly we are talking about two different things. IME most instances of exploitation are due to much more rudimentary vulnerabilities.
My bias is that, while I did work on mitigations for stuff like Meltdown and Rowhammer, most "code level" memory vulnerabilities were easier to just patch, than to involve my team, so I probably under-estimate their number.
Regardless, if I were building computation-as-a-service, 4 types of vulnerability would make me worry about letting multiple containers share a machine:
1. Configuration bugs. It's really easy to give them access to a capability, a mount or some other resource they can use to escape.
2. Kernel bugs in the filesystems, scheduling, virtual memory management (which is different from the C memory model). It's a big surface. As you said, better use a VM.
3. The kernel has straight up vulnerabilities, often related to memory management (use after free, copy too much memory, etc.)
4. The underlying platform has bugs. Some cloud providers don't properly erase physical RAM. x86 doesn't always zero registers. Etc.
Most of my experience is in 1, a bit of 2 and mitigation work on 4.
The reason I think we're talking past each other a bit is that you're generating CVEs, while I mostly worked on mitigating and detecting/investigating attacks. In my mind, the attacks that are dirt cheap and I see every week are the biggest problem, but if we fix all of those, and the underlying platform gets better, I see that it'll boil down to trusting the kernel doesn't have vulnerabilities.
You two seem to have figured this out, but as far as I can tell, the disconnect here is that the vast majority of security issues related to the difference in separation between VMs and containers isn't due to container "escapes" at all. It's due to the defaults of the application you're running assuming it's the only software on the system and that it can run with any and all privileges. Lazy developers give you containers that don't work without running as privileged, and the demand from users to use those applications after migrating from a primarily VM-based IT infrastructure to a primarily container-based one is too great to simply tell them no; and if it's free software, you have no ability to tell the developers to do anything differently.
Discussions on Hacker News understandably lean toward the concerns of application developers and especially greenfield projects run by startups who can take complete control of the full stack if they want to. But running applications using resources partially shared by other applications encompasses a hell of a lot of other scenarios. Think some bank or military department that has to self-host ADP, Rocket Chat, a Git server, M365, and whatever other hundreds of company-wide collaboration tooling the employees need. Do you do it on VMs or containers? If the application in question inherently assumes it is running on its own server as root, the answer to that question doesn't really depend on kernel CVEs potentially allowing for container escapes.
If we're just reasoning from first principles, applications in containers on the same host OS share more of a common attack surface than applications in VMs on the same physical host, and those share more than applications running on separate servers in the same rack, which in turn share more than servers in separate racks, which in turn share more than servers in separate data centers. The potential layers of separation can be nearly endless, but there is a natural hierarchy on which containers will always sit below VMs, regardless of the kernel providing the containers.
Even putting that aside, if we're going to frame a choice here, these are not exactly kernels on equal footing. A kernel written in C that has existed for over three decades and is used on probably trillions of devices by everything from hobbyists to militaries to Fortune 50 companies to hospitals to physics labs is very likely to be safer on any realistic scale than a kernel written in Rust by one college student in his spare time and tested on QEMU. The developer himself tells you not to use this in production.
I think the annoyance here is it often feels when reading Hacker News that a lot of users treat static typing and borrow checking like it's magic and automatically guarantees a security advantage. Imagine we lived in the Marvel Multiverse and vibranium was real. It might provide a substrate with which it is possible to create stronger structures than real metals, but does that mean you'd rather fly in an aircraft constructed by Riri Williams when she is 17 that she built in her parents' garage or would you rather trust Boeing and whatever plain-ass alloy with all its physical flaws they put into a 747? Maybe it's a bad analogy because vibranium pretty much is magic but there is no magic in the real world.
I like Rust and work in it full-time, and I like its memory-safety aspects, but I think it's a bit of a stretch to claim memory safety guarantees of any kind when we're talking about low-level code like a kernel.
Because in reality, the kernel will have to do all sorts of "unsafe" things even just to provide for basic memory management services for itself and applications, or for interacting with hardware.
You can confine these bits to verified and well-tested parts of the code, but they're still there. And because we're human beings, they will inevitably have bugs that get exploited.
TL;DR: being written in Rust is an improvement, but no guarantee of freedom from memory safety issues. It's all in how you hold the tool.
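A minimal sketch of what that confinement looks like in practice (the device address here is made up, purely for illustration, not from any real kernel):

    // Hypothetical memory-mapped UART transmit register.
    const UART_TX: usize = 0x1000_0000;

    /// Safe wrapper: the raw volatile store below is the only unsafe code,
    /// and the safety argument lives next to it instead of being scattered
    /// across every caller.
    pub fn uart_write_byte(byte: u8) {
        // SAFETY: assumes UART_TX is a valid, device-mapped MMIO address for
        // the lifetime of the kernel; write_volatile keeps the store from
        // being reordered or optimized away.
        unsafe {
            core::ptr::write_volatile(UART_TX as *mut u8, byte);
        }
    }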
Yep. And tooling for securing C has improved a lot in recent years; AddressSanitizer is a big improvement. I'm also looking forward to C++ improving as a language itself, because it has already improved (smart pointers, RAII, a lot of edge cases regarding sequencing) and the committee seems willing to modify the actual language. This opens a path for projects to migrate from C to C++. A language inherits a lot from its introduction (strengths and weaknesses) but also changes a lot.
Every interaction with hardware (disk, USB, TCP/IP, graphics…) needs to execute unsafe code. And then there's firmware. Firmware has probably been an underestimated issue for a long time :(
Aside from errors caused by undetected undefined behavior, all kinds of errors remain possible, especially logic errors, which are probably the biggest surface?
Example:
https://neilmadden.blog/2022/04/19/psychic-signatures-in-jav...
Honestly I struggle to see the point in rewriting C++ code in Java just for the sake of doing it. Improving test coverage for the C++ implementation would probably have been less work and wouldn't have created the security issue in the first place.
That being said, I want to see an #unsafe and #safe in C++. I want some hard check that the code executes only defined behavior, and modern compilers can do that for Rust. The same applies to machine-dependent/implementation-defined code, which isn't undefined but can still be dangerous.
One of the inspirations for Rust, as I recall, was Cyclone: https://cyclone.thelanguage.org/
Which was/is a "safe" dialect of C; basically C extended with a bunch of the stuff that made it into Rust (algebraic datatypes, pattern matching, etc.) Though its model of safety is not the borrow checker model that Rust has.
Always felt to me like something like Cyclone would be the natural direction for OS development to head in, as it fits better with existing codebases and skillsets.
In any case, I'm happy to see this stuff happening in Rust.
I've responded to the central point of "there will still be 'unsafe'" here: https://news.ycombinator.com/item?id=38853040
More like memory safer. A kernel necessarily has a lot of unsafe parts. See: https://github.com/search?q=repo%3Allenotre%2Fmaestro+unsafe...
Rust is not a magic bullet, it just reduces the attack surface by isolating the unsafe parts. Another way to reduce the attack surface would be to use a microkernel architecture, it has a cost though.
You're not really illustrating your point well with the link. If you look through the examples, they're mostly trivial and there's no clear way to eliminate them. Some reads/writes will interact with hardware and the software concepts of memory safety will never reach there because hardware does not operate at that level.
Check a few of the results. They range from single assembler lines (interrupts or special registers), to array/buffer reads from hardware or special areas, to rare sections that have comments explaining the purpose of using unsafe in that place.
Those results really aren't "look how much unsafe code there is", but rather "look how few, well-isolated sections there are that actually need to be marked unsafe". It's really not "a lot": 86 cases across memory mapping, the allocator, task switching, IO, the filesystem, and the object loader is surprisingly few. (Actually even 86 is an overestimate, because for example inb is unsafe and blocks using it are unsafe, so they're double-counted.)
Practically speaking, even with `unsafe`, exploiting Rust programs is extremely difficult. With modern mitigation techniques you need to chain multiple vulnerabilities and primitives together in order to actually exploit software reliably.
Bug density from `unsafe` is so low in Rust programs that it's just radically more difficult.
My company (not me, Chompie did the work, all credit to her for it) took a known bug, which was super high potential (write arbitrary data to the host's memory), and found it extremely difficult to exploit (we were unable to): https://chompie.rip/Blog+Posts/Attacking+Firecracker+-+AWS'+...
Ultimately there were guard pages where we wanted to write and it would have taken other vulnerabilities to actually get a working POC.
Exploitation of Rust programs is just flat out really, really hard.
While I agree, do note that a significant portion of a kernel is internal logic that can be made much safer.
As far as I know, the order of magnitude of container security flaws from memory safety is the same as that of flaws coming from namespace logic issues, and you have to add hardware issues on top of that. I'm sorry, but Rust or not, there will never be a world where you can 100% trust running malware.
Well, being a microkernel makes it easier to migrate bit by bit, and not care about the ABI.
Memory safety issues are very common in the kernel, namespace logic issues are not.
I'm replying simply because you're getting defensive with your edits, but you're missing a few important points, IMO.
First of all, the comment I quoted falls straight into the category of "if only we knew back then what we know now."
What does it even mean "built with safety in mind" for a project like Linux?
No one could predict that Linux (which was born as a kernel) would run on billions of devices that people keep in their pockets and constantly use for everything, from booking a table at the restaurant to checking the weather, from chatting with other people to accessing their bank accounts. And that said banks would use it too.
Literally no one.
Computers were barely connected back then, internet wasn't even a thing outside of research centers and universities.
So, what kind of safety should he have planned for?
And to safeguard what from what and who from who?
Secondly, Linux was born as a collaborative effort to write something already old: a monolithic Unix-like kernel. Nothing fancy, nothing new, nothing experimental, just plain old established stuff, for Linus to learn how that kernel thing worked.
The most important thing about it was to be a collaborative effort so he used a language that he and many others already knew.
Had Linus used something more suited to stronger safety guarantees, such as Ada (someone else already mentioned it), Linux wouldn't be the huge success it is now, and we would not be having this conversation.
Lastly, the strongest Linux safety guarantee is IMO the GPL license, which all these Rust rewrites are conveniently replacing with more permissive licenses. That steers away from what Linux was, and still largely is: a community effort based on the work of thousands of volunteers.
There is nothing about permissive licenses which prevents the project from being such a community effort. In fact, most of the Rust ecosystem is a community effort just like you describe, while most projects have permissive licenses. There's no issue here.
I'm interested in reading more. Where can I find the blog posts?
https://web.archive.org/web/20221130205026/graplsecurity.com...
The company no longer exists so you can find at least some of them mirrored here:
https://chompie.rip/Blog+Posts/
The Firecracker, io_uring, and ebpf exploitation posts.
Chompie was my employee and was the one who did the exploitation, though I'd like to think I was at least a helpful rubber duck, and I did also decide on which kernel features we would be exploiting, if I may pat myself on the back ever so gently.
Hasn't Kata Containers solved this problem: https://github.com/kata-containers/kata-containers ?
Kata is an attempt at solving this problem. There are problems:
1. If you use Firecracker, you can't do nested virtualization.
2. You still have the "os in an os" problem, which can make it operationally more complex
But Kata is a great project.
Isn't gVisor kind of this as well?
"gVisor is an application kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. Unlike most kernels, gVisor does not assume or require a fixed set of physical resources; instead, it leverages existing host kernel functionality and runs as a normal process. In other words, gVisor implements Linux by way of Linux."
https://github.com/google/gvisor
Have there ever been any examples of malware/viruses jumping around through levels like this?
I'm honestly interested to know, because it sounds like a huge deal here, and to my layman's ears very cool and sci-fi!
I didn't know Firecracker existed, that's really awesome. Looks to be in Rust as well. I'll have to look at how this differs from the approach that Docker uses, my understanding is that Docker uses cgroups and some other built-in Linux features.
Containers became popular because it doesn't make much sense to run full-blown virtual machines just to host simple single-process services.
You can lock down the allowed kernel syscalls with seccomp and go further by confining the processes with AppArmor. Docker has good enough defaults for these two security approaches.
Full-fat VMs are not immune to malware infection either (the impact still applies to the exposed attack surface). The malware might not be able to easily escape to the host, but the risk is still there.
No, Docker containers were never meant for that. Never use containers with untrusted binaries. There are Vagrant and others for that.