It can be a lot slower too - but it mostly depends on how the kernel handles console logging. Serial consoles slow it down dramatically, which is why it's so much quicker on your typical GUI-enabled setup.
IIUC steps 5-7 in the exploit cause around 2^32 oopses. I don't know much about the Linux kernel - could it perhaps have a limit on the number of oopses before it halts the entire system?
The article explains why it is important to not do that in general, as an oops allows debugging and recovery etc. But 2^32 of them seems suspicious.
To the contrary, I appreciate azakai's enthusiasm for seeking solutions, and also credit the fact that they came up with exactly the same solution we used to mitigate the issue.
When they say "map the zero page" in the article, it appears they are talking about the page with index zero, not the page with all zeros in it. Does anyone know if this is correct?
Yes I had considered it rather apparent I was referring to the page at the virtual address zero, but I've definitely referred to the CoW page used for private anonymous mappings as the "zero page" too, so I understand the confusion. I probably should have made it more clear! Sorry about that.
Yes, you have that right. By mapping memory to the page at index zero (the page a 'null' pointer would point to), an attacker can leverage a null dereference to achieve much more interesting/dangerous outcomes than just a crash.
I'm really curious why there aren't more enterprise-grade, production ready kernels at this point. Isn't Rust nearing maturity? Doesn't the community have tonnes of enterprise ready C code that could be used as a reference (i.e. Linux, BSD) of "what not to do"?
I'm not trying to start an argument here, I think the world knows that C/C++ make it way too easy to shoot ourselves in the foot by now. I know that writing operating systems is hard and takes a long time; I've written my own prototype single- and multitasking operating systems for x86_32, 68k, Z80, 6502 etc. I'm aware that Rust support has been added to recent Linux kernels, for the limited use case of writing secure device drivers. None of these things are news to me, so please don't regurgitate these points.
But given the great body of reference material that is available, and the enthusiasm in the Rust community for the promise of more secure operating system kernels, I'm genuinely surprised that things aren't further along. Yes, I'm aware of Redox, but it seems more aimed at desktop use, and last time I tried it, it didn't even boot.
Projects in C/C++ seem to be making much faster progress (e.g. SerenityOS) than the Rust community. What is holding Rust back in this area? This is a genuine question, not intending to inflame the discussion. I'm spending some time learning Rust as I can afford, but am not opinionated one way or the other yet.
Where are all the Linux replacements that I would have imagined to be up and running by now given Rust's maturity? What am I missing here? Happy to be genuinely informed.
I kind of expected there to be a bunch of projects in flight by now, à la bazaar style, with the Rust community starting to conglomerate around the strongest contenders and move them forward at a rapid pace.
You're asking for enterprise-grade, production ready kernels, and immediately after say Rust is nearing maturity. There you go.
Writing a kernel is like building a cathedral. To have a production ready kernel, set aside 15 years, or the equivalent in dollars, at least a couple billion. This is why stuff like Serenity, which is an outstanding achievement, is not much more than a toy.
We will have a Rust-based kernel that is memory safe. Not this decade, though. By that time, Linux will have replaced more and more subsystems with Rust already. Remember, enterprise grade means boring and stable. It is nonsense to want a new, unproved kernel to provide that level of safety out of a language that has reached 1.0 not very long ago.
Personally, I think Linux is plenty good enough, but we have seen the best UNIX can offer. It's time to move on and try something new.
> By that time, Linux will have replaced more and more subsystems with Rust already.
Honestly, this is a bold prediction, maybe a foregone conclusion to some, but not to everybody. It remains entirely to be seen.
> It is nonsense to want a new, unproved kernel to provide that level of safety out of a language that has reached 1.0 not very long ago.
Perhaps - but the Rust community has been very vocal for a very long time about how much of an improvement over C/C++ it was already before it reached 1.0. I'm not asking for an "enterprise-grade" kernel yet, just signs of prototypes starting to emerge and the community gathering round them.
I realise I worded that badly in my initial post.
New C/C++/asm based hobby kernels are started (and abandoned) every day on /r/osdev. Nowhere near as many in Rust, by a wide margin. Instead, the Rust community seems to have placed all their bets on an "outside in" rewrite of the Linux kernel in Rust. While I think this may well be great for Linux in the long term, it's a little disappointing that there is not more momentum for "from scratch" projects yet. Even Linux started as "a hobby, not big and professional like GNU". Which leads to my next point ...
> Personally, I think Linux is plenty good enough, but we have seen the best UNIX can offer. It's time to move on and try something new.
Don't disagree at all. That's why I'm surprised there aren't more up and coming projects in Rust with brand new designs to the degree that there are in C/C++. That's all. Maybe they are there, and I'm just not seeing them.
> New C/C++/asm based hobby kernels are started (and abandoned) every day on /r/osdev. Nowhere near as many in Rust, by a wide margin. Instead, the Rust community seems to have placed all their bets on an "outside in" rewrite of the Linux kernel in Rust. While I think this may well be great for Linux in the long term, it's a little disappointing that there is not more momentum for "from scratch" projects yet. Even Linux started as "a hobby, not big and professional like GNU". Which leads to my next point ...
I think you may just not be looking where the Rust people are posting. All you're really saying is that they aren't on /r/osdev, and frankly I don't find that surprising. Here are some of the more flagship projects:
> Don't disagree at all. That's why I'm surprised there aren't more up and coming projects in Rust with brand new designs to the degree that there are in C/C++. That's all. Maybe they are there, and I'm just not seeing them.
I suppose you might have a point: with the amount of pro-Rust posts filling almost every discussion of C and C++ over the last 5 years or more, all over the web, the time that advocates spent arguing about why C users are ignorant and stubborn could have been spent contributing to a minimal kernel.
> with the amount of pro-Rust posts filling almost every discussion of C and C++ over the last 5 years or more, all over the web, the time that advocates spent arguing about why C users are ignorant and stubborn could have been spent contributing to a minimal kernel
Don't mistake a very small but very very vocal group of zealots for the Rust community at large
> Don't mistake a very small but very very vocal group of zealots for the Rust community at large
Yeah, but we're talking about the small group who continuously jumps into every C discussion on the web. That's still a large enough group to build a new kernel.
> the time that advocates spent arguing about why C users are ignorant and stubborn could have been spent contributing to a minimal kernel.
I didn't want to say it (I have done so bluntly in the past and gotten heavily downvoted for it), but yes, that is one conclusion I have reached as well. I sometimes wonder if some of the people who comment that way have ever read the LKML.
> Projects in C/C++ seem to be making much faster progress (e.g. SerenityOS) than the Rust community. What is holding Rust back in this area? This is a genuine question, not intending to inflame the discussion.
I think one element is manpower. I don't know how many people are good enough with Rust to write some kernel code, but I'm almost certain that it's way fewer than the people who can write kernel code in C or C++. C is 51 years old, C++ 38 years old, Rust 12 years old. It is possible for an enterprise-grade, production ready kernel written in Rust to exist. That still leaves the task of actually writing it.
I think this is a real factor. Rust is winning over C/C++ devs to some degree, who are switching to it or adding it to their skillset, but C/C++ devs probably still outnumber them by a factor of, what, 10:1? 20:1? Only some of those C/C++ devs have a background in OSDev as well.
I suspect though that Rust is also getting a lot of its fanbase from the Go crowd, or other languages from the webdev side, where there isn't really strong hardware and OSdev skills to begin with.
It's not necessarily just ability, but also availability of jobs. Lots of places need people to help tweak the kernel they run on, so lots of Linux kernel jobs and some BSD kernel jobs, but no jobs to tweak the Rust kernel that doesn't exist yet. Gotta have some real commitment to pay people to build a new kernel from scratch, knowing that it's gonna take a long time to get anything useful.
> I'm really curious why there aren't more enterprise-grade, production ready kernels at this point.
What would be the point? Kernels aren't swappable, and the value of an OS is in the non-kernel parts - drivers and software.
So a new kernel, even if it is enterprise-grade and production ready, won't run all the things that Linux does. It only makes sense for very niche cases (bare-metal hypervisors, for example).
> Projects in C/C++ seem to be making much faster progress (e.g. SerenityOS) than the Rust community. What is holding Rust back in this area?
It's slower to write in, maybe? Unlike traditional projects where the language doesn't matter as much as the libraries available, with OS projects having a huge library doesn't help much.
For example, it might be just as fast to write a data processor in Rust as in C++, because the majority of the code is in already written libraries - you just glue together input libraries, database libraries, etc.
My understanding is that Rust is particularly lacking in the hardware interface area - a feature of incredible importance for kernels. In fact, that is the Kernel's primary function - to serve as an interface between user programs and hardware. Without excellent hardware interaction support, it's unlikely to be a good language to write a kernel in.
I mean, there are several Rust kernel/OS projects in progress.
One project that's pushing on the boundary of safety and composability is Theseus, which takes language safety to new ground by shifting traditionally OS-level responsibilities like resource management all the way down to typechecks in the language, and also explores a way of updating any core OS component on a live running system. https://github.com/theseus-os/Theseus
> I'm really curious why there aren't more enterprise-grade, production ready kernels at this point.... Projects in C/C++ seem to be making much faster progress (e.g. SerenityOS) than the Rust community.
I'm not sure you can take anyone (read: any troll) seriously who thinks SerenityOS is an enterprise-grade, production ready kernel.
Should we be writing new kernels for enterprise-y workloads? Why? The world has kinda proven it likes things like binary compatibility. It likes for its software to work as it has, and for all the rest of the system to be predictable.
Should we be rewriting such kernels in Rust? Maybe. Where it is justified.
There are quite a few production ready Rust kernels for non-enterprise-y workloads: TockOS, Hubris, etc.
Writing an operating system is hard and laborious. There are many classes of bugs beyond memory bugs, and there are already many tools the Linux team uses to make the kernel safer. That you can make a better, more stable version of the Linux kernel with Rust is still to be proven. I'm not arguing that the language is not better than C, or that, if Linux were being started today, it wouldn't be a more sensible pick. But the language is just one component among many others that make such a project successful.
I would actually be very surprised if there were anything nearly as good as Linux written in Rust already. I'm not sure why a company would invest the huge amount of resources to get it done by now, and, unless the language had some really unprecedented productivity, I don't think a community-led project would have finished it by now.
> Projects in C/C++ seem to be making much faster progress (e.g. SerenityOS) than the Rust community. What is holding Rust back in this area? This is a genuine question, not intending to inflame the discussion. I'm spending some time learning Rust as I can afford, but am not opinionated one way or the other yet.
You're not wrong, but replace 'SerenityOS' with 'Fuchsia', since the latter is a much more serious OS that uses Rust for its drivers.
Maybe some day Fuchsia's Zircon kernel will be rewritten in Rust and then we can all come back to this again.
UNIX-like OSes and C/C++ were just not designed for security. It is time we leave UNIX in the past and start with a clean slate, using the state of the art and little to no legacy cruft.
I suspect the real reason is there's no money in it. Linux may have the image of being "bazaar style" but in reality most of it gets written by full-time corporate employees who get paid to work on it at their day jobs.
The article is about the exploitability of the flaw but really the flaw should not exist. Printing /proc/$pid/smaps is not on any conceivable performance-critical hot path. It can stand to have bounds checks and safety. The call to print out smaps should be well-encapsulated in some non-C language.
There is also an associated nullability sanitizer.
I use this in my own C code all the time, and null pointer errors vanish if you faithfully annotate every pointer. There’s also a pragma to make non-null pointers the default in a file.
GCC devs would have to be convinced to add this to GCC, and then nullability annotations would need to be added to the kernel. You could then get static analysis/compile errors for an unguarded dereference of a nullable pointer.
Yes, in an ideal world your kernel shouldn't have any bugs. We don't live in an ideal world.
Security engineering is the field of practical mitigations - given that there are, in fact, null pointer dereferences in the kernel, mmap_min_addr and adding a count limit to kernel oopses provide defense in depth to help prevent them from being exploitable.
Thankfully this isolated flaw was quite easy to fix. And yes, this code isn't likely to be on any hot paths, and code can always stand to have bounds/sanity checks (and it always should). But unfortunately, encapsulating all the non-hot paths in the Linux kernel that might have these sorts of bugs in a memory-safe language is at best a very long-term goal and at worst a pipe dream. The real goal of the blog post was not to push for any sort of rewrite, but rather to note how even the simplest and most innocuous of bugs can lead to security-relevant primitives, and to make sure kernel developers and bug fixers have strategies like this in mind when they evaluate other bugs in the future.
TLDR: However honorable the end-goal is, this blog post is not the ammo you need to push for a big rewrite of various kernel<->userland interfaces into memory safe languages.
If you think smaps is performance-critical, that raises the question of its ridiculous textual format. Clearly, it would be vastly more efficient to pass the information as a protobuf or whatever. Believe me, as the person who had to refactor the smaps-reading library on cost/efficiency grounds at Google, this issue is nearer and dearer to me than to probably anyone else.
My safe language doesn't have "null", more or less.
What it has is Option<T>, and I cannot turn that into a T without handling the failure case: there is literally no way to construct the code otherwise¹. One must handle the failure path. (That might be by way of an explicit panic/abort/oops, but it's then right there in the code: that branch will panic … and safely.)
¹(this example is using safe Rust. There's unsafe Rust too and there I can chase the null pointer all I want with that, but the parent's point is that we should be sticking to safe interfaces for stuff like this. And I'm using Rust as an example, but Option is hardly unique to Rust, heck, Rust stole the idea from its predecessors.)
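A minimal sketch of what that looks like in safe Rust - the `Mapping`, `Mm`, and `first_vm_start` names are invented for illustration, not real kernel types:

```rust
// Hypothetical stand-ins for the kernel's mm/mmap structures.
struct Mapping {
    vm_start: usize,
}

struct Mm {
    // "Maybe there's a first mapping, maybe not" is encoded in the type.
    mmap: Option<Mapping>,
}

// Returning Option forces our caller to handle the missing-mapping case too.
fn first_vm_start(mm: &Mm) -> Option<usize> {
    // If mmap is None, `map` never touches vm_start; we just propagate None.
    mm.mmap.as_ref().map(|m| m.vm_start)
}
```

There is simply no way to write `mm.mmap.vm_start` here and have it compile.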
OK, and what does your kernel actually do when a kernel thread panics? It "stops execution", sure, so it will... oops? Causing exactly the problem given in the article?
It shouldn’t be a question that tedunangst should even have to ask, given that he has had a chip on his shoulder about Rust and its safety guarantees for years and thus should have learned that much about it by now.
The real issue with this error is not the panic of course - it's the fact that during a panic, C doesn't provide a canonical way of unwinding whatever actions have been performed so far. Rust (or even C++) do provide a bit more robustness in regards to handling errors in an unwindable way, but Rust and C++ probably aren't tenable solutions for the kernel in this case. It's far easier to add an oops limit and kiss this technique goodbye (hopefully!).
One of the missing pieces IMO is that your language needs the right kind of shorthand to make it easy to say "I'm calling things that return Options or Results, I myself return an Option or Result, any time I unwrap let's pipeline any null values or failures into returning a failure immediately."
The whole idea is that there should never be a way to call unwrap() if you as the caller cannot handle it gracefully. And if you do this at every step up until the UI layer, which can handle any failure as an error to be displayed in the UI, then the job is complete!
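Rust's `?` operator is exactly that shorthand. A toy pipeline with invented names - each fallible step forwards `None` to our own caller instead of unwrapping in place:

```rust
// Hypothetical parsing pipeline: every step can fail, and `?` bails out
// early with None rather than panicking.
fn double_first(items: &[&str]) -> Option<i64> {
    let first = items.first()?;       // None if the slice is empty
    let n: i64 = first.parse().ok()?; // None if it isn't a number
    Some(n * 2)
}
```

The failure paths are still there, but they cost one character each instead of an if/else ladder at every call site.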
Yep, and monads without syntactic support are a pain to work with in practice! For instance, the best possible way in a language like Python is to break down what would otherwise be a set of imperative statements into a whole bunch of lambda functions: https://returns.readthedocs.io/en/latest/#id1
In a C++ code base I work on, we use a macro that is essentially Rust’s ?. It’s obviously less convenient than ?, but it’s not bad (even without syntax support). In the past in systems C code, I’ve manually written every error check and return.
That's the explicit handling of the None case I mentioned in the comment: it causes an explicit, and safe, abort. By "explicit", I mean the .unwrap() call will be right there, in the method that needs to turn an Option<T> into a T, and visible to a code reviewer. In the larger context here of kernel code, it should raise the eyebrow on the reviewer: "wait, this function shouldn't abort, it needs to handle the edge cases!".
(But for some userland app, aborting might be acceptable. The kernel is in a bit of a bind, since an abort — a kernel panic — means the user loses the computer until they reboot, and their work along with it.)
Vs. a C pointer … all uses are more or less equally suspect; for any given use, you hope the code has done its homework in ensuring they're not NULL, and if they are, the consequence is UB. (And in Rust, and in the languages Rust steals the idea of Option from, you're only using/passing Options where "None"/null/nil is a possibility. If it's not, or you've verified or handled that at some outer stack frame, then you just pass a reference to a T, which is statically guaranteed to point to a valid object¹.)
¹again, barring buggy code using unsafe Rust, in the example of Rust, or calling into C code that fails to maintain its invariants, etc.
Take the example in the article, where the code does,
priv->mm->mmap->vm_start
while trying to generate the output for smaps_rollup. That's compilable but buggy C, because mmap can be null but we failed to check for it.
Vs., if mmap were an Option<T>, where T is whatever type that pointer points to. Let's say our coder attempts to write,
priv->mm->mmap->vm_start
(In some imaginary language, because C doesn't have Option, AFAIK.) The compiler would say, no, you can't "->vm_start", because "mmap" could be None (whatever you call the "nothing here" value/variant; I'm going to call it None, to distinguish it from the null pointer).
In the case of unwrap, the coder could do something like (this is pseudo-code)
(priv->mm->mmap).unwrap().vm_start
It would then be obvious there is an abort there. Their reviewer would not be pleased with that, I suspect: we don't want kernel panics or oopses or aborts while generating a file in /proc. And likely our imaginary coder would know this too, and when the compiler errored the first time, saying, "hey, mmap is an Option", they'd raise an eyebrow, say something like, "wait, it is? When would mmap be None?" and then proceed to properly handle that case. (E.g., by treating it as if it were the empty list.)
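To make that concrete, a rough Rust sketch of the "treat None as the empty list" handling, with invented and heavily simplified types (the real kernel VMA list is nothing like a `Vec`):

```rust
// Hypothetical, simplified stand-ins for the kernel types in the article.
struct Vma {
    vm_start: usize,
    vm_end: usize,
}

struct Mm {
    // The VMA list may be absent entirely; the type says so up front.
    mmap: Option<Vec<Vma>>,
}

// Safe analogue of the smaps_rollup walk: a None (or empty) mapping list
// produces an empty span instead of a null dereference and an oops.
fn rollup_span(mm: &Mm) -> (usize, usize) {
    match &mm.mmap {
        Some(vmas) if !vmas.is_empty() => {
            (vmas[0].vm_start, vmas[vmas.len() - 1].vm_end)
        }
        // No mappings at all: behave as for an empty address space.
        _ => (0, 0),
    }
}
```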
The root cause of the bug isn't the kernel dereferencing a NULL and causing UB. It's the kernel doing error handling and attempting to kill the oopsing task and continue. If the semantics of Rust panics also did a kernel oops, unwrap() would trigger the exact same bug described in the blog post with regards to reference count rollover (if Rust in the kernel doesn't do stack unwinding)
There are two bugs discussed in the article. One is the one this thread started with, which was the kernel deref'ing a NULL.
You're right that this is separate from the handling of the oops, which is the main exploitability that TFA is getting at, and certainly fixing one deref leading to an oops (the proc file chasing NULL) doesn't fix the other bug of "any oops can be further exploited".
But the context of this subthread is the implication that you must have some null, and some thing must happen when it is chased. That assumption is wrong, that's what the core of the comment I'm making is getting at: you can't follow a null if you don't have the possibility of them in the first place. (Or where you must have an Option<T>, you can build safe interfaces for handling that fact.)
> unwrap() would trigger the exact same bug described in the blog post with regards to reference count rollover
If we consider this instead as "an unwrap occurring during the oops handling", maybe, but it's not guaranteed that that is the case. Other aspects of Rust could similarly prevent that bug. I haven't fully grokked the latter half of the article, but I didn't think it would be necessary for the comment, as, a. the chain was about "Printing /proc/$pid/smaps is not on any conceivable performance-critical hot path." and b. followed by the question about null.
Ref-counting in Rust is often dealt with via RAII, and is safe through that, both in that RAII means the refcount is managed correctly and without input from the coder, but also Rc (and I presume Arc) will abort on overflow. I don't know if that would fully translate to kernel code, given that we might be taking refs due to the actions of userland, and that might be happening near the userland/kernel boundary and be reasonably subject to unsafe code that could very well fall prey to the same problems.
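A tiny illustration of the RAII point, using `Rc` from the standard library: the count is maintained entirely by `clone` and `Drop`, with nothing for the coder to forget.

```rust
use std::rc::Rc;

// RAII refcounting: Rc::clone bumps the count, dropping a handle
// decrements it - no manual get/put calls to skip or double-run.
fn peak_count() -> usize {
    let a = Rc::new(42);
    let b = Rc::clone(&a);           // strong count is now 2
    let peak = Rc::strong_count(&a);
    drop(b);                         // count falls back to 1, automatically
    debug_assert_eq!(Rc::strong_count(&a), 1);
    peak
}
```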
Yes, but fundamentally the kernel's inability to handle exceptional cases is all in deference to performance. Non-performance-critical sections, which in a fair analysis would be at least 99.9% of the kernel, should be written in a language and style that provides structured unwinding, not just jumping to the error case and whoops, I accidentally jumped beyond all the unlocks and reference count decrements and deallocations. That's the issue.
You wouldn't allow code that aborts without cleanup in these areas of the kernel, or in the kernel at all.
You can't make a similar rule against null dereferences, because those happen by accident. (Unless you wrap every single pointer dereference, which is not happening.)
If you don't allow aborting, then the compiler makes you write an error-handling path that returns, and the cleanup code will not be skipped.
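A sketch of that guarantee: an early error return still runs `Drop` for everything constructed so far. The `Guard` and `do_work` names are made up, and the counter stands in for an unlock or refcount decrement.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times cleanup actually ran.
static CLEANUPS: AtomicUsize = AtomicUsize::new(0);

struct Guard;
impl Drop for Guard {
    fn drop(&mut self) {
        // Stand-in for an unlock / refcount decrement / deallocation.
        CLEANUPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn do_work(fail: bool) -> Result<(), &'static str> {
    let _lock = Guard; // "acquire" the resource
    if fail {
        // Unlike a stray C goto past the cleanup labels, this early
        // return still runs Guard::drop on the way out.
        return Err("bad input");
    }
    Ok(())
}
```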
These juvenile and facile retorts do not elevate the discourse, nor are they a good look. As a long-time OpenBSD developer you have a lot of smart things to say about technical subjects, including kernels specifically, and I wish you would leave the dumb snarky comments unwritten.
If an abort function still triggers cleanup, then yes it is better. C doesn't have such a thing, so your sarcasm about 'telling them' is unwarranted.
If an abort function doesn't trigger cleanup, then you can block it at compile time to prevent this kind of bug. But before you can even think about doing that, you need to split pointers into nullable and non-nullable. And the kernel devs already know about that idea, and how hard it is to implement in C.
Nobody is naively suggesting "hey kernel devs do this thing!" as if there isn't decades of momentum behind the current codebase. It's just a look at how C is bad at this particular kind of bug.
Ideally it has an iterator construct built in, so it views an empty linked-list chain truly as an empty list without dereferencing the first (null) item preemptively.
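Something like that already falls out of Option-based lists in Rust. In this toy example (invented `Node` type), a `None` head is simply an empty traversal - nothing is ever dereferenced:

```rust
// A singly linked list whose emptiness lives in the type, not in a
// null pointer convention.
struct Node {
    value: u32,
    next: Option<Box<Node>>,
}

fn sum(head: &Option<Box<Node>>) -> u32 {
    let mut total = 0;
    let mut cur = head.as_deref(); // Option<&Node>; None means "done"
    while let Some(node) = cur {
        total += node.value;
        cur = node.next.as_deref();
    }
    // An empty chain falls straight through the loop.
    total
}
```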
In order to have the safe language I believe you would need to decompose the code shown here in show_smaps_rollup(). If the null deref occurred in the unsafe portion it would likely still do an oops. If the null deref occurred in the safe portion it would likely exit safely and cause the syscall to return some errno that describes a kernel fault.