Message-Id: <DFYJYG80FGDK.15S9CRFYLJ7O8@garyguo.net>
Date: Mon, 26 Jan 2026 13:24:25 +0000
From: "Gary Guo" <gary@...yguo.net>
To: "Lyude Paul" <lyude@...hat.com>, <rust-for-linux@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, "Thomas Gleixner" <tglx@...utronix.de>
Cc: "Boqun Feng" <boqun.feng@...il.com>, "Daniel Almeida"
<daniel.almeida@...labora.com>, "Miguel Ojeda" <ojeda@...nel.org>, "Alex
Gaynor" <alex.gaynor@...il.com>, "Gary Guo" <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>, "Benno Lossin"
<lossin@...nel.org>, "Andreas Hindborg" <a.hindborg@...nel.org>, "Alice
Ryhl" <aliceryhl@...gle.com>, "Trevor Gross" <tmgross@...ch.edu>, "Danilo
Krummrich" <dakr@...nel.org>, "Andrew Morton" <akpm@...ux-foundation.org>,
"Peter Zijlstra" <peterz@...radead.org>, "Ingo Molnar" <mingo@...hat.com>,
"Will Deacon" <will@...nel.org>, "Waiman Long" <longman@...hat.com>
Subject: Re: [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust
On Wed Jan 21, 2026 at 10:39 PM GMT, Lyude Paul wrote:
> This is the latest patch series for adding rust bindings for controlling
> local processor interrupts, adding support for spinlocks in rust that
> are acquired with local processor interrupts disabled, and implementing
> local interrupt controls through refcounting in the kernel.
>
> The previous version of this patch series can be found here:
>
> https://lkml.org/lkml/2025/12/15/1190
Please use lore.kernel.org (or patch.msgid.link) links instead.
Best,
Gary
>
> This patch series applies on top of the rust-next branch.
>
> There are a few big changes since the last version, mainly that we've
> addressed all(?) of the open questions on this patch series:
>
> * Thanks to Joel Fernandes, we now have a separate per-CPU counter for
> tracking NMI nesting - which ensures that we don't have to sacrifice
> NMI nest level bits in order to store a counter for refcounted IRQs.
> These patches have been included at the start of the series.
> * We've been able to prove that being able to convert the kernel over to
> this new interface is indeed possible, more on this below.
> * Also thanks to Joel, we now have actual benchmarks for how this
> affects performance:
> https://lore.kernel.org/rust-for-linux/20250619175335.2905836-1-joelagnelf@nvidia.com/
> * Also some small changes to the kunit test I added, mainly just making
> sure I don't forget to include a MODULE_DESCRIPTION or MODULE_LICENSE.
>
> Regarding the conversion plan: we've had some success at getting kernels
> to boot after attempting to convert the entire kernel from the
> non-refcounted API to the new refcounted API. It will definitely take
> quite a lot of work to get this right though, at least in the kernel
> core side of things. To give readers an idea of what I mean, here's a
> few of the issues that we ended up running into:
>
> On my end, I tried running a number of coccinelle conversions for this.
> At first I did actually try simply rewiring
> local_irq_disable()/local_irq_enable() to
> local_interrupt_disable()/local_interrupt_enable(). This wasn't really
> workable though, as it causes the kernel to crash very early on in a
> number of ways that I haven't fully untangled. Doing this with
> coccinelle on the other hand allowed me to convert individual files at a
> time, along with specific usage patterns of the old API, and as a result
> this ended up giving me a pretty good idea of where our issues are
> coming from. This coccinelle script, while still leaving most of the
> kernel unconverted, was at least able to be run on almost all of kernel/
> while still allowing us to boot on x86_64:
>
> @depends on patch && !report@
> @@
> - local_irq_disable();
> + local_interrupt_disable();
> ...
> - local_irq_enable();
> + local_interrupt_enable();
>
> There were two files in kernel/ that were exceptions to this:
>
> * kernel/softirq.c
> * kernel/main.c (I figured out at least one fix to an issue here)
>
> The reason this worked is because it seems like the vast majority of the
> issues we're seeing come from "unbalanced"/"misordered" usages of the
> old irq API. And there seem to be a few reasons for this:
>
> * The first simple reason: occasionally the enable/disable was split
> across a function, which this script didn't handle.
> * The second more complicated reason: some portions of the kernel core
> end up calling processor instructions that modify the processor's
> local interrupt flags independently of the kernel. In x86_64's case, I
> believe we came to the conclusion the iret instruction (interrupt
> return) was modifying the interrupt flag state. There's possibly a few
> more instances like this elsewhere.
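
The two failure modes listed above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code from the series: every `model_*` name is hypothetical, there is a single simulated CPU, and "interrupts" are just a boolean. It assumes the counted API touches the interrupt flag only on the outermost disable/enable pair, so anything that changes the flag behind the counter's back (an unconditional enable, or an instruction such as iret) leaves the two out of sync.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: one simulated CPU, no real interrupt masking. */
struct model_cpu {
    unsigned int disable_depth; /* outstanding counted disables */
    bool irqs_on;               /* simulated hardware interrupt flag */
    bool saved_irqs_on;         /* flag state saved by the outermost disable */
};

static void model_cpu_init(struct model_cpu *cpu)
{
    cpu->disable_depth = 0;
    cpu->irqs_on = true;
    cpu->saved_irqs_on = true;
}

/* Counted disable: only the 0 -> 1 transition saves and clears the flag. */
static void model_interrupt_disable(struct model_cpu *cpu)
{
    if (cpu->disable_depth++ == 0) {
        cpu->saved_irqs_on = cpu->irqs_on;
        cpu->irqs_on = false;
    }
}

/* Counted enable: only the 1 -> 0 transition restores the saved flag. */
static void model_interrupt_enable(struct model_cpu *cpu)
{
    assert(cpu->disable_depth > 0); /* an unbalanced enable is a bug */
    if (--cpu->disable_depth == 0)
        cpu->irqs_on = cpu->saved_irqs_on;
}

/*
 * Stand-in for anything that sets the flag without consulting the
 * counter, such as an old-style unconditional enable.
 */
static void model_raw_irq_enable(struct model_cpu *cpu)
{
    cpu->irqs_on = true;
}
```

In this model, nested `model_interrupt_disable()`/`model_interrupt_enable()` pairs keep interrupts off until the outermost enable. Calling `model_raw_irq_enable()` in between leaves `irqs_on` true while `disable_depth` is still nonzero, which is the kind of mismatch described above.
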
>
> Boqun also took a stab at this on aarch64, and ended up having similar
> findings. In their case, they found that one of the culprits was
> raw_spin_rq_unlock_irq(). Here the reason is that on aarch64
> preempt_count is per-thread and not just per-cpu, and when context
> switching you generally disable interrupts from one task and restore it
> in the other task. So in order to fix it, we'll need to make some
> modifications to the aarch64 context-switching code.
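
A rough model of the aarch64 situation described above, again only as a sketch: all `model_*` names are hypothetical, the scheduler is reduced to a pointer swap, and the disable count is assumed to live in each task rather than in the CPU. The outgoing task takes the counted disable, and the incoming task then tries to do the matching enable against its own count, which is still zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the count is per task, the flag is per CPU. */
struct model_task {
    unsigned int disable_depth;
};

struct model_cpu {
    bool irqs_on;
    struct model_task *current_task;
};

static void model_disable(struct model_cpu *cpu)
{
    assert(cpu->current_task);
    if (cpu->current_task->disable_depth++ == 0)
        cpu->irqs_on = false;
}

/* Returns false on an unbalanced enable instead of underflowing the count. */
static bool model_enable(struct model_cpu *cpu)
{
    assert(cpu->current_task);
    if (cpu->current_task->disable_depth == 0)
        return false;
    if (--cpu->current_task->disable_depth == 0)
        cpu->irqs_on = true;
    return true;
}

/* Context switch reduced to its effect on which count is visible. */
static void model_switch_to(struct model_cpu *cpu, struct model_task *next)
{
    cpu->current_task = next;
}
```

If one task calls `model_disable()`, the CPU switches to a second task, and that task calls `model_enable()`, the call fails: the first task still holds a count of 1, the second has 0, and the simulated interrupt flag stays off. A per-CPU count would not have this problem, which is why the context-switch path needs separate handling when the count is per-thread.
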
>
> So - with this being said, we decided that the best way of converting it
> is likely to just leave us with 3 APIs for the time being - and have new
> drivers and code use the new API while we go through and convert the
> rest of the kernel.
>
> FULL CHANGELOG BELOW
>
> Boqun Feng (5):
> preempt: Introduce HARDIRQ_DISABLE_BITS
> preempt: Introduce __preempt_count_{sub, add}_return()
> irq & spin_lock: Add counted interrupt disabling/enabling
> rust: helper: Add spin_{un,}lock_irq_{enable,disable}() helpers
> locking: Switch to _irq_{disable,enable}() variants in cleanup guards
>
> Joel Fernandes (1):
> preempt: Track NMI nesting to separate per-CPU counter
>
> Lyude Paul (10):
> openrisc: Include <linux/cpumask.h> in smp.h
> irq: Add KUnit test for refcounted interrupt enable/disable
> rust: Introduce interrupt module
> rust: sync: Add SpinLockIrq
> rust: sync: Introduce lock::Lock::lock_with() and friends
> rust: sync: Expose lock::Backend
> rust: sync: lock/global: Rename B to G in trait bounds
> rust: sync: Add a lifetime parameter to lock::global::GlobalGuard
> rust: sync: lock/global: Add Backend parameter to GlobalGuard
> rust: sync: lock/global: Add ContextualBackend support to GlobalLock