Message-ID: <20251125230840.40baa93e@nimda.home>
Date: Tue, 25 Nov 2025 23:08:40 +0300
From: Onur Özkan <work@...rozkan.dev>
To: Daniel Almeida <daniel.almeida@...labora.com>
Cc: Lyude Paul <lyude@...hat.com>, rust-for-linux@...r.kernel.org,
lossin@...nel.org, ojeda@...nel.org, alex.gaynor@...il.com,
boqun.feng@...il.com, gary@...yguo.net, a.hindborg@...nel.org,
aliceryhl@...gle.com, tmgross@...ch.edu, dakr@...nel.org,
peterz@...radead.org, mingo@...hat.com, will@...nel.org,
longman@...hat.com, felipe_life@...e.com, daniel@...lak.dev,
bjorn3_gh@...tonmail.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 5/6] rust: ww_mutex: implement LockSet
On Tue, 25 Nov 2025 16:01:08 -0300
Daniel Almeida <daniel.almeida@...labora.com> wrote:
>
>
> > On 24 Nov 2025, at 12:49, Onur Özkan <work@...rozkan.dev> wrote:
> >
> > On Fri, 21 Nov 2025 17:34:15 -0500
> > Lyude Paul <lyude@...hat.com> wrote:
> >
> >> On Sat, 2025-11-01 at 19:10 +0300, Onur Özkan wrote:
> >>> `LockSet` is a safe, high-level API built on top of
> >>> ww_mutex; it provides a simple interface while keeping
> >>> the ww_mutex semantics.
> >>>
> >>> When `EDEADLK` is hit, it drops all held locks, resets
> >>> the acquire context and retries the user-supplied
> >>> locking algorithm until it succeeds.
> >>>
> >>> Signed-off-by: Onur Özkan <work@...rozkan.dev>
> >>> ---
> >>>  rust/kernel/sync/lock/ww_mutex.rs          |   6 +
> >>>  rust/kernel/sync/lock/ww_mutex/lock_set.rs | 245 +++++++++++++++++++++
> >>>  2 files changed, 251 insertions(+)
> >>>  create mode 100644 rust/kernel/sync/lock/ww_mutex/lock_set.rs
> >>>
> >>> diff --git a/rust/kernel/sync/lock/ww_mutex.rs b/rust/kernel/sync/lock/ww_mutex.rs
> >>> index 2a9c1c20281b..d4c3b272912d 100644
> >>> --- a/rust/kernel/sync/lock/ww_mutex.rs
> >>> +++ b/rust/kernel/sync/lock/ww_mutex.rs
> >>> @@ -5,6 +5,10 @@
> >>>  //! It is designed to avoid deadlocks when locking multiple [`Mutex`]es
> >>>  //! that belong to the same [`Class`]. Each lock acquisition uses an
> >>>  //! [`AcquireCtx`] to track ordering and ensure forward progress.
> >>> +//!
> >>> +//! It is recommended to use [`LockSet`] as it provides a safe high-level
> >>> +//! interface that automatically handles deadlocks, retries and context
> >>> +//! management.
> >>>
> >>>  use crate::error::to_result;
> >>>  use crate::prelude::*;
> >>> @@ -16,9 +20,11 @@
> >>>
> >>>  pub use acquire_ctx::AcquireCtx;
> >>>  pub use class::Class;
> >>> +pub use lock_set::LockSet;
> >>>
> >>>  mod acquire_ctx;
> >>>  mod class;
> >>> +mod lock_set;
> >>>
> >>>  /// A wound-wait (ww) mutex that is powered with deadlock avoidance
> >>>  /// when acquiring multiple locks of the same [`Class`].
> >>> diff --git a/rust/kernel/sync/lock/ww_mutex/lock_set.rs b/rust/kernel/sync/lock/ww_mutex/lock_set.rs
> >>> new file mode 100644
> >>> index 000000000000..ae234fd1e0be
> >>> --- /dev/null
> >>> +++ b/rust/kernel/sync/lock/ww_mutex/lock_set.rs
> >>> @@ -0,0 +1,245 @@
> >>> +// SPDX-License-Identifier: GPL-2.0
> >>> +
> >>> +//! Provides [`LockSet`] which automatically detects [`EDEADLK`],
> >>> +//! releases all locks, resets the state and retries the user
> >>> +//! supplied locking algorithm until success.
> >>> +
> >>> +use super::{AcquireCtx, Class, Mutex};
> >>> +use crate::bindings;
> >>> +use crate::prelude::*;
> >>> +use crate::types::NotThreadSafe;
> >>> +use core::ptr::NonNull;
> >>> +
> >>> +/// A tracked set of [`Mutex`] locks acquired under the same [`Class`].
> >>> +///
> >>> +/// It ensures proper cleanup and retry mechanism on deadlocks and
> >>> +/// provides safe access to locked data via [`LockSet::with_locked`].
> >>> +///
> >>> +/// Typical usage is through [`LockSet::lock_all`], which retries a
> >>> +/// user supplied locking algorithm until it succeeds without deadlock.
> >>> +pub struct LockSet<'a> {
> >>> +    acquire_ctx: Pin<KBox<AcquireCtx<'a>>>,
> >>> +    taken: KVec<RawGuard>,
> >>> +    class: &'a Class,
> >>> +}
> >>> +
> >>> +/// Used by `LockSet` to track acquired locks.
> >>> +///
> >>> +/// This type is strictly crate-private and must never be exposed
> >>> +/// outside this crate.
> >>> +struct RawGuard {
> >>> + mutex_ptr: NonNull<bindings::ww_mutex>,
> >>> + _not_send: NotThreadSafe,
> >>> +}
> >>> +
> >>> +impl Drop for RawGuard {
> >>> +    fn drop(&mut self) {
> >>> +        // SAFETY: `mutex_ptr` originates from a locked `Mutex` and
> >>> +        // remains valid for the lifetime of this guard, so unlocking
> >>> +        // here is sound.
> >>> +        unsafe { bindings::ww_mutex_unlock(self.mutex_ptr.as_ptr()) };
> >>> +    }
> >>> +}
> >>> +
> >>> +impl<'a> Drop for LockSet<'a> {
> >>> + fn drop(&mut self) {
> >>> + self.release_all_locks();
> >>> + }
> >>> +}
> >>> +
> >>> +impl<'a> LockSet<'a> {
> >>> +    /// Creates a new [`LockSet`] with the given class.
> >>> +    ///
> >>> +    /// All locks taken through this [`LockSet`] must belong to the
> >>> +    /// same class.
> >>> +    pub fn new(class: &'a Class) -> Result<Self> {
> >>> +        Ok(Self {
> >>> +            acquire_ctx: KBox::pin_init(AcquireCtx::new(class), GFP_KERNEL)?,
> >>> +            taken: KVec::new(),
> >>> +            class,
> >>> +        })
> >>> +    }
> >>> +
> >>> +    /// Creates a new [`LockSet`] using an existing [`AcquireCtx`]
> >>> +    /// and [`Class`].
> >>> +    ///
> >>> +    /// # Safety
> >>> +    ///
> >>> +    /// The caller must ensure that `acquire_ctx` is properly
> >>> +    /// initialized, holds no mutexes and that the provided `class`
> >>> +    /// matches the one used to initialize the given `acquire_ctx`.
> >>> +    pub unsafe fn new_with_acquire_ctx(
> >>> +        acquire_ctx: Pin<KBox<AcquireCtx<'a>>>,
> >>> +        class: &'a Class,
> >>> +    ) -> Self {
> >>> +        Self {
> >>> +            acquire_ctx,
> >>> +            taken: KVec::new(),
> >>> +            class,
> >>> +        }
> >>> +    }
> >>> +
> >>> +    /// Attempts to lock a [`Mutex`] and records the guard.
> >>> +    ///
> >>> +    /// Returns [`EDEADLK`] if lock ordering would cause a deadlock.
> >>> +    ///
> >>> +    /// Returns [`EBUSY`] if `mutex` was locked outside of this
> >>> +    /// [`LockSet`].
> >>> +    ///
> >>> +    /// # Safety
> >>> +    ///
> >>> +    /// The given `mutex` must be created with the [`Class`] that was
> >>> +    /// used to initialize this [`LockSet`].
> >>> +    pub unsafe fn lock<T>(&mut self, mutex: &'a Mutex<'a, T>) -> Result {
> >>> +        if mutex.is_locked()
> >>> +            && !self
> >>> +                .taken
> >>> +                .iter()
> >>> +                .any(|guard| guard.mutex_ptr.as_ptr() == mutex.inner.get())
> >>> +        {
> >>> +            return Err(EBUSY);
> >>> +        }
> >>
> >> I don't think that we need or want to keep track of this - even for
> >> checking if we've acquired a lock already. The kernel already does
> >> this (from __ww_rt_mutex_lock()):
> >>
> >
> > The code here is self-documenting. It checks whether the mutex was
> > locked outside this context (which should never happen). This makes
> > the implementation easier to understand IMO.
> >
> >> if (ww_ctx) {
> >> if (unlikely(ww_ctx == READ_ONCE(lock->ctx)))
> >> return -EALREADY;
> >>
> >> /*
> >> * Reset the wounded flag after a kill. No other process can
> >> * race and wound us here, since they can't have a valid owner
> >> * pointer if we don't have any locks held.
> >> */
> >> if (ww_ctx->acquired == 0)
> >> ww_ctx->wounded = 0;
> >>
> >> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> >> nest_lock = &ww_ctx->dep_map;
> >> #endif
> >> }
> >>> +
> >>> +        // SAFETY: By the safety contract, `mutex` belongs to the
> >>> +        // same `Class` as `self.acquire_ctx` does.
> >>> +        let guard = unsafe { self.acquire_ctx.lock(mutex)? };
> >>> +
> >>> +        self.taken.push(
> >>> +            RawGuard {
> >>> +                // SAFETY: We just locked it above so it's a valid pointer.
> >>> +                mutex_ptr: unsafe { NonNull::new_unchecked(guard.mutex.inner.get()) },
> >>> +                _not_send: NotThreadSafe,
> >>> +            },
> >>> +            GFP_KERNEL,
> >>> +        )?;
> >>> +
> >>> +        // Avoid unlocking here; `release_all_locks` (also run by `Drop`)
> >>> +        // performs the unlock for `LockSet`.
> >>> +        core::mem::forget(guard);
> >>> +
> >>> +        Ok(())
> >>> +    }
> >>> +
> >>> +    /// Runs `locking_algorithm` until success, retrying on deadlock.
> >>> +    ///
> >>> +    /// `locking_algorithm` should attempt to acquire all needed locks.
> >>> +    /// If [`EDEADLK`] is detected, this function will roll back, reset
> >>> +    /// the context and retry automatically.
> >>> +    ///
> >>> +    /// Once all locks are acquired successfully, `on_all_locks_taken`
> >>> +    /// is invoked for exclusive access to the locked values.
> >>> +    /// Afterwards, all locks are released.
> >>> +    ///
> >>> +    /// # Example
> >>> +    ///
> >>> +    /// ```
> >>> +    /// use kernel::alloc::KBox;
> >>> +    /// use kernel::c_str;
> >>> +    /// use kernel::prelude::*;
> >>> +    /// use kernel::sync::Arc;
> >>> +    /// use kernel::sync::lock::ww_mutex::{Class, LockSet, Mutex};
> >>> +    /// use pin_init::stack_pin_init;
> >>> +    ///
> >>> +    /// stack_pin_init!(let class = Class::new_wound_wait(c_str!("test")));
> >>> +    ///
> >>> +    /// let mutex1 = Arc::pin_init(Mutex::new(0, &class), GFP_KERNEL)?;
> >>> +    /// let mutex2 = Arc::pin_init(Mutex::new(0, &class), GFP_KERNEL)?;
> >>> +    /// let mut lock_set = KBox::pin_init(LockSet::new(&class)?, GFP_KERNEL)?;
> >>> +    ///
> >>> +    /// lock_set.lock_all(
> >>> +    ///     // `locking_algorithm` closure
> >>> +    ///     |lock_set| {
> >>> +    ///         // SAFETY: Both `lock_set` and `mutex1` use the same class.
> >>> +    ///         unsafe { lock_set.lock(&mutex1)? };
> >>> +    ///
> >>> +    ///         // SAFETY: Both `lock_set` and `mutex2` use the same class.
> >>> +    ///         unsafe { lock_set.lock(&mutex2)? };
> >>
> >> I wonder if there's some way we can get rid of the safety contract
> >> here and verify this at compile time; it would be a shame if every
> >> single lock invocation needed to be unsafe.
> >>
> >
> > Yeah :(. We could get rid of them easily by keeping the class that
> > was passed to the constructor functions, but that becomes a problem
> > for the from_raw implementations.
>
> Then let’s make from_raw actually unsafe and have the user pass the
> class as an argument there. Would that work?
>
This was the first solution I tried when working on this version, but I
reverted it for a reason I don't recall right now. I will try it again
for the next version and see if I can remember why I reverted it.
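To make that direction concrete, here is a minimal std-based sketch (all
names here are simplified stand-ins, not the real kernel-crate types): the
unsafe constructor takes the `Class` explicitly and stores it, so `lock`
can verify class membership at runtime and stop being `unsafe`.

```rust
// Hypothetical sketch only: `Class`, `Mutex` and `LockSet` are toy
// stand-ins for the kernel types; real locking is elided.
struct Class {
    name: &'static str,
}

struct Mutex<T> {
    class: *const Class,
    data: T,
}

struct LockSet<'a> {
    class: &'a Class,
}

const EINVAL: i32 = 22;

impl<'a> LockSet<'a> {
    /// # Safety
    ///
    /// `class` must be the same `Class` the underlying raw acquire
    /// context was initialized with.
    unsafe fn from_raw(class: &'a Class) -> Self {
        Self { class }
    }

    // Safe now: the class match is a runtime check instead of a
    // caller obligation.
    fn lock<T>(&mut self, mutex: &Mutex<T>) -> Result<(), i32> {
        if !std::ptr::eq(mutex.class, self.class as *const Class) {
            return Err(EINVAL);
        }
        // ...the actual ww_mutex locking would happen here...
        Ok(())
    }
}
```

The cost is one pointer comparison per lock call, in exchange for moving
the safety contract into the (already unsafe) constructor.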
> That was the first thing that I noticed about this patch as well:
> it’s unfortunate that locking is unsafe.
>
> >
> > I think the best solution would be to expose ww_class type from
> > ww_acquire_ctx and ww_mutex unconditionally (right now it depends on
> > DEBUG_WW_MUTEXES). That way we can just access the class and verify
> > that the mutex and acquire_ctx classes match.
> >
> > What do you think? I can submit a patch for the C-side
> > implementation. It should be straightforward and shouldn't have any
> > runtime impact.
> >
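For illustration only, a simplified sketch of what that C-side change
might look like (the struct bodies here are hypothetical stand-ins, not
the real kernel definitions): the `ww_class` back-pointer becomes
unconditional instead of being guarded by CONFIG_DEBUG_WW_MUTEXES, and
the match check reduces to a pointer comparison.

```c
/* Hypothetical, heavily simplified stand-ins for the kernel structs. */
struct ww_class {
    const char *name;
};

struct ww_acquire_ctx {
    /* ...existing fields elided... */
    struct ww_class *ww_class; /* now present unconditionally */
};

struct ww_mutex {
    /* ...existing fields elided... */
    struct ww_class *ww_class; /* now present unconditionally */
};

/* The Rust-side verification would then reduce to this: */
static int classes_match(const struct ww_acquire_ctx *ctx,
                         const struct ww_mutex *lock)
{
    return ctx->ww_class == lock->ww_class;
}
```

Since the field already exists under the debug config, making it
unconditional should only grow the structs by one pointer and add no
runtime work on the lock paths.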
> >>> +    ///
> >>> +    ///         Ok(())
> >>> +    ///     },
> >>> +    ///     // `on_all_locks_taken` closure
> >>> +    ///     |lock_set| {
> >>> +    ///         // Safely mutate both values while holding the locks.
> >>> +    ///         lock_set.with_locked(&mutex1, |v| *v += 1)?;
> >>> +    ///         lock_set.with_locked(&mutex2, |v| *v += 1)?;
> >>> +    ///
> >>> +    ///         Ok(())
> >>> +    ///     },
> >>
> >> I'm still pretty confident we don't need or want both closures and
> >> can combine them into a single closure. And I am still pretty sure
> >> the only thing that needs to be tracked here is which lock we
> >> failed to acquire in the event of a deadlock.
> >>
> >> Let me see if I can do a better job of explaining why. Or, if I'm
> >> actually wrong about this - maybe this will help you correct me and
> >> see where I've misunderstood something :).
> >>
> >> First, let's pretend we've made a couple of changes here:
> >>
> >> * We remove `taken: KVec<RawGuard>` and replace it with
> >>   `failed: *mut Mutex<…>`
> >> * lock_set.lock():
> >>   - Now returns a `Guard` that executes `ww_mutex_unlock` in its
> >>     destructor
> >>   - If `ww_mutex_lock` fails due to -EDEADLK, this function stores
> >>     a pointer to the respective mutex in `lock_set.failed`.
> >>   - Before acquiring a lock, we now check:
> >>     + if lock_set.failed == lock
> >>       * Return a Guard for lock without calling ww_mutex_lock()
> >>       * lock_set.failed = null_mut();
> >> * We remove `on_all_locks_taken()`, and rename `locking_algorithm`
> >>   to `ww_cb`.
> >> * If `ww_cb()` returns Err(EDEADLK):
> >>   - if !lock_set.failed.is_null()
> >>     + ww_mutex_lock(lock_set.failed) // Don't store a guard
> >> * If `ww_cb()` returns Ok(…):
> >>   - if !lock_set.failed.is_null()
> >>     // This could only happen if we hit -EDEADLK but then `ww_cb`
> >>     // did not re-acquire `lock_set.failed` on the next attempt
> >>     + ww_mutex_unlock(lock_set.failed)
> >>
> >> With all of those changes, we can rewrite `ww_cb` to look like this:
> >>
> >> |lock_set| {
> >>     // SAFETY: Both `lock_set` and `mutex1` use the same class.
> >>     let g1 = unsafe { lock_set.lock(&mutex1)? };
> >>
> >>     // SAFETY: Both `lock_set` and `mutex2` use the same class.
> >>     let g2 = unsafe { lock_set.lock(&mutex2)? };
> >>
> >>     *g1 += 1;
> >>     *g2 += 2;
> >>
> >>     Ok(())
> >> }
> >>
> >> If we hit -EDEADLK when trying to acquire g2, this is more or less
> >> what would happen:
> >>
> >> * let res = ww_cb():
> >>   - let g1 = …; // (we acquire g1 successfully)
> >>   - let g2 = …; // (enter .lock())
> >>     + res = ww_mutex_lock(mutex2);
> >>     + if (res) == EDEADLK
> >>       * lock_set.failed = mutex2;
> >>     + return Err(EDEADLK);
> >>   - return Err(-EDEADLK);
> >>   // Exiting ww_cb(), so rust will drop all variables in this scope:
> >>     + ww_mutex_unlock(mutex1) // g1's Drop
> >>
> >> * // (res == Err(EDEADLK))
> >>   // All locks have been released at this point
> >>
> >> * if !lock_set.failed.is_null()
> >>   - ww_mutex_lock(lock_set.failed) // Don't create a guard
> >>   // We've now re-acquired the lock we dead-locked on
> >>
> >> * let res = ww_cb():
> >>   - let g1 = …; // (we acquire g1 successfully)
> >>   - let g2 = …; // (enter .lock())
> >>     + if lock_set.failed == lock
> >>       * lock_set.failed = null_mut();
> >>       * return Guard(…); // but don't call ww_mutex_lock(),
> >>         // it's already locked
> >>   - // We acquired g2 successfully!
> >>   - *g1 += 1;
> >>   - *g2 += 2;
> >>
> >> * etc…
> >>
> >> The only challenge with this is that users need to write their
> >> ww_cb() implementations to be idempotent (so that calling it
> >> multiple times isn't unexpected). But that's already what we do on
> >> the C side, and is kind of what I expected we would want to do in
> >> rust anyhow.
> >>
> >> Does this make sense, or was there something I made a mistake with
> >> here?
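As a self-contained illustration of the control flow sketched above, here
is a std-only toy model (`FakeMutex`, `fail_once` and friends are invented
stand-ins; the real ww_mutex calls are elided as comments): `lock` records
the mutex it dead-locked on, and on the retry pass hands out a guard for
that mutex without locking again.

```rust
// Toy model of the single-closure retry scheme; not kernel code.
use std::cell::Cell;

const EDEADLK: i32 = 35;

struct FakeMutex {
    id: usize,
    fail_once: Cell<bool>, // simulates a single -EDEADLK on this mutex
}

struct Guard<'a> {
    mutex: &'a FakeMutex,
}

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        // The real guard would call ww_mutex_unlock() here.
        let _ = self.mutex.id;
    }
}

struct LockSet {
    failed: Cell<Option<usize>>, // the mutex we dead-locked on, if any
}

impl LockSet {
    fn lock<'a>(&self, m: &'a FakeMutex) -> Result<Guard<'a>, i32> {
        if self.failed.get() == Some(m.id) {
            // The retry loop already holds this one; hand out a guard
            // without locking again.
            self.failed.set(None);
            return Ok(Guard { mutex: m });
        }
        if m.fail_once.replace(false) {
            // Simulated ww_mutex_lock() returning -EDEADLK.
            self.failed.set(Some(m.id));
            return Err(EDEADLK);
        }
        Ok(Guard { mutex: m })
    }

    fn lock_all<Z>(
        &self,
        mut ww_cb: impl FnMut(&Self) -> Result<Z, i32>,
    ) -> Result<Z, i32> {
        loop {
            match ww_cb(self) {
                // All guards were dropped when `ww_cb` returned; the
                // real loop would ww_mutex_lock(self.failed) before
                // retrying.
                Err(e) if e == EDEADLK => continue,
                other => return other,
            }
        }
    }
}
```

Running a callback that locks two mutexes where the second fails once
shows the loop retrying exactly once and then succeeding.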
> >
> > Thanks a lot for analyzing and providing an alternative on this!
> >
> > However, collapsing everything into a single callback would require
> > the caller to self-police various disciplines like "don't touch gN
> > until gN+1 succeeded", which is exactly the foot-gun we are trying
> > to avoid with the two closures.
> >
> > Separating the acquire and use logic is not just a simpler API to
> > implement (and provide), but also more effective compared to your
> > example here. With a single closure we basically move API
> > responsibility to the users (e.g., do not run this part of the code
> > in the loop, do not access any data behind any guard if all the
> > locks aren't taken yet, etc.), which is not a good thing to do,
> > especially for a high-level API.
>
> +1
>
> IMHO, we went from an API that’s simple to understand, to something
> much more elaborate. I’m not sure I see the benefit. I specifically
> think that having a separate function (i.e.: closure) that says “ok,
> all locks are taken now” is actually helpful for users.
>
>
> >
> >>
> >>> +    ///     )?;
> >>> +    ///
> >>> +    /// # Ok::<(), Error>(())
> >>> +    /// ```
> >>> +    pub fn lock_all<T, Y, Z>(
> >>> +        &mut self,
> >>> +        mut locking_algorithm: T,
> >>> +        mut on_all_locks_taken: Y,
> >>> +    ) -> Result<Z>
> >>> +    where
> >>> +        T: FnMut(&mut LockSet<'a>) -> Result,
> >>> +        Y: FnMut(&mut LockSet<'a>) -> Result<Z>,
> >>> +    {
> >>> +        loop {
> >>> +            match locking_algorithm(self) {
> >>> +                Ok(()) => {
> >>> +                    // All locks in `locking_algorithm` succeeded.
> >>> +                    // The user can now safely use them in
> >>> +                    // `on_all_locks_taken`.
> >>> +                    let res = on_all_locks_taken(self);
> >>> +                    self.release_all_locks();
> >>> +
> >>> +                    return res;
> >>> +                }
> >>> +                Err(e) if e == EDEADLK => {
> >>> +                    // Deadlock detected, retry from scratch.
> >>> +                    self.cleanup_on_deadlock();
> >>> +                    continue;
> >>> +                }
> >>> +                Err(e) => {
> >>> +                    self.release_all_locks();
> >>> +                    return Err(e);
> >>> +                }
> >>> +            }
> >>> +        }
> >>> +    }
> >>> +
> >>> +    /// Executes `access` with a mutable reference to the data
> >>> +    /// behind `mutex`.
> >>> +    ///
> >>> +    /// Fails with [`EINVAL`] if the mutex was not locked in this
> >>> +    /// [`LockSet`].
> >>> +    pub fn with_locked<T: Unpin, Y>(
> >>> +        &mut self,
> >>> +        mutex: &'a Mutex<'a, T>,
> >>> +        access: impl for<'b> FnOnce(&'b mut T) -> Y,
> >>> +    ) -> Result<Y> {
> >>> +        let mutex_ptr = mutex.inner.get();
> >>> +
> >>> +        if self
> >>> +            .taken
> >>> +            .iter()
> >>> +            .any(|guard| guard.mutex_ptr.as_ptr() == mutex_ptr)
> >>> +        {
> >>> +            // SAFETY: We hold the lock corresponding to `mutex`, so
> >>> +            // we have exclusive access to its protected data.
> >>> +            let value = unsafe { &mut *mutex.data.get() };
> >>> +            Ok(access(value))
> >>> +        } else {
> >>> +            // `mutex` isn't locked in this `LockSet`.
> >>> +            Err(EINVAL)
> >>> +        }
> >>> +    }
> >>> +
> >>> +    /// Releases all currently held locks in this [`LockSet`].
> >>> +    fn release_all_locks(&mut self) {
> >>> +        // The `Drop` implementation of `RawGuard` takes care of the
> >>> +        // unlocking.
> >>> +        self.taken.clear();
> >>> +    }
> >>> +
> >>> +    /// Resets this [`LockSet`] after a deadlock detection.
> >>> +    ///
> >>> +    /// Drops all held locks and reinitializes the [`AcquireCtx`].
> >>> +    ///
> >>> +    /// It is intended to be used for internal implementation only.
> >>> +    fn cleanup_on_deadlock(&mut self) {
> >>> +        self.release_all_locks();
> >>> +
> >>> +        // SAFETY: We are passing the same `class` that was used
> >>> +        // to initialize `self.acquire_ctx`.
> >>> +        unsafe { self.acquire_ctx.as_mut().reinit(self.class) };
> >>> +    }
> >>> +}
>
>
-Onur