Message-ID: <20251124184928.30b8bbaf@nimda>
Date: Mon, 24 Nov 2025 18:49:28 +0300
From: Onur Özkan <work@...rozkan.dev>
To: Lyude Paul <lyude@...hat.com>
Cc: rust-for-linux@...r.kernel.org, lossin@...nel.org, ojeda@...nel.org,
 alex.gaynor@...il.com, boqun.feng@...il.com, gary@...yguo.net,
 a.hindborg@...nel.org, aliceryhl@...gle.com, tmgross@...ch.edu,
 dakr@...nel.org, peterz@...radead.org, mingo@...hat.com, will@...nel.org,
 longman@...hat.com, felipe_life@...e.com, daniel@...lak.dev,
 bjorn3_gh@...tonmail.com, daniel.almeida@...labora.com,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 5/6] rust: ww_mutex: implement LockSet

On Fri, 21 Nov 2025 17:34:15 -0500
Lyude Paul <lyude@...hat.com> wrote:

> On Sat, 2025-11-01 at 19:10 +0300, Onur Özkan wrote:
> > `LockSet` is a safe, high-level API built on top of
> > ww_mutex; it provides a simple interface while keeping the
> > ww_mutex semantics.
> > 
> > When `EDEADLK` is hit, it drops all held locks, resets
> > the acquire context and retries the user-supplied
> > locking algorithm until it succeeds.
> > 
> > Signed-off-by: Onur Özkan <work@...rozkan.dev>
> > ---
> >  rust/kernel/sync/lock/ww_mutex.rs          |   6 +
> >  rust/kernel/sync/lock/ww_mutex/lock_set.rs | 245 +++++++++++++++++++++
> >  2 files changed, 251 insertions(+)
> >  create mode 100644 rust/kernel/sync/lock/ww_mutex/lock_set.rs
> > 
> > diff --git a/rust/kernel/sync/lock/ww_mutex.rs b/rust/kernel/sync/lock/ww_mutex.rs
> > index 2a9c1c20281b..d4c3b272912d 100644
> > --- a/rust/kernel/sync/lock/ww_mutex.rs
> > +++ b/rust/kernel/sync/lock/ww_mutex.rs
> > @@ -5,6 +5,10 @@
> >  //! It is designed to avoid deadlocks when locking multiple [`Mutex`]es
> >  //! that belong to the same [`Class`]. Each lock acquisition uses an
> >  //! [`AcquireCtx`] to track ordering and ensure forward progress.
> > +//!
> > +//! It is recommended to use [`LockSet`] as it provides a safe high-level
> > +//! interface that automatically handles deadlocks, retries and context
> > +//! management.
> >  
> >  use crate::error::to_result;
> >  use crate::prelude::*;
> > @@ -16,9 +20,11 @@
> >  
> >  pub use acquire_ctx::AcquireCtx;
> >  pub use class::Class;
> > +pub use lock_set::LockSet;
> >  
> >  mod acquire_ctx;
> >  mod class;
> > +mod lock_set;
> >  
> >  /// A wound-wait (ww) mutex that is powered with deadlock avoidance
> >  /// when acquiring multiple locks of the same [`Class`].
> > diff --git a/rust/kernel/sync/lock/ww_mutex/lock_set.rs b/rust/kernel/sync/lock/ww_mutex/lock_set.rs
> > new file mode 100644
> > index 000000000000..ae234fd1e0be
> > --- /dev/null
> > +++ b/rust/kernel/sync/lock/ww_mutex/lock_set.rs
> > @@ -0,0 +1,245 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +//! Provides [`LockSet`] which automatically detects [`EDEADLK`],
> > +//! releases all locks, resets the state and retries the user
> > +//! supplied locking algorithm until success.
> > +
> > +use super::{AcquireCtx, Class, Mutex};
> > +use crate::bindings;
> > +use crate::prelude::*;
> > +use crate::types::NotThreadSafe;
> > +use core::ptr::NonNull;
> > +
> > +/// A tracked set of [`Mutex`] locks acquired under the same [`Class`].
> > +///
> > +/// It ensures proper cleanup and a retry mechanism on deadlocks, and
> > +/// provides safe access to locked data via [`LockSet::with_locked`].
> > +///
> > +/// Typical usage is through [`LockSet::lock_all`], which retries a
> > +/// user-supplied locking algorithm until it succeeds without deadlock.
> > +pub struct LockSet<'a> {
> > +    acquire_ctx: Pin<KBox<AcquireCtx<'a>>>,
> > +    taken: KVec<RawGuard>,
> > +    class: &'a Class,
> > +}
> > +
> > +/// Used by `LockSet` to track acquired locks.
> > +///
> > +/// This type is strictly crate-private and must never be exposed
> > +/// outside this crate.
> > +struct RawGuard {
> > +    mutex_ptr: NonNull<bindings::ww_mutex>,
> > +    _not_send: NotThreadSafe,
> > +}
> > +
> > +impl Drop for RawGuard {
> > +    fn drop(&mut self) {
> > +        // SAFETY: `mutex_ptr` originates from a locked `Mutex` and remains
> > +        // valid for the lifetime of this guard, so unlocking here is sound.
> > +        unsafe { bindings::ww_mutex_unlock(self.mutex_ptr.as_ptr()) };
> > +}
> > +
> > +impl<'a> Drop for LockSet<'a> {
> > +    fn drop(&mut self) {
> > +        self.release_all_locks();
> > +    }
> > +}
> > +
> > +impl<'a> LockSet<'a> {
> > +    /// Creates a new [`LockSet`] with the given class.
> > +    ///
> > +    /// All locks taken through this [`LockSet`] must belong to the
> > +    /// same class.
> > +    pub fn new(class: &'a Class) -> Result<Self> {
> > +        Ok(Self {
> > +            acquire_ctx: KBox::pin_init(AcquireCtx::new(class), GFP_KERNEL)?,
> > +            taken: KVec::new(),
> > +            class,
> > +        })
> > +    }
> > +
> > +    /// Creates a new [`LockSet`] using an existing [`AcquireCtx`] and
> > +    /// [`Class`].
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// The caller must ensure that `acquire_ctx` is properly initialized,
> > +    /// holds no mutexes, and that the provided `class` matches the one
> > +    /// used to initialize the given `acquire_ctx`.
> > +    pub unsafe fn new_with_acquire_ctx(
> > +        acquire_ctx: Pin<KBox<AcquireCtx<'a>>>,
> > +        class: &'a Class,
> > +    ) -> Self {
> > +        Self {
> > +            acquire_ctx,
> > +            taken: KVec::new(),
> > +            class,
> > +        }
> > +    }
> > +
> > +    /// Attempts to lock a [`Mutex`] and records the guard.
> > +    ///
> > +    /// Returns [`EDEADLK`] if lock ordering would cause a deadlock.
> > +    ///
> > +    /// Returns [`EBUSY`] if `mutex` was locked outside of this [`LockSet`].
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// The given `mutex` must be created with the [`Class`] that was used
> > +    /// to initialize this [`LockSet`].
> > +    pub unsafe fn lock<T>(&mut self, mutex: &'a Mutex<'a, T>) -> Result {
> > +        if mutex.is_locked()
> > +            && !self
> > +                .taken
> > +                .iter()
> > +                .any(|guard| guard.mutex_ptr.as_ptr() == mutex.inner.get())
> > +        {
> > +            return Err(EBUSY);
> > +        }
> 
> I don't think that we need or want to keep track of this - even for
> checking if we've acquired a lock already. The kernel already does
> this (from __ww_rt_mutex_lock()):
> 

The code here is self-documenting: it checks whether the mutex was
locked outside this context (which should never happen). That makes the
implementation easier to follow, IMO.

> if (ww_ctx) {
> 	if (unlikely(ww_ctx == READ_ONCE(lock->ctx)))
> 		return -EALREADY;
> 
> 	/*
> 	 * Reset the wounded flag after a kill. No other process can
> 	 * race and wound us here, since they can't have a valid owner
> 	 * pointer if we don't have any locks held.
> 	 */
> 	if (ww_ctx->acquired == 0)
> 		ww_ctx->wounded = 0;
> 
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> 	nest_lock = &ww_ctx->dep_map;
> #endif
> }
> > +
> > +        // SAFETY: By the safety contract, `mutex` belongs to the same
> > +        // `Class` as `self.acquire_ctx` does.
> > +        let guard = unsafe { self.acquire_ctx.lock(mutex)? };
> > +
> > +        self.taken.push(
> > +            RawGuard {
> > +                // SAFETY: We just locked it above so it's a valid pointer.
> > +                mutex_ptr: unsafe { NonNull::new_unchecked(guard.mutex.inner.get()) },
> > +                _not_send: NotThreadSafe,
> > +            },
> > +            GFP_KERNEL,
> > +        )?;
> > +
> > +        // Avoid unlocking here; `release_all_locks` (also run by `Drop`)
> > +        // performs the unlock for `LockSet`.
> > +        core::mem::forget(guard);
> > +
> > +        Ok(())
> > +    }
> > +
> > +    /// Runs `locking_algorithm` until success, retrying on deadlock.
> > +    ///
> > +    /// `locking_algorithm` should attempt to acquire all needed locks.
> > +    /// If [`EDEADLK`] is detected, this function will roll back, reset
> > +    /// the context and retry automatically.
> > +    ///
> > +    /// Once all locks are acquired successfully, `on_all_locks_taken` is
> > +    /// invoked for exclusive access to the locked values. Afterwards, all
> > +    /// locks are released.
> > +    ///
> > +    ///
> > +    /// # Example
> > +    ///
> > +    /// ```
> > +    /// use kernel::alloc::KBox;
> > +    /// use kernel::c_str;
> > +    /// use kernel::prelude::*;
> > +    /// use kernel::sync::Arc;
> > +    /// use kernel::sync::lock::ww_mutex::{Class, LockSet, Mutex};
> > +    /// use pin_init::stack_pin_init;
> > +    ///
> > +    /// stack_pin_init!(let class = Class::new_wound_wait(c_str!("test")));
> > +    ///
> > +    /// let mutex1 = Arc::pin_init(Mutex::new(0, &class), GFP_KERNEL)?;
> > +    /// let mutex2 = Arc::pin_init(Mutex::new(0, &class), GFP_KERNEL)?;
> > +    /// let mut lock_set = KBox::pin_init(LockSet::new(&class)?, GFP_KERNEL)?;
> > +    ///
> > +    /// lock_set.lock_all(
> > +    ///     // `locking_algorithm` closure
> > +    ///     |lock_set| {
> > +    ///         // SAFETY: Both `lock_set` and `mutex1` use the same class.
> > +    ///         unsafe { lock_set.lock(&mutex1)? };
> > +    ///
> > +    ///         // SAFETY: Both `lock_set` and `mutex2` use the same class.
> > +    ///         unsafe { lock_set.lock(&mutex2)? };
> 
> I wonder if there's some way we can get rid of the safety contract
> here and verify this at compile time, it would be a shame if every
> single lock invocation needed to be unsafe.
> 

Yeah :(. We could get rid of them easily by keeping a reference to the
class that was passed to the constructor functions, but that becomes a
problem for the from_raw implementations.

I think the best solution would be to expose the ww_class member of
ww_acquire_ctx and ww_mutex unconditionally (right now it depends on
DEBUG_WW_MUTEXES). That way we can just access the class and verify
that the mutex and acquire_ctx classes match.

What do you think? I can submit a patch for the C-side implementation.
It should be straightforward and shouldn't have any runtime impact.

> > +    ///
> > +    ///         Ok(())
> > +    ///     },
> > +    ///     // `on_all_locks_taken` closure
> > +    ///     |lock_set| {
> > +    ///         // Safely mutate both values while holding the locks.
> > +    ///         lock_set.with_locked(&mutex1, |v| *v += 1)?;
> > +    ///         lock_set.with_locked(&mutex2, |v| *v += 1)?;
> > +    ///
> > +    ///         Ok(())
> > +    ///     },
> 
> I'm still pretty confident we don't need or want both closures and
> can combine them into a single closure. And I am still pretty sure
> the only thing that needs to be tracked here is which lock we failed
> to acquire in the event of a deadlock.
> 
> Let me see if I can do a better job of explaining why. Or, if I'm
> actually wrong about this - maybe this will help you correct me and
> see where I've misunderstood something :).
> 
> First, let's pretend we've made a couple of changes here:
> 
>  * We remove `taken: KVec<RawGuard>` and replace it with `failed: *mut
>    Mutex<…>`
>  * lock_set.lock():
>     - Now returns a `Guard` that executes `ww_mutex_unlock` in its
>       destructor
>     - If `ww_mutex_lock` fails due to -EDEADLK, this function stores
>       a pointer to the respective mutex in `lock_set.failed`.
>     - Before acquiring a lock, we now check:
>        + if lock_set.failed == lock
>           * Return a Guard for lock without calling ww_mutex_lock()
>           * lock_set.failed = null_mut();
>  * We remove `on_all_locks_taken()`, and rename `locking_algorithm` to
>    `ww_cb`.
>  * If `ww_cb()` returns Err(EDEADLK):
>     - if !lock_set.failed.is_null()
>        + ww_mutex_lock(lock_set.failed) // Don't store a guard
>  * If `ww_cb()` returns Ok(…):
>     - if !lock_set.failed.is_null()
>       // This could only happen if we hit -EDEADLK but then `ww_cb`
>       // did not re-acquire `lock_set.failed` on the next attempt
>        + ww_mutex_unlock(lock_set.failed)
> 
> With all of those changes, we can rewrite `ww_cb` to look like this:
> 
> |lock_set| {
>     // SAFETY: Both `lock_set` and `mutex1` use the same class.
>     let g1 = unsafe { lock_set.lock(&mutex1)? };
> 
>     // SAFETY: Both `lock_set` and `mutex2` use the same class.
>     let g2 = unsafe { lock_set.lock(&mutex2)? };
> 
>     *g1 += 1;
>     *g2 += 2;
> 
>     Ok(())
> }
> 
> If we hit -EDEADLK when trying to acquire g2, this is more or less
> what would happen:
> 
>  * let res = ww_cb():
>     - let g1 = …; // (we acquire g1 successfully)
>     - let g2 = …; // (enter .lock())
>        + res = ww_mutex_lock(mutex2);
>        + if (res) == EDEADLK
>           * lock_set.failed = mutex2;
>        + return Err(EDEADLK);
>     - return Err(-EDEADLK);
>       // Exiting ww_cb(), so rust will drop all variables in this scope:
>        + ww_mutex_unlock(mutex1) // g1's Drop
> 
>  * // (res == Err(EDEADLK))
>    // All locks have been released at this point
> 
>  * if !lock_set.failed.is_null()
>     - ww_mutex_lock(lock_set.failed) // Don't create a guard
>    // We've now re-acquired the lock we dead-locked on
> 
>  * let res = ww_cb():
>     - let g1 = …; // (we acquire g1 successfully)
>     - let g2 = …; // (enter .lock())
>        + if lock_set.failed == lock
>           * lock_set.failed = null_mut();
>           * return Guard(…); // don't call ww_mutex_lock(), it's
>             // already locked
>     - // We acquired g2 successfully!
>     - *g1 += 1;
>     - *g2 += 2;
> 
>  * etc…
> 
> The only challenge with this is that users need to write their ww_cb()
> implementations to be idempotent (so that calling it multiple times
> isn't unexpected). But that's already what we do on the C side, and
> is kind of what I expected we would want to do in rust anyhow.
> 
> Does this make sense, or was there something I made a mistake with
> here?

Thanks a lot for analyzing and providing an alternative on this!

However, collapsing everything into a single callback would require the
caller to self-police various disciplines like "don't touch gN until
gN+1 succeeded", which is exactly the foot-gun we are trying to avoid
with the two closures.

Separating the acquire and use logic is not just a simpler API to
implement (and provide), it is also more effective compared to your
example here. With a single closure we basically move the API's
responsibility onto the users (e.g., do not run this part of the code in
a loop, do not access any data behind any guard until all the locks are
taken, etc.), which is not a good thing to do, especially for a
high-level API.

> 
> > +    /// )?;
> > +    ///
> > +    /// # Ok::<(), Error>(())
> > +    /// ```
> > +    pub fn lock_all<T, Y, Z>(
> > +        &mut self,
> > +        mut locking_algorithm: T,
> > +        mut on_all_locks_taken: Y,
> > +    ) -> Result<Z>
> > +    where
> > +        T: FnMut(&mut LockSet<'a>) -> Result,
> > +        Y: FnMut(&mut LockSet<'a>) -> Result<Z>,
> > +    {
> > +        loop {
> > +            match locking_algorithm(self) {
> > +                Ok(()) => {
> > +                    // All locks in `locking_algorithm` succeeded.
> > +                    // The user can now safely use them in `on_all_locks_taken`.
> > +                    let res = on_all_locks_taken(self);
> > +                    self.release_all_locks();
> > +
> > +                    return res;
> > +                }
> > +                Err(e) if e == EDEADLK => {
> > +                    // Deadlock detected, retry from scratch.
> > +                    self.cleanup_on_deadlock();
> > +                    continue;
> > +                }
> > +                Err(e) => {
> > +                    self.release_all_locks();
> > +                    return Err(e);
> > +                }
> > +            }
> > +        }
> > +    }
> > +
> > +    /// Executes `access` with a mutable reference to the data behind
> > +    /// `mutex`.
> > +    ///
> > +    /// Fails with [`EINVAL`] if the mutex was not locked in this
> > +    /// [`LockSet`].
> > +    pub fn with_locked<T: Unpin, Y>(
> > +        &mut self,
> > +        mutex: &'a Mutex<'a, T>,
> > +        access: impl for<'b> FnOnce(&'b mut T) -> Y,
> > +    ) -> Result<Y> {
> > +        let mutex_ptr = mutex.inner.get();
> > +
> > +        if self
> > +            .taken
> > +            .iter()
> > +            .any(|guard| guard.mutex_ptr.as_ptr() == mutex_ptr)
> > +        {
> > +            // SAFETY: We hold the lock corresponding to `mutex`, so we
> > +            // have exclusive access to its protected data.
> > +            let value = unsafe { &mut *mutex.data.get() };
> > +            Ok(access(value))
> > +        } else {
> > +            // `mutex` isn't locked in this `LockSet`.
> > +            Err(EINVAL)
> > +        }
> > +    }
> > +
> > +    /// Releases all currently held locks in this [`LockSet`].
> > +    fn release_all_locks(&mut self) {
> > +        // The `Drop` implementation of `RawGuard` takes care of the unlocking.
> > +        self.taken.clear();
> > +    }
> > +
> > +    /// Resets this [`LockSet`] after a deadlock detection.
> > +    ///
> > +    /// Drops all held locks and reinitializes the [`AcquireCtx`].
> > +    ///
> > +    /// It is intended to be used for internal implementation only.
> > +    fn cleanup_on_deadlock(&mut self) {
> > +        self.release_all_locks();
> > +
> > +        // SAFETY: We are passing the same `class` that was used
> > +        // to initialize `self.acquire_ctx`.
> > +        unsafe { self.acquire_ctx.as_mut().reinit(self.class) };
> > +    }
> > +}
> 

