[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <FDF7EB09-BFCB-4E90-8D9F-8AA6E14B3D5B@collabora.com>
Date: Tue, 25 Nov 2025 16:01:08 -0300
From: Daniel Almeida <daniel.almeida@...labora.com>
To: Onur Özkan <work@...rozkan.dev>
Cc: Lyude Paul <lyude@...hat.com>,
rust-for-linux@...r.kernel.org,
lossin@...nel.org,
ojeda@...nel.org,
alex.gaynor@...il.com,
boqun.feng@...il.com,
gary@...yguo.net,
a.hindborg@...nel.org,
aliceryhl@...gle.com,
tmgross@...ch.edu,
dakr@...nel.org,
peterz@...radead.org,
mingo@...hat.com,
will@...nel.org,
longman@...hat.com,
felipe_life@...e.com,
daniel@...lak.dev,
bjorn3_gh@...tonmail.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 5/6] rust: ww_mutex: implement LockSet
> On 24 Nov 2025, at 12:49, Onur Özkan <work@...rozkan.dev> wrote:
>
> On Fri, 21 Nov 2025 17:34:15 -0500
> Lyude Paul <lyude@...hat.com> wrote:
>
>> On Sat, 2025-11-01 at 19:10 +0300, Onur Özkan wrote:
>>> `LockSet` is a high-level and safe API built on top of
>>> ww_mutex, provides a simple API while keeping the ww_mutex
>>> semantics.
>>>
>>> When `EDEADLK` is hit, it drops all held locks, resets
>>> the acquire context and retries the given (by the user)
>>> locking algorithm until it succeeds.
>>>
>>> Signed-off-by: Onur Özkan <work@...rozkan.dev>
>>> ---
>>> rust/kernel/sync/lock/ww_mutex.rs | 6 +
>>> rust/kernel/sync/lock/ww_mutex/lock_set.rs | 245
>>> +++++++++++++++++++++ 2 files changed, 251 insertions(+)
>>> create mode 100644 rust/kernel/sync/lock/ww_mutex/lock_set.rs
>>>
>>> diff --git a/rust/kernel/sync/lock/ww_mutex.rs
>>> b/rust/kernel/sync/lock/ww_mutex.rs index
>>> 2a9c1c20281b..d4c3b272912d 100644 ---
>>> a/rust/kernel/sync/lock/ww_mutex.rs +++
>>> b/rust/kernel/sync/lock/ww_mutex.rs @@ -5,6 +5,10 @@
>>> //! It is designed to avoid deadlocks when locking multiple
>>> [`Mutex`]es //! that belong to the same [`Class`]. Each lock
>>> acquisition uses an //! [`AcquireCtx`] to track ordering and ensure
>>> forward progress. +//!
>>> +//! It is recommended to use [`LockSet`] as it provides safe
>>> high-level +//! interface that automatically handles deadlocks,
>>> retries and context +//! management.
>>>
>>> use crate::error::to_result;
>>> use crate::prelude::*;
>>> @@ -16,9 +20,11 @@
>>>
>>> pub use acquire_ctx::AcquireCtx;
>>> pub use class::Class;
>>> +pub use lock_set::LockSet;
>>>
>>> mod acquire_ctx;
>>> mod class;
>>> +mod lock_set;
>>>
>>> /// A wound-wait (ww) mutex that is powered with deadlock avoidance
>>> /// when acquiring multiple locks of the same [`Class`].
>>> diff --git a/rust/kernel/sync/lock/ww_mutex/lock_set.rs
>>> b/rust/kernel/sync/lock/ww_mutex/lock_set.rs new file mode 100644
>>> index 000000000000..ae234fd1e0be
>>> --- /dev/null
>>> +++ b/rust/kernel/sync/lock/ww_mutex/lock_set.rs
>>> @@ -0,0 +1,245 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Provides [`LockSet`] which automatically detects [`EDEADLK`],
>>> +//! releases all locks, resets the state and retries the user
>>> +//! supplied locking algorithm until success.
>>> +
>>> +use super::{AcquireCtx, Class, Mutex};
>>> +use crate::bindings;
>>> +use crate::prelude::*;
>>> +use crate::types::NotThreadSafe;
>>> +use core::ptr::NonNull;
>>> +
>>> +/// A tracked set of [`Mutex`] locks acquired under the same
>>> [`Class`]. +///
>>> +/// It ensures proper cleanup and retry mechanism on deadlocks and
>>> provides +/// safe access to locked data via
>>> [`LockSet::with_locked`]. +///
>>> +/// Typical usage is through [`LockSet::lock_all`], which retries a
>>> +/// user supplied locking algorithm until it succeeds without
>>> deadlock. +pub struct LockSet<'a> {
>>> + acquire_ctx: Pin<KBox<AcquireCtx<'a>>>,
>>> + taken: KVec<RawGuard>,
>>> + class: &'a Class,
>>> +}
>>> +
>>> +/// Used by `LockSet` to track acquired locks.
>>> +///
>>> +/// This type is strictly crate-private and must never be exposed
>>> +/// outside this crate.
>>> +struct RawGuard {
>>> + mutex_ptr: NonNull<bindings::ww_mutex>,
>>> + _not_send: NotThreadSafe,
>>> +}
>>> +
>>> +impl Drop for RawGuard {
>>> + fn drop(&mut self) {
>>> + // SAFETY: `mutex_ptr` originates from a locked `Mutex`
>>> and remains
>>> + // valid for the lifetime of this guard, so unlocking here
>>> is sound.
>>> + unsafe {
>>> bindings::ww_mutex_unlock(self.mutex_ptr.as_ptr()) };
>>> + }
>>> +}
>>> +
>>> +impl<'a> Drop for LockSet<'a> {
>>> + fn drop(&mut self) {
>>> + self.release_all_locks();
>>> + }
>>> +}
>>> +
>>> +impl<'a> LockSet<'a> {
>>> + /// Creates a new [`LockSet`] with the given class.
>>> + ///
>>> + /// All locks taken through this [`LockSet`] must belong to the
>>> + /// same class.
>>> + pub fn new(class: &'a Class) -> Result<Self> {
>>> + Ok(Self {
>>> + acquire_ctx: KBox::pin_init(AcquireCtx::new(class),
>>> GFP_KERNEL)?,
>>> + taken: KVec::new(),
>>> + class,
>>> + })
>>> + }
>>> +
>>> + /// Creates a new [`LockSet`] using an existing [`AcquireCtx`]
>>> and
>>> + /// [`Class`].
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// The caller must ensure that `acquire_ctx` is properly
>>> initialized,
>>> + /// holds no mutexes and that the provided `class` matches the
>>> one used
>>> + /// to initialize the given `acquire_ctx`.
>>> + pub unsafe fn new_with_acquire_ctx(
>>> + acquire_ctx: Pin<KBox<AcquireCtx<'a>>>,
>>> + class: &'a Class,
>>> + ) -> Self {
>>> + Self {
>>> + acquire_ctx,
>>> + taken: KVec::new(),
>>> + class,
>>> + }
>>> + }
>>> +
>>> + /// Attempts to lock a [`Mutex`] and records the guard.
>>> + ///
>>> + /// Returns [`EDEADLK`] if lock ordering would cause a
>>> deadlock.
>>> + ///
>>> + /// Returns [`EBUSY`] if `mutex` was locked outside of this
>>> [`LockSet`].
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// The given `mutex` must be created with the [`Class`] that
>>> was used
>>> + /// to initialize this [`LockSet`].
>>> + pub unsafe fn lock<T>(&mut self, mutex: &'a Mutex<'a, T>) ->
>>> Result {
>>> + if mutex.is_locked()
>>> + && !self
>>> + .taken
>>> + .iter()
>>> + .any(|guard| guard.mutex_ptr.as_ptr() ==
>>> mutex.inner.get())
>>> + {
>>> + return Err(EBUSY);
>>> + }
>>
>> I don't think that we need or want to keep track of this - even for
>> checking if we've acquired a lock already. The kernel already does
>> this (from __ww_rt_mutex_lock()):
>>
>
> The code is here self-documenting. It checks whether the mutex was
> locked outside this context (which should never happen). This makes the
> implementation easier to understand IMO.
>
>> if (ww_ctx) {
>> if (unlikely(ww_ctx == READ_ONCE(lock->ctx)))
>> return -EALREADY;
>>
>> /*
>> * Reset the wounded flag after a kill. No other process can
>> * race and wound us here, since they can't have a valid owner
>> * pointer if we don't have any locks held.
>> */
>> if (ww_ctx->acquired == 0)
>> ww_ctx->wounded = 0;
>>
>> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> nest_lock = &ww_ctx->dep_map;
>> #endif
>> }
>>> +
>>> + // SAFETY: By the safety contract, `mutex` belongs to the
>>> same `Class`
>>> + // as `self.acquire_ctx` does.
>>> + let guard = unsafe { self.acquire_ctx.lock(mutex)? };
>>> +
>>> + self.taken.push(
>>> + RawGuard {
>>> + // SAFETY: We just locked it above so it's a valid
>>> pointer.
>>> + mutex_ptr: unsafe {
>>> NonNull::new_unchecked(guard.mutex.inner.get()) },
>>> + _not_send: NotThreadSafe,
>>> + },
>>> + GFP_KERNEL,
>>> + )?;
>>> +
>>> + // Avoid unlocking here; `release_all_locks` (also run by
>>> `Drop`)
>>> + // performs the unlock for `LockSet`.
>>> + core::mem::forget(guard);
>>> +
>>> + Ok(())
>>> + }
>>> +
>>> + /// Runs `locking_algorithm` until success with retrying on
>>> deadlock.
>>> + ///
>>> + /// `locking_algorithm` should attempt to acquire all needed
>>> locks.
>>> + /// If [`EDEADLK`] is detected, this function will roll back,
>>> reset
>>> + /// the context and retry automatically.
>>> + ///
>>> + /// Once all locks are acquired successfully,
>>> `on_all_locks_taken` is
>>> + /// invoked for exclusive access to the locked values.
>>> Afterwards, all
>>> + /// locks are released.
>>> + ///
>>> + /// # Example
>>> + ///
>>> + /// ```
>>> + /// use kernel::alloc::KBox;
>>> + /// use kernel::c_str;
>>> + /// use kernel::prelude::*;
>>> + /// use kernel::sync::Arc;
>>> + /// use kernel::sync::lock::ww_mutex::{Class, LockSet, Mutex};
>>> + /// use pin_init::stack_pin_init;
>>> + ///
>>> + /// stack_pin_init!(let class =
>>> Class::new_wound_wait(c_str!("test")));
>>> + ///
>>> + /// let mutex1 = Arc::pin_init(Mutex::new(0, &class),
>>> GFP_KERNEL)?;
>>> + /// let mutex2 = Arc::pin_init(Mutex::new(0, &class),
>>> GFP_KERNEL)?;
>>> + /// let mut lock_set = KBox::pin_init(LockSet::new(&class)?,
>>> GFP_KERNEL)?;
>>> + ///
>>> + /// lock_set.lock_all(
>>> + /// // `locking_algorithm` closure
>>> + /// |lock_set| {
>>> + /// // SAFETY: Both `lock_set` and `mutex1` uses the
>>> same class.
>>> + /// unsafe { lock_set.lock(&mutex1)? };
>>> + ///
>>> + /// // SAFETY: Both `lock_set` and `mutex2` uses the
>>> same class.
>>> + /// unsafe { lock_set.lock(&mutex2)? };
>>
>> I wonder if there's some way we can get rid of the safety contract
>> here and verify this at compile time, it would be a shame if every
>> single lock invocation needed to be unsafe.
>>
>
> Yeah :(. We could get rid of them easily by keeping the class that was
> passed to the constructor functions but that becomes a problem for the
> from_raw implementations.
Then let’s make from_raw actually unsafe and have the user pass the class
as an argument there. Would that work?
That was the first thing that I noticed about this patch as well: it’s unfortunate
that locking is unsafe.
>
> I think the best solution would be to expose ww_class type from
> ww_acquire_ctx and ww_mutex unconditionally (right now it depends on
> DEBUG_WW_MUTEXES). That way we can just access the class and verify
> that the mutex and acquire_ctx classes match.
>
> What do you think? I can submit a patch for the C-side implementation.
> It should be straightforward and shouldn't have any runtime impact.
>
>>> + ///
>>> + /// Ok(())
>>> + /// },
>>> + /// // `on_all_locks_taken` closure
>>> + /// |lock_set| {
>>> + /// // Safely mutate both values while holding the
>>> locks.
>>> + /// lock_set.with_locked(&mutex1, |v| *v += 1)?;
>>> + /// lock_set.with_locked(&mutex2, |v| *v += 1)?;
>>> + ///
>>> + /// Ok(())
>>> + /// },
>>
>> I'm still pretty confident we don't need or want both closures and
>> can combine them into a single closure. And I am still pretty sure
>> the only thing that needs to be tracked here is which lock we failed
>> to acquire in the event of a deadlock.
>>
>> Let me see if I can do a better job of explaining why. Or, if I'm
>> actually wrong about this - maybe this will help you correct me and
>> see where I've misunderstood something :).
>>
>> First, let's pretend we've made a couple of changes here:
>>
>> * We remove `taken: KVec<RawGuard>` and replace it with `failed: *mut
>> Mutex<…>`
>> * lock_set.lock():
>> - Now returns a `Guard` that executes `ww_mutex_unlock` in its
>> destructor
>> - If `ww_mutex_lock` fails due to -EDEADLK, this function stores
>> a pointer to the respective mutex in `lock_set.failed`.
>> - Before acquiring a lock, we now check:
>> + if lock_set.failed == lock
>> * Return a Guard for lock without calling ww_mutex_lock()
>> * lock_set.failed = null_mut();
>> * We remove `on_all_locks_taken()`, and rename `locking_algorithm` to
>> `ww_cb`.
>> * If `ww_cb()` returns Err(EDEADLK):
>> - if !lock_set.failed.is_null()
>> + ww_mutex_lock(lock_set.failed) // Don't store a guard
>> * If `ww_cb()` returns Ok(…):
>> - if !lock_set.failed.is_null()
>> // This could only happen if we hit -EDEADLK but then `ww_cb`
>> did not // re-acquire `lock_set.failed` on the next attempt
>> + ww_mutex_unlock(lock_set.failed)
>>
>> With all of those changes, we can rewrite `ww_cb` to look like this:
>>
>> |lock_set| {
>> // SAFETY: Both `lock_set` and `mutex1` uses the same class.
>> let g1 = unsafe { lock_set.lock(&mutex1)? };
>>
>> // SAFETY: Both `lock_set` and `mutex2` uses the same class.
>> let g2 = unsafe { lock_set.lock(&mutex2)? };
>>
>> *g1 += 1;
>> *g2 += 2;
>>
>> Ok(())
>> }
>>
>> If we hit -EDEADLK when trying to acquire g2, this is more or less
>> what would happen:
>>
>> * let res = ww_cb():
>> - let g1 = …; // (we acquire g1 successfully)
>> - let g2 = …; // (enter .lock())
>> + res = ww_mutex_lock(mutex2);
>> + if (res) == EDEADLK
>> * lock_set.failed = mutex2;
>> + return Err(EDEADLK);
>> - return Err(-EDEADLK);
>> // Exiting ww_cb(), so rust will drop all variables in this
>> scope:
>> + ww_mutex_unlock(mutex1) // g1's Drop
>>
>> * // (res == Err(EDEADLK))
>> // All locks have been released at this point
>>
>> * if !lock_set.failed.is_null()
>> - ww_mutex_lock(lock_set.failed) // Don't create a guard
>> // We've now re-acquired the lock we dead-locked on
>>
>> * let res = ww_cb():
>> - let g1 = …; // (we acquire g1 successfully)
>> - let g2 = …; // (enter .lock())
>> + if lock_set.failed == lock
>> * lock_set.failed = null_mut();
>> * return Guard(…); // but don't call ww_mutex_lock(), it's
>> already locked
>> - // We acquired g2 successfully!
>> - *g1 += 1;
>> - *g2 += 2;
>>
>> * etc…
>>
>> The only challenge with this is that users need to write their ww_cb()
>> implementations to be idempotent (so that calling it multiple times
>> isn't unexpected). But that's already what we do on the C side, and
>> is kind of what I expected we would want to do in rust anyhow.
>>
>> Does this make sense, or was there something I made a mistake with
>> here?
>
> Thanks a lot for analyzing and providing an alternative on this!
>
> However, collapsing everything into a single callback would require the
> caller to self-police various disciplines like "don't touch gN until
> gN+1 succeeded", which is exactly the foot-gun we are trying avoid with
> 2 closures.
>
> Separating acquire and use logics not just simpler API to implement (and
> provide), but also more effective compare to your example here. With
> single closure we basically move API responsibility to the users (e.g.,
> do not run this part of the code in the loop, do not access to any data
> behind any guard if all the locks aren't taken yet, etc.), which is not
> a good thing to do, especially from the high-level API.
+1
IMHO, we went from an API that’s simple to understand, to something
much more elaborate. I’m not sure I see the benefit. I specifically think
that having a separate function (i.e.: closure) that says “ok, all locks
are taken now” is actually helpful for users.
>
>>
>>> + /// )?;
>>> + ///
>>> + /// # Ok::<(), Error>(())
>>> + /// ```
>>> + pub fn lock_all<T, Y, Z>(
>>> + &mut self,
>>> + mut locking_algorithm: T,
>>> + mut on_all_locks_taken: Y,
>>> + ) -> Result<Z>
>>> + where
>>> + T: FnMut(&mut LockSet<'a>) -> Result,
>>> + Y: FnMut(&mut LockSet<'a>) -> Result<Z>,
>>> + {
>>> + loop {
>>> + match locking_algorithm(self) {
>>> + Ok(()) => {
>>> + // All locks in `locking_algorithm` succeeded.
>>> + // The user can now safely use them in
>>> `on_all_locks_taken`.
>>> + let res = on_all_locks_taken(self);
>>> + self.release_all_locks();
>>> +
>>> + return res;
>>> + }
>>> + Err(e) if e == EDEADLK => {
>>> + // Deadlock detected, retry from scratch.
>>> + self.cleanup_on_deadlock();
>>> + continue;
>>> + }
>>> + Err(e) => {
>>> + self.release_all_locks();
>>> + return Err(e);
>>> + }
>>> + }
>>> + }
>>> + }
>>> +
>>> + /// Executes `access` with a mutable reference to the data
>>> behind `mutex`.
>>> + ///
>>> + /// Fails with [`EINVAL`] if the mutex was not locked in this
>>> [`LockSet`].
>>> + pub fn with_locked<T: Unpin, Y>(
>>> + &mut self,
>>> + mutex: &'a Mutex<'a, T>,
>>> + access: impl for<'b> FnOnce(&'b mut T) -> Y,
>>> + ) -> Result<Y> {
>>> + let mutex_ptr = mutex.inner.get();
>>> +
>>> + if self
>>> + .taken
>>> + .iter()
>>> + .any(|guard| guard.mutex_ptr.as_ptr() == mutex_ptr)
>>> + {
>>> + // SAFETY: We hold the lock corresponding to `mutex`,
>>> so we have
>>> + // exclusive access to its protected data.
>>> + let value = unsafe { &mut *mutex.data.get() };
>>> + Ok(access(value))
>>> + } else {
>>> + // `mutex` isn't locked in this `LockSet`.
>>> + Err(EINVAL)
>>> + }
>>> + }
>>> +
>>> + /// Releases all currently held locks in this [`LockSet`].
>>> + fn release_all_locks(&mut self) {
>>> + // `Drop` implementation of the `RawGuard` takes care of
>>> the unlocking.
>>> + self.taken.clear();
>>> + }
>>> +
>>> + /// Resets this [`LockSet`] after a deadlock detection.
>>> + ///
>>> + /// Drops all held locks and reinitializes the [`AcquireCtx`].
>>> + ///
>>> + /// It is intended to be used for internal implementation only.
>>> + fn cleanup_on_deadlock(&mut self) {
>>> + self.release_all_locks();
>>> +
>>> + // SAFETY: We are passing the same `class` that was used
>>> + // to initialize `self.acquire_ctx`.
>>> + unsafe { self.acquire_ctx.as_mut().reinit(self.class) };
>>> + }
>>> +}
Powered by blists - more mailing lists