Message-ID: <2812bdc6-8d7e-48a3-8f5b-a26cd5d18c32@amd.com>
Date: Fri, 1 Aug 2025 10:39:08 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: John Stultz <jstultz@...gle.com>, LKML <linux-kernel@...r.kernel.org>
CC: <syzbot+602c4720aed62576cd79@...kaller.appspotmail.com>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>, Peter
Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, Vincent
Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann
<dietmar.eggemann@....com>, Valentin Schneider <valentin.schneider@....com>,
Suleiman Souhlal <suleiman@...gle.com>, <airlied@...il.com>,
<mripard@...nel.org>, <simona@...ll.ch>, <tzimmermann@...e.de>,
<dri-devel@...ts.freedesktop.org>, <kernel-team@...roid.com>
Subject: Re: [RFC][PATCH] locking: Fix __clear_task_blocked_on() warning from
__ww_mutex_wound() path
Hello John,
On 8/1/2025 1:43 AM, John Stultz wrote:
[..snip..]
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 40d2fa90df425..a9a78f51f7f57 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2166,15 +2166,16 @@ static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
>
>  static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
>  {
> -	WARN_ON_ONCE(!m);
> -	/* Currently we serialize blocked_on under the mutex::wait_lock */
> -	lockdep_assert_held_once(&m->wait_lock);
> -	/*
> -	 * There may be cases where we re-clear already cleared
> -	 * blocked_on relationships, but make sure we are not
> -	 * clearing the relationship with a different lock.
> -	 */
> -	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
> +	if (m) {
> +		/* Currently we serialize blocked_on under the mutex::wait_lock */
> +		lockdep_assert_held_once(&m->wait_lock);
> +		/*
> +		 * There may be cases where we re-clear already cleared
> +		 * blocked_on relationships, but make sure we are not
> +		 * clearing the relationship with a different lock.
> +		 */
> +		WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);

Small concern: as Hillf pointed out, we don't hold
"owner->blocked_on->wait_lock" here when arriving from
__ww_mutex_wound(). Can we run into a situation like the following?

    CPU0                                            CPU1
    (Owner of Mutex A,                              (Trying to acquire Mutex A)
     trying to acquire Mutex B)
    ==========================                      ===========================
    __mutex_lock_common(B)
      ... /* B->wait_lock held */
      set_task_blocked_on(ownerA, B)
      if (__mutex_trylock(B)) /* Returns true */    __mutex_lock_common(A)
        goto acquired; /* Goes to below point */      ... /* A->wait_lock held */
      __clear_task_blocked_on(ownerA, B);             __ww_mutex_wound(ownerA)
        WARN_ON_ONCE(m /* Mutex B */                    ...
          && ownerA->blocked_on /* Mutex B */           __clear_task_blocked_on(ownerA, NULL)
          ...                                             ownerA->blocked_on = NULL;
          && ownerA->blocked_on /* NULL */ != m /* Mutex B */);
        !!! SPLAT !!!
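
To make the window concrete, here is a minimal user-space sketch (not
kernel code) that replays the interleaving above in a single thread. The
types and the names check_would_warn() and remote_clear() are
hypothetical stand-ins, and the two loads of "p->blocked_on" in the
WARN_ON_ONCE() condition are written out explicitly so the remote clear
can be placed between them. It only models the ordering; it says nothing
about what the compiler or the hardware actually does with the loads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the field we need. */
struct mutex { int id; };
struct task { struct mutex *blocked_on; };

/* Models CPU1: __ww_mutex_wound() -> __clear_task_blocked_on(owner, NULL). */
static void remote_clear(struct task *p)
{
	p->blocked_on = NULL;
}

/*
 * Models the condition "m && p->blocked_on && p->blocked_on != m" with
 * the two loads of p->blocked_on made explicit. When 'race' is true, the
 * remote clear is replayed between the two loads.
 * Returns true if the warning condition would evaluate to true.
 */
static bool check_would_warn(struct task *p, struct mutex *m, bool race)
{
	struct mutex *first, *second;

	assert(p != NULL);

	first = p->blocked_on;		/* load used by the "is it set?" test */
	if (race)
		remote_clear(p);	/* CPU1 clears the pointer here */
	second = p->blocked_on;		/* load used by the "!= m" test */

	return m && first && second != m;
}
```

With "ownerA.blocked_on == &B" and "m == &B", the call with race == false
returns false, while the call with race == true returns true: the first
load sees Mutex B, the second sees NULL, and NULL != B.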

At the very least, I think we should make a local copy of "p->blocked_on"
so that __clear_task_blocked_on() sees a consistent value throughout: the
task either sees that it is blocked on the mutex and clears
"p->blocked_on", or it sees that it is blocked on nothing and still
clears "p->blocked_on".

(Tested lightly with syzbot's C reproducer)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 02c340450469..f35d93cca64f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2165,6 +2165,8 @@ static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
 static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
 {
 	if (m) {
+		struct mutex *blocked_on = p->blocked_on;
+
 		/* Currently we serialize blocked_on under the mutex::wait_lock */
 		lockdep_assert_held_once(&m->wait_lock);
 		/*
@@ -2172,7 +2174,7 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *
 		 * blocked_on relationships, but make sure we are not
 		 * clearing the relationship with a different lock.
 		 */
-		WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+		WARN_ON_ONCE(m && blocked_on && blocked_on != m);
 	}
 	p->blocked_on = NULL;
 }
---

The end result is the same; we only avoid an unnecessary splat in this
very unlikely case and save ourselves some head scratching later :)
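
As a rough illustration of why the end result does not change, the
following user-space sketch (again not kernel code, single-threaded, with
hypothetical names such as clear_with_snapshot() and enum clear_point)
models the function with the local copy and replays the remote clear
either before or after the snapshot is taken. It assumes the snapshot
really is a single load, which this model cannot demonstrate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the field we need. */
struct mutex { int id; };
struct task { struct mutex *blocked_on; };

/* Where the modelled remote clear lands relative to the snapshot. */
enum clear_point {
	CLEAR_NEVER,
	CLEAR_BEFORE_SNAPSHOT,
	CLEAR_AFTER_SNAPSHOT,
};

/*
 * Models __clear_task_blocked_on(p, m) with the local copy applied.
 * Returns true if the warning condition would evaluate to true.
 */
static bool clear_with_snapshot(struct task *p, struct mutex *m,
				enum clear_point when)
{
	bool warn = false;

	assert(p != NULL);

	if (when == CLEAR_BEFORE_SNAPSHOT)
		p->blocked_on = NULL;	/* remote clear wins the race */

	if (m) {
		/* One load; both tests below use this value. */
		struct mutex *blocked_on = p->blocked_on;

		if (when == CLEAR_AFTER_SNAPSHOT)
			p->blocked_on = NULL;	/* remote clear lands mid-check */

		warn = blocked_on && blocked_on != m;
	}
	p->blocked_on = NULL;
	return warn;
}
```

Starting from "blocked_on == &B" with "m == &B", none of the three
orderings reports a warning and the pointer always ends up NULL. A task
that is blocked on some other mutex still trips the check, so the
mismatch detection is kept.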
Thoughts?
> +	}
>  	p->blocked_on = NULL;
>  }
>
> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> index 086fd5487ca77..ef8ef3c28592c 100644
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -342,8 +342,12 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
> * When waking up the task to wound, be sure to clear the
> * blocked_on pointer. Otherwise we can see circular
> * blocked_on relationships that can't resolve.
> + *
> + * NOTE: We pass NULL here instead of lock, because we
> + * are waking the lock owner, who may be currently blocked
> + * on a different lock.
> */
> - __clear_task_blocked_on(owner, lock);
> + __clear_task_blocked_on(owner, NULL);
> wake_q_add(wake_q, owner);
> }
> return true;
--
Thanks and Regards,
Prateek