Message-ID: <14ff89fd-d308-47e4-8c3e-157d19f933f3@linux.intel.com>
Date: Fri, 1 Aug 2025 09:09:25 +0200
From: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
To: K Prateek Nayak <kprateek.nayak@....com>, John Stultz
 <jstultz@...gle.com>, LKML <linux-kernel@...r.kernel.org>
Cc: syzbot+602c4720aed62576cd79@...kaller.appspotmail.com,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Valentin Schneider <valentin.schneider@....com>,
 Suleiman Souhlal <suleiman@...gle.com>, airlied@...il.com,
 mripard@...nel.org, simona@...ll.ch, tzimmermann@...e.de,
 dri-devel@...ts.freedesktop.org, kernel-team@...roid.com
Subject: Re: [RFC][PATCH] locking: Fix __clear_task_blocked_on() warning from
 __ww_mutex_wound() path



On 2025-08-01 at 07:09, K Prateek Nayak wrote:
> Hello John,
> 
> On 8/1/2025 1:43 AM, John Stultz wrote:
> 
> [..snip..]
> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 40d2fa90df425..a9a78f51f7f57 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -2166,15 +2166,16 @@ static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
>>  
>>  static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
>>  {
>> -	WARN_ON_ONCE(!m);
>> -	/* Currently we serialize blocked_on under the mutex::wait_lock */
>> -	lockdep_assert_held_once(&m->wait_lock);
>> -	/*
>> -	 * There may be cases where we re-clear already cleared
>> -	 * blocked_on relationships, but make sure we are not
>> -	 * clearing the relationship with a different lock.
>> -	 */
>> -	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
>> +	if (m) {
>> +		/* Currently we serialize blocked_on under the mutex::wait_lock */
>> +		lockdep_assert_held_once(&m->wait_lock);
>> +		/*
>> +		 * There may be cases where we re-clear already cleared
>> +		 * blocked_on relationships, but make sure we are not
>> +		 * clearing the relationship with a different lock.
>> +		 */
>> +		WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
> 
> Small concern since we don't hold the "owner->blocked_on->wait_lock" here
> when arriving from __ww_mutex_wound() as Hillf pointed out, can we run
> into a situation like:
> 
>               CPU0                                                               CPU1
>         (Owner of Mutex A,                                              (Trying to acquire Mutex A)
>     trying to acquire Mutex B)
>     ==========================                                          ===========================
> 
>     __mutex_lock_common(B)
>       ... /* B->wait_lock held */
>       set_task_blocked_on(ownerA, B)
>       if (__mutex_trylock(B)) /* Returns true */                        __mutex_lock_common(A)
>         goto acquired; /* Goes to below point */                          ... /* A->wait_lock held */
>       __clear_task_blocked_on(ownerA, B);                                 __ww_mutex_wound(ownerA)
>         WARN_ON_ONCE(m /* Mutex B*/                                         ...
>                      && ownerA->blocked_on /* Mutex B */                    __clear_task_blocked_on(ownerA, NULL)
>                      ...                                                      ownerA->blocked_on = NULL;
>                      && ownerA->blocked_on /* NULL */ != m /* Mutex B */);
>           !!! SPLAT !!!
> 
> 
> At the very least I think we should make a local copy of "p->blocked_on"
> to see a consistent view throughout __clear_task_blocked_on() - task either
> sees it is blocked on the mutex and clear "p->blocked_on", or it sees it is
> blocked on nothing and still clears "p->blocked_on".
> 
> (Tested lightly with syzbot's C reproducer)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 02c340450469..f35d93cca64f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2165,6 +2165,8 @@ static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
>  static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
>  {
>  	if (m) {
> +		struct mutex *blocked_on = p->blocked_on;
> +
>  		/* Currently we serialize blocked_on under the mutex::wait_lock */
>  		lockdep_assert_held_once(&m->wait_lock);
>  		/*
> @@ -2172,7 +2174,7 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *
>  		 * blocked_on relationships, but make sure we are not
>  		 * clearing the relationship with a different lock.
>  		 */
> -		WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
> +		WARN_ON_ONCE(m && blocked_on && blocked_on != m);
>  	}
>  	p->blocked_on = NULL;
>  }
> ---
> 
> End result is the same, only that we avoid an unnecessary splat in this
> very unlikely case and save ourselves some head scratching later :)
> 
> Thoughts?
If this is required, then it should be blocked_on = READ_ONCE(p->blocked_on);

Also, the "m && " part of the WARN_ON_ONCE() can be taken out, because it is always true inside the if (m) block now.
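
To make both suggestions concrete, here is a rough userspace sketch of how the helper might look with them applied. It is not a tested kernel patch: the READ_ONCE() macro, the struct definitions, the warn_count counter (standing in for WARN_ON_ONCE()) and the function name clear_task_blocked_on_sketch() are all assumptions made up for illustration, and the lockdep assertion is left out because it has no userspace equivalent.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE(): forces a single load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical minimal stand-ins for the kernel structures. */
struct mutex { int dummy; };
struct task_struct { struct mutex *blocked_on; };

/* Counts how often the check fires; models WARN_ON_ONCE() for this sketch. */
static int warn_count;

static void clear_task_blocked_on_sketch(struct task_struct *p, struct mutex *m)
{
	if (m) {
		/* Snapshot once so both halves of the check see the same value. */
		struct mutex *blocked_on = READ_ONCE(p->blocked_on);

		/* No "m &&" needed: m is known to be non-NULL in this branch. */
		if (blocked_on && blocked_on != m)
			warn_count++;
	}
	p->blocked_on = NULL;
}
```

With the single snapshot, a concurrent clear from the __ww_mutex_wound() path can only make the check see NULL or the expected mutex, never one value in the first test and another in the second.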
> 
>> +	}
>>  	p->blocked_on = NULL;
>>  }
>>  
>> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
>> index 086fd5487ca77..ef8ef3c28592c 100644
>> --- a/kernel/locking/ww_mutex.h
>> +++ b/kernel/locking/ww_mutex.h
>> @@ -342,8 +342,12 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
>>  			 * When waking up the task to wound, be sure to clear the
>>  			 * blocked_on pointer. Otherwise we can see circular
>>  			 * blocked_on relationships that can't resolve.
>> +			 *
>> +			 * NOTE: We pass NULL here instead of lock, because we
>> +			 * are waking the lock owner, who may be currently blocked
>> +			 * on a different lock.
>>  			 */
>> -			__clear_task_blocked_on(owner, lock);
>> +			__clear_task_blocked_on(owner, NULL);
>>  			wake_q_add(wake_q, owner);
>>  		}
>>  		return true;
> 

