Date:   Thu, 26 Jan 2023 22:07:14 +0100
From:   Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Waiman Long <longman@...hat.com>, paulmck@...nel.org,
        Arjan van de Ven <arjan@...ux.intel.com>, mingo@...hat.com,
        will@...nel.org, boqun.feng@...il.com, akpm@...l.org,
        tglx@...utronix.de, joel@...lfernandes.org,
        stern@...land.harvard.edu, diogo.behrens@...wei.com,
        jonas.oberhauser@...wei.com, linux-kernel@...r.kernel.org,
        Hernan Ponce de Leon <hernanl.leon@...wei.com>,
        stable@...r.kernel.org,
        Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Subject: Re: [PATCH] Fix data race in mark_rt_mutex_waiters

On 1/26/2023 1:20 PM, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 10:42:07AM +0100, Hernan Ponce de Leon wrote:
>> On 1/24/2023 5:04 PM, Waiman Long wrote:
>>>
>>> On 1/24/23 10:52, Peter Zijlstra wrote:
>>>> On Tue, Jan 24, 2023 at 10:42:24AM -0500, Waiman Long wrote:
>>>>
>>>>> I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
>>>>> however, it is probably better to use atomic_long_or() like
>>>>>
>>>>> atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t
>>>>> *)&lock->owner)
>>>> That function doesn't exist, atomic_long_or() is implicitly relaxed for
>>>> not returning a value.
>>>>
>>> You are right. atomic_long_or() doesn't have variants like some others.
>>>
>>> Cheers,
>>> Longman
>>>
>>
>> When you say "replace the whole of that function", do you mean "barrier
>> included"? I argued in the other email that this should not affect
>> correctness (at least not obviously), but removing the barrier does more
>> than just fix the data race that this patch addresses.
> 
> Well, set_bit() implies smp_mb(), atomic_long_or() does not and would
> need to retain the barrier.
> 
> That said, the comments are all batshit. The top comment states relaxed
> ordering is sufficient since holding lock, the comment with the barrier
> utterly fails to explain what it's ordering against.

I think the top comment became obsolete after 1c0908d8e441 and this just 
went unnoticed. I agree the comment with the barrier does not say much; 
getting more detailed information about it was one of the goals of my 
other email.
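
As background for the exchange above, a sketch on my side (not part of 
the patch): in the kernel atomics convention, an RMW operation that does 
not return a value is unordered, which is why there is no _relaxed 
spelling of atomic_long_or(), while the value-returning fetch forms take 
explicit ordering suffixes:

	/* No return value => unordered by definition; nothing to relax. */
	atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);

	/*
	 * The value-returning forms do come in ordering variants:
	 *   atomic_long_fetch_or()          - fully ordered
	 *   atomic_long_fetch_or_relaxed()  - unordered
	 *   atomic_long_fetch_or_acquire()  - ACQUIRE
	 *   atomic_long_fetch_or_release()  - RELEASE
	 */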

> 
> So all that would need to be updated as well.
> 
> That said, looking at 1c0908d8e441 I'm not at all sure we need that
> barrier. Even in the try_to_take_rt_mutex(.waiter=NULL) case, where we
> skip over the task->pi_lock region, rt_mutex_set_owner(.acquire=true)
> will be ACQUIRE.

This sentence states clearly the idea I was trying to express in my 
other email about why the barrier is not necessary. I think the same 
argument holds if we keep the barrier and relax the store in 
rt_mutex_set_owner as suggested by Boqun (see patch below).
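
To spell out the two placements being compared, here is my paraphrase of 
the relevant lines (not complete code):

	/*
	 * Option A (PeterZ above): no barrier in mark_rt_mutex_waiters();
	 * the acquisition path takes its ordering from rt_mutex_set_owner().
	 */
	atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);
	xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));

	/*
	 * Option B (Boqun's suggestion, patch below): keep the barrier in
	 * mark_rt_mutex_waiters() and relax the owner store instead.
	 */
	atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);
	smp_mb__after_atomic();
	WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, owner));

Either way the ordering is provided on exactly one side; the patch and 
the verification run below correspond to Option B.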

> 
> And in case of rt_mutex_owner(), we fail the trylock (return with 0) and
> a failed trylock does not imply any ordering.
> 
> So why are we having this barrier?

I ran the verification again with the following patch (I am aware the 
comments still need to be updated; this was just to be able to run the 
tool) and the tool still finds no violation.

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 010cf4e6d0b8..c62e409906a2 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -107,7 +107,7 @@ rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
 	 * lock->wait_lock is held but explicit acquire semantics are needed
 	 * for a new lock owner so WRITE_ONCE is insufficient.
 	 */
-	xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+	WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, owner));
 }
 
 static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
@@ -232,12 +232,7 @@ static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
  */
 static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
 {
-	unsigned long owner, *p = (unsigned long *) &lock->owner;
-
-	do {
-		owner = *p;
-	} while (cmpxchg_relaxed(p, owner,
-				 owner | RT_MUTEX_HAS_WAITERS) != owner);
+	atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);
 
 	/*
 	 * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
-- 
