lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y9Jv9yL8x7/TAq/X@hirez.programming.kicks-ass.net>
Date:   Thu, 26 Jan 2023 13:20:07 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>
Cc:     Waiman Long <longman@...hat.com>, paulmck@...nel.org,
        Arjan van de Ven <arjan@...ux.intel.com>, mingo@...hat.com,
        will@...nel.org, boqun.feng@...il.com, akpm@...l.org,
        tglx@...utronix.de, joel@...lfernandes.org,
        stern@...land.harvard.edu, diogo.behrens@...wei.com,
        jonas.oberhauser@...wei.com, linux-kernel@...r.kernel.org,
        Hernan Ponce de Leon <hernanl.leon@...wei.com>,
        stable@...r.kernel.org,
        Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Subject: Re: [PATCH] Fix data race in mark_rt_mutex_waiters

On Thu, Jan 26, 2023 at 10:42:07AM +0100, Hernan Ponce de Leon wrote:
> On 1/24/2023 5:04 PM, Waiman Long wrote:
> > 
> > On 1/24/23 10:52, Peter Zijlstra wrote:
> > > On Tue, Jan 24, 2023 at 10:42:24AM -0500, Waiman Long wrote:
> > > 
> > > > I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
> > > > however, it is probably better to use atomic_long_or() like
> > > > 
> > > > atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t
> > > > *)&lock->owner)
> > > That function doesn't exist, atomic_long_or() is implicitly relaxed for
> > > not returning a value.
> > > 
> > You are right. atomic_long_or() doesn't have variants like some others.
> > 
> > Cheers,
> > Longman
> > 
> 
> When you say "replace the whole of that function", do you mean "barrier
> included"? I argue in the other email that I think this should not affect
> correctness (at least not obviously), but removing the barrier is doing more
> than just fixing the data race as this patch suggests.

Well, set_bit() implies smp_mb(), atomic_long_or() does not and would
need to retain the barrier.

That said, the comments are all batshit. The top comment states relaxed
ordering is suffient since holding lock, the comment with the barrier
utterly fails to explain what it's ordering against.

So all that would need to be updated as well.

That said, looking at 1c0908d8e441 I'm not at all sure we need that
barrier. Even in the try_to_take_rt_mutex(.waiter=NULL) case, where we
skip over the task->pi_lock region, rt_mutex_set_owner(.acquire=true)
will be ACQUIRE.

And in case of rt_mutex_owner(), we fail the trylock (return with 0) and
a failed trylock does not imply any ordering.

So why are we having this barrier?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ