linux-kernel - Re: [GIT PULL] locking/urgent for v5.12

Message-ID: <YIZ0HZGqLvU+VlYh@hirez.programming.kicks-ass.net>
Date:   Mon, 26 Apr 2021 10:04:45 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Waiman Long <llong@...hat.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Borislav Petkov <bp@...e.de>, Ali Saidi <alisaidi@...zon.com>,
        Steve Capper <steve.capper@....com>,
        Will Deacon <will@...nel.org>, x86-ml <x86@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] locking/urgent for v5.12

On Sun, Apr 25, 2021 at 01:06:52PM -0400, Waiman Long wrote:
> On 4/25/21 12:39 PM, Linus Torvalds wrote:

> > > I'm assuming it's because of the switch to try_cmpxchg by PeterZ?
> 
> Yes, try_cmpxchg() requires a variable to hold the new value as well as a
> place to return the actual value before the cmpxchg(). It is just the way
> try_cmpxchg() works.

Right; since it returns a boolean, the previous value has to be returned
through a pointer argument.
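
For illustration, the two calling conventions can be sketched in userspace
with C11 atomics. This is only a sketch: the sketch_* names are hypothetical
and are not the kernel's atomic_cmpxchg()/atomic_try_cmpxchg() implementations,
although atomic_compare_exchange_strong() has the same "update the expected
value on failure" behaviour.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the try_cmpxchg() convention: the return value
 * only says whether the exchange happened. On failure, *old is overwritten
 * with the value actually found in *v.
 */
static bool sketch_try_cmpxchg(atomic_int *v, int *old, int new)
{
	return atomic_compare_exchange_strong(v, old, new);
}

/*
 * Hypothetical stand-in for the cmpxchg() convention: the return value is
 * the value that was observed, and the caller compares it against 'old'
 * to find out whether the exchange happened.
 */
static int sketch_cmpxchg(atomic_int *v, int old, int new)
{
	atomic_compare_exchange_strong(v, &old, new);
	return old;
}
```

A caller of the boolean form therefore needs a local variable (the "cnts"
mentioned in this thread) to pass by pointer.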


> > > New confusion:
> > >   - Why is the truly non-critical cmpxchg using "try_cmpxchg()", when
> > >     the _first_ cmpxchg - above the loop - is not?

> At least for x86, try_cmpxchg() seems to produce slightly better assembly
> code than the regular cmpxchg(). I guess that may be one of the reasons Peter
> changed it to use try_cmpxchg(). Another reason that I can think of is to
> make the code fit in one line instead of splitting it up into two lines like
> the original version from Ali.

Right, x86 generates slightly better asm (and potentially so does any
architecture that has CAS state in condition codes) while it's a wash
for other architectures (specifically, we checked at the time that arm64
didn't generate worse code).
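
The difference shows up most clearly in retry loops. Below is a minimal
userspace sketch with C11 atomics; the sketch_* helpers are hypothetical and
this is not the qrwlock code. On x86 the CMPXCHG instruction reports success
in ZF, so the boolean form can branch on that flag directly, while the
value-returning form needs a separate compare in the caller unless the
compiler manages to elide it.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for cmpxchg(): returns the value observed in *v. */
static unsigned int sketch_cmpxchg_u(atomic_uint *v, unsigned int old,
				     unsigned int new)
{
	atomic_compare_exchange_strong(v, &old, new);
	return old;
}

/* cmpxchg() style loop: compare the returned value, then reload by hand. */
static void sketch_set_bits_cmpxchg(atomic_uint *v, unsigned int bits)
{
	unsigned int old = atomic_load(v);

	for (;;) {
		unsigned int seen = sketch_cmpxchg_u(v, old, old | bits);

		if (seen == old)	/* the extra compare */
			break;
		old = seen;		/* retry with the value we saw */
	}
}

/* try_cmpxchg() style loop: a failed attempt refreshes 'old' for us. */
static void sketch_set_bits_try(atomic_uint *v, unsigned int bits)
{
	unsigned int old = atomic_load(v);

	while (!atomic_compare_exchange_strong(v, &old, old | bits))
		;
}
```

Both loops have the same effect; the second is simply shorter and leaves the
success test to the primitive.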

> > > 
> > > Pre-existing confusion:
> > >   - Why is the code using "atomic_add()" to set a bit?
> > > 
> > > Yeah, yeah, neither of these are *bugs*, but Christ is that code hard
> > > to read. The "use add to set a bit" is valid because of the spinlock
> > > serialization (ie only one add can ever happen), and the
> > > cmpxchg-vs-try_cmpxchg confusion isn't buggy, it's just really really
> > > confusing that that same function is using two different - but
> > > equivalent - cmpxchg things on the same variable literally a couple of
> > > lines apart.
> As you have said, the spinlock serialization makes sure that only 1 writer
> is allowed to do that. I agree that using atomic_or() looks better in this
> case. Both of them are equivalent in this particular case.

Agreed, I think the reason is that, with the read-side doing the BIAS
add/sub, some of that snuck into the write side. AFAIK no arch lacks the
atomic_or() intrinsic. The one that's often an issue is
atomic_fetch_or() (x86 for one doesn't have it :/).
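
To make the add-vs-or equivalence concrete, here is a small userspace sketch
with C11 atomics. The SK_* constants and sketch_* helpers are hypothetical,
chosen only to resemble a "flag bit just below a reader count" layout; they
are not taken from qrwlock.c.

```c
#include <stdatomic.h>

/* Illustrative values only, assumed for this sketch. */
#define SK_WAITING	0x100u	/* hypothetical "writer waiting" flag bit */
#define SK_BIAS		0x200u	/* hypothetical reader count unit, next bit up */

/* Set the flag with an add: only correct while the bit is known to be clear. */
static unsigned int sketch_mark_waiting_add(atomic_uint *cnts)
{
	return atomic_fetch_add(cnts, SK_WAITING) + SK_WAITING;
}

/* Set the flag with an or: idempotent, correct whatever the old value was. */
static unsigned int sketch_mark_waiting_or(atomic_uint *cnts)
{
	return atomic_fetch_or(cnts, SK_WAITING) | SK_WAITING;
}
```

With the bit clear, both helpers produce the same value. If the add were ever
applied twice, it would carry into the reader count, which is why the add is
only valid when something else (here, the spinlock) guarantees a single writer
does it.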

> > > I've pulled this, but can we please
> > > 
> > >   - make *both* of the cmpxchg's use "try_cmpxchg()" (and thus that
> > >     "cnts" variable)?
> Yes, we can certainly change the other cmpxchg() to try_cmpxchg().
> > > 
> > >   - add a comment about _why_ it's doing "atomic_add()" instead of the
> > >     much more logical "atomic_or()", and about how the spinlock serializes
> > >     it
> > > 
> > > I'm assuming the "atomic_add()" is simply because many more
> > > architectures have that as an actual intrinsic atomic. I understand.
> > > But it's really really not obvious from the code.
> > > 
> I will post a patch to make the suggested change to qrwlock.c.

Thanks.