Message-ID: <YIZ0HZGqLvU+VlYh@hirez.programming.kicks-ass.net>
Date: Mon, 26 Apr 2021 10:04:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Waiman Long <llong@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Borislav Petkov <bp@...e.de>, Ali Saidi <alisaidi@...zon.com>,
Steve Capper <steve.capper@....com>,
Will Deacon <will@...nel.org>, x86-ml <x86@...nel.org>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] locking/urgent for v5.12

On Sun, Apr 25, 2021 at 01:06:52PM -0400, Waiman Long wrote:
> On 4/25/21 12:39 PM, Linus Torvalds wrote:
> > > I'm assuming it's because of the switch to try_cmpxchg by PeterZ?
>
> Yes, try_cmpxchg() requires a variable to hold the new value as well as a
> place to return the actual value before the cmpxchg(). It is just the way
> try_cmpxchg() works.
Right; by virtue of it returning a boolean, the value return needs to be
through a pointer argument.
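
A minimal userspace sketch of the two calling conventions discussed above, written with C11 atomics rather than the kernel's own primitives. The `_sketch` names and `add_unless_negative()` are hypothetical and only illustrate the shape of the interface: a boolean result, with the observed value handed back through a pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* cmpxchg()-style: returns the value found in *v; the caller must
 * compare it against 'old' to learn whether the swap happened. */
static int cmpxchg_sketch(atomic_int *v, int old, int new_val)
{
    /* On failure 'old' is overwritten with the value actually seen;
     * on success it is left unchanged. Either way it is the prior value. */
    atomic_compare_exchange_strong(v, &old, new_val);
    return old;
}

/* try_cmpxchg()-style: returns true on success. On failure the value
 * actually found is written back through 'old'. */
static bool try_cmpxchg_sketch(atomic_int *v, int *old, int new_val)
{
    return atomic_compare_exchange_strong(v, old, new_val);
}

/* Retry loop built on the boolean form: no explicit reload is needed,
 * because a failed attempt already refreshed 'cur'. */
static int add_unless_negative(atomic_int *v, int delta)
{
    int cur = atomic_load(v);

    do {
        if (cur < 0)
            return cur;     /* give up, nothing was changed */
    } while (!try_cmpxchg_sketch(v, &cur, cur + delta));

    return cur;             /* value before the successful add */
}
```

With the boolean form the loop condition is the return value itself, which is where the single-line call sites and the condition-code-friendly code generation come from.
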
> > > New confusion:
> > > - Why is the truly non-critical cmpxchg using "try_cmpxchg()", when
> > > the _first_ cmpxchg - above the loop - is not?
> At least for x86, try_cmpxchg() seems to produce slightly better assembly
> code than the regular cmpxchg(). I guess that may be one of the reasons Peter
> changed it to use try_cmpxchg(). Another reason that I can think of is to
> make the code fit in one line instead of splitting it up into two lines like
> the original version from Ali.
Right, x86 generates slightly better asm (and potentially so for any
architecture that has CAS state in condition codes) while it's a wash
for other architectures (specifically we checked at the time arm64
didn't generate worse code).
> > >
> > > Pre-existing confusion:
> > > - Why is the code using "atomic_add()" to set a bit?
> > >
> > > Yeah, yeah, neither of these are *bugs*, but Christ is that code hard
> > > to read. The "use add to set a bit" is valid because of the spinlock
> > > serialization (ie only one add can ever happen), and the
> > > cmpxchg-vs-try_cmpxchg confusion isn't buggy, it's just really really
> > > confusing that that same function is using two different - but
> > > equivalent - cmpxchg things on the same variable literally a couple of
> > > lines apart.
> As you have said, the spinlock serialization makes sure that only 1 writer
> is allowed to do that. I agree that using atomic_or() looks better in this
> case. Both of them are equivalent in this particular case.
Agreed. I think the reason is that the read side does the BIAS add/sub,
and some of that snuck into the write side. AFAIK no arch
lacks the atomic_or() intrinsic. The one that's often an issue is
atomic_fetch_or() (x86 for one doesn't have it :/).
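
The add-versus-or point can be illustrated in userspace with C11 atomics. This is a sketch under assumptions, not the qrwlock code: `W_WAITING` and `R_BIAS` are hypothetical constants, and `fetch_or_cas()` shows one way a fetch-or can be emulated with a compare-and-swap loop when no native instruction returns the old value.

```c
#include <assert.h>
#include <stdatomic.h>

#define W_WAITING 0x100u   /* hypothetical "writer waiting" flag bit */
#define R_BIAS    0x200u   /* hypothetical per-reader increment */

/* Setting the flag with an add is only correct if the caller guarantees
 * the bit is currently clear and nobody else can set it concurrently
 * (in the thread above, that guarantee comes from the spinlock). */
static void set_waiting_add(atomic_uint *cnts)
{
    atomic_fetch_add(cnts, W_WAITING);
}

/* Setting the flag with an or is idempotent and states the intent. */
static void set_waiting_or(atomic_uint *cnts)
{
    atomic_fetch_or(cnts, W_WAITING);
}

/* fetch-or emulated with a CAS loop: returns the value seen before
 * the bits in 'mask' were set. */
static unsigned fetch_or_cas(atomic_uint *v, unsigned mask)
{
    unsigned old = atomic_load(v);

    /* A failed (or spuriously failed) attempt refreshes 'old'. */
    while (!atomic_compare_exchange_weak(v, &old, old | mask))
        ;
    return old;
}
```

Starting from three readers and a clear flag, both setters produce the same word. Calling the add variant a second time carries into the reader field, while the or variant leaves the word unchanged, which is why the add form needs a comment naming the serialization it relies on.
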
> > > I've pulled this, but can we please
> > >
> > > - make *both* of the cmpxchg's use "try_cmpxchg()" (and thus that
> > > "cnts" variable)?
> Yes, we can certainly change the other cmpxchg() to try_cmpxchg().
> > >
> > > - add a comment about _why_ it's doing "atomic_add()" instead of the
> > > much more logical "atomic_or()", and about how the spinlock serializes
> > > it
> > >
> > > I'm assuming the "atomic_add()" is simply because many more
> > > architectures have that as an actual intrinsic atomic. I understand.
> > > But it's really really not obvious from the code.
> > >
> I will post a patch to make the suggested change to qrwlock.c.
Thanks.