[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <92f36f59a1c0437e9e4848cb86d1d756@AcuMS.aculab.com>
Date: Tue, 2 Jan 2024 23:32:05 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Boqun Feng' <boqun.feng@...il.com>
CC: 'Waiman Long' <longman@...hat.com>, "'linux-kernel@...r.kernel.org'"
<linux-kernel@...r.kernel.org>, "'peterz@...radead.org'"
<peterz@...radead.org>, "'mingo@...hat.com'" <mingo@...hat.com>,
"'will@...nel.org'" <will@...nel.org>, 'Linus Torvalds'
<torvalds@...ux-foundation.org>, "'xinhui.pan@...ux.vnet.ibm.com'"
<xinhui.pan@...ux.vnet.ibm.com>,
"'virtualization@...ts.linux-foundation.org'"
<virtualization@...ts.linux-foundation.org>, 'Zeng Heng'
<zengheng4@...wei.com>
Subject: RE: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's
'node' in the osq_lock() fast path.
From: Boqun Feng
> Sent: 02 January 2024 18:54
>
> On Sat, Dec 30, 2023 at 03:49:52PM +0000, David Laight wrote:
> [...]
> > But it looks odd that osq_unlock()'s fast path uses _release but the very
> > similar code in osq_wait_next() uses _acquire.
> >
>
> The _release in osq_unlock() is needed since unlocks are needed to be
> RELEASE so that lock+unlock can be a critical section (i.e. no memory
> accesses can escape). When osq_wait_next() is used in non unlock cases,
> the RELEASE is not required. As for the case where osq_wait_next() is
> used in osq_unlock(), there is a xchg() preceding it, which provides a
> full barrier, so things are fine.
I know there have been issues with ACQUIRE/RELEASE/FULL xchg in this code,
but are FULL xchg always needed on node->next?
> /me wonders whether we can relax the _acquire in osq_wait_next() into
> a _relaxed.
I wouldn't have worried about relaxed v release.
> > Indeed, apart from some (assumed) optimisations, I think osq_unlock()
> > could just be:
> > next = osq_wait_next(lock, this_cpu_ptr(&osq_node), 0);
> > if (next)
> > next->locked = 1;
> >
>
> If so we need to provide some sort of RELEASE semantics for the
> osq_unlock() in all the cases.
I wonder how often the unqueue code happens, and especially for
the last cpu in the list?
I'd only expect need_resched() to return true after spinning for
a while - in which case perhaps it is more likely that there are
a lot of cpu in the queue and the cpu being removed won't be last.
So osq_wait_next() exits on xchg(&node->next, NULL) != NULL
which is full barrier.
On a slightly different note I've also wondered if 'osq_node'
actually needs to be cache line aligned?
You definitely don't want it spanning 2 cache line, but I'd
expect that per-cpu data is mostly accessed by its own cpu?
So you really aren't going to get false sharing with some
other per-cpu data since the cpu is busy in this code.
So __aligned(16) would do?
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists