linux-kernel - RE: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's 'node' in the osq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <92f36f59a1c0437e9e4848cb86d1d756@AcuMS.aculab.com>
Date: Tue, 2 Jan 2024 23:32:05 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Boqun Feng' <boqun.feng@...il.com>
CC: 'Waiman Long' <longman@...hat.com>, "'linux-kernel@...r.kernel.org'"
	<linux-kernel@...r.kernel.org>, "'peterz@...radead.org'"
	<peterz@...radead.org>, "'mingo@...hat.com'" <mingo@...hat.com>,
	"'will@...nel.org'" <will@...nel.org>, 'Linus Torvalds'
	<torvalds@...ux-foundation.org>, "'xinhui.pan@...ux.vnet.ibm.com'"
	<xinhui.pan@...ux.vnet.ibm.com>,
	"'virtualization@...ts.linux-foundation.org'"
	<virtualization@...ts.linux-foundation.org>, 'Zeng Heng'
	<zengheng4@...wei.com>
Subject: RE: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's
 'node' in the osq_lock() fast path.

From: Boqun Feng
> Sent: 02 January 2024 18:54
> 
> On Sat, Dec 30, 2023 at 03:49:52PM +0000, David Laight wrote:
> [...]
> > But it looks odd that osq_unlock()'s fast path uses _release but the very
> > similar code in osq_wait_next() uses _acquire.
> >
> 
> The _release in osq_unlock() is needed since unlocks are needed to be
> RELEASE so that lock+unlock can be a critical section (i.e. no memory
> accesses can escape). When osq_wait_next() is used in non unlock cases,
> the RELEASE is not required. As for the case where osq_wait_next() is
> used in osq_unlock(), there is a xchg() preceding it, which provides a
> full barrier, so things are fine.

I know there have been issues with ACQUIRE/RELEASE/FULL xchg in this code,
but are FULL xchg always needed on node->next?

> /me wonders whether we can relax the _acquire in osq_wait_next() into
> a _relaxed.

I wouldn't have worried about relaxed v release.

> > Indeed, apart from some (assumed) optimisations, I think osq_unlock()
> > could just be:
> > 	next = osq_wait_next(lock, this_cpu_ptr(&osq_node), 0);
> > 	if (next)
> > 		next->locked = 1;
> >
> 
> If so we need to provide some sort of RELEASE semantics for the
> osq_unlock() in all the cases.

I wonder how often the unqueue code happens, and especially for
the last cpu in the list?
I'd only expect need_resched() to return true after spinning for
a while - in which case perhaps it is more likely that there are
a lot of cpu in the queue and the cpu being removed won't be last.
So osq_wait_next() exits on xchg(&node->next, NULL) != NULL
which is full barrier.

On a slightly different note I've also wondered if 'osq_node'
actually needs to be cache line aligned?
You definitely don't want it spanning 2 cache line, but I'd
expect that per-cpu data is mostly accessed by its own cpu?
So you really aren't going to get false sharing with some
other per-cpu data since the cpu is busy in this code.
So __aligned(16) would do?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)