[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fedbf107b9344e7b85711c0de59ae0a3@AcuMS.aculab.com>
Date: Sat, 30 Dec 2023 15:49:52 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Waiman Long' <longman@...hat.com>, "'linux-kernel@...r.kernel.org'"
<linux-kernel@...r.kernel.org>, "'peterz@...radead.org'"
<peterz@...radead.org>
CC: "'mingo@...hat.com'" <mingo@...hat.com>, "'will@...nel.org'"
<will@...nel.org>, "'boqun.feng@...il.com'" <boqun.feng@...il.com>, "'Linus
Torvalds'" <torvalds@...ux-foundation.org>, "'xinhui.pan@...ux.vnet.ibm.com'"
<xinhui.pan@...ux.vnet.ibm.com>,
"'virtualization@...ts.linux-foundation.org'"
<virtualization@...ts.linux-foundation.org>, 'Zeng Heng'
<zengheng4@...wei.com>
Subject: RE: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's
'node' in the osq_lock() fast path.
From: Waiman Long
> Sent: 30 December 2023 03:20
>
> On 12/29/23 17:11, David Laight wrote:
> > osq_lock() starts by setting node->next to NULL and node->locked to 0.
> > Careful analysis shows that node->next is always NULL on entry.
> >
> > node->locked is set non-zero by another cpu to force a wakeup.
> > This can only happen after the 'prev->next = node' assignment,
> > so locked can be set to zero just before that (along with the assignment
> > to node->prev).
> >
> > Only initialise node->cpu once, after that use its value instead
> > of smp_processor_id() - which is probably a real function call.
> >
> > Should reduce cache-line bouncing a little.
> >
> > Signed-off-by: David Laight <david.laight@...lab.com>
> > ---
> >
> > Re-send without the 'RE:' on the subject line.
> > kernel/locking/osq_lock.c | 13 ++++++-------
> > 1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index d414eef4bec6..55f5db896c02 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -51,7 +51,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
> > struct optimistic_spin_node *prev)
> > {
> > struct optimistic_spin_node *next = NULL;
> > - int curr = encode_cpu(smp_processor_id());
> > + int curr = node->cpu;
> > int old;
> >
> > /*
> > @@ -98,12 +98,10 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > {
> > struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> > struct optimistic_spin_node *prev, *next;
> > - int curr = encode_cpu(smp_processor_id());
> > int old;
> >
> > - node->locked = 0;
> > - node->next = NULL;
>
> I have some concern about not clearing node->next at the beginning. I
> know that next is supposed to be cleared at the end. I do have some
> worry that there may exist a race condition that leaves next not cleared
> before it is used again. So you either have to prove that such a race
> does not exist, or initializing it to NULL at the beginning like it is
> today.
I've stared at the code some more :-)
There are two places where node->next is written non-NULL, both in osq_lock().
The first is at the top of the slow-path where prev->next = node
(this should be overwriting the NULL set (or not) on entry).
The second is at the bottom of osq_lock() prev->next = next (Step C)
where the NULL written in Step A is written with the correct 'next' node.
After either of those the 'node' cpu must later either take the unqueue
exit from osq_lock() or call osq_unlock().
Both require it wait for node->next be non-NULL and NULL it.
If code calls osq_lock() twice all bets are off!
Even if (somehow) node->next was non-NULL on entry it will be set by an
osq_lock() call from another cpu.
The only problem would be if osq_unlock() were called before the queueing
cpu set prev->next = node.
That in itself is unlikely - but would happen if node->next were always set.
I don't completely understand the 'acquire'/'release' semantics (they didn't
exist when I started doing SMP kernel code in the late 1980s).
But it looks odd that osq_unlock()'s fast path uses _release but the very
similar code in osq_wait_next() uses _acquire.
Indeed, apart from some (assumed) optimisations, I think osq_unlock()
could just be:
next = osq_wait_next(lock, this_cpu_ptr(&osq_node), 0);
if (next)
next->locked = 1;
I don't think the order of the tests for lock->tail and node->next
matter in osq_wait_next().
If they were swapped the 'Second most likely case' code from osq_unlock()
could be removed.
(The 'uncontended case' doesn't need to load the address of 'node'.)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists