Message-ID: <Y8/PM7ONLUkIETxy@hirez.programming.kicks-ass.net>
Date: Tue, 24 Jan 2023 13:29:39 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Waiman Long <longman@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel@...r.kernel.org, john.p.donnelly@...cle.com,
Hillf Danton <hdanton@...a.com>,
Mukesh Ojha <quic_mojha@...cinc.com>,
Ting11 Wang 王婷 <wangting11@...omi.com>
Subject: Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff
On Mon, Jan 23, 2023 at 12:30:59PM -0500, Waiman Long wrote:
> On 1/23/23 09:59, Peter Zijlstra wrote:
> > On Thu, Nov 17, 2022 at 09:20:15PM -0500, Waiman Long wrote:
> > > The lock handoff provided in rwsem isn't a true handoff like that in
> > > the mutex. Instead, it is more like a quiescent state where optimistic
> > > spinning and lock stealing are disabled to make it easier for the first
> > > waiter to acquire the lock.
> > >
> > > Reworking the code to enable a true lock handoff is more complex due to
> > > the following facts:
> > > 1) The RWSEM_FLAG_HANDOFF bit is protected by the wait_lock and it
> > > is too expensive to always take the wait_lock in the unlock path
> > > to prevent racing.
> > Specifically, the issue is that we'd need to turn the
> > atomic_long_add_return_release() into an atomic_try_cmpxchg_release()
> > loop, like:
> >
> > 	tmp = atomic_long_read(&sem->count);
> > 	do {
> > 		if (tmp & (WAITERS|HANDOFF))
> > 			return slow_unlock();
> > 	} while (!atomic_long_try_cmpxchg_release(&sem->count, &tmp,
> > 				tmp - RWSEM_{READER_BIAS,WRITE_LOCKED}));
> >
> > in order to not race with a concurrent setting of the HANDOFF bit,
> > right? And we don't like to turn unlock into a cmpxchg loop.
> >
> > (OTOH we sorta do this for mutex, uncontended mutex has cmpxchg lock and
> > unlock, any fail and we go to the slow path -- I suppose the distinction
> > is that we sorta expect some contention on the read side)
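
For illustration, here is a minimal userspace model of that unlock loop
using C11 atomics; the bit values and the slow_unlock() stub are
stand-ins rather than the kernel's code:

#include <stdatomic.h>

#define RWSEM_FLAG_WAITERS	(1UL << 1)	/* illustrative bit layout */
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_BIAS	(1UL << 8)

static void slow_unlock(atomic_ulong *count)
{
	/* stub: take wait_lock, wake waiters, honour a pending HANDOFF */
	(void)count;
}

static void read_unlock_model(atomic_ulong *count)
{
	unsigned long tmp = atomic_load(count);

	do {
		if (tmp & (RWSEM_FLAG_WAITERS | RWSEM_FLAG_HANDOFF)) {
			slow_unlock(count);
			return;
		}
		/* on failure tmp is reloaded, so a HANDOFF set after the
		 * read above is seen on the next iteration */
	} while (!atomic_compare_exchange_weak_explicit(count, &tmp,
			tmp - RWSEM_READER_BIAS,
			memory_order_release, memory_order_relaxed));
}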
>
> I see that your inclination is to do the handoff right at unlock time.
> It is certainly possible to do that, but it will be more complex in the case
> of rwsem than for a mutex.
Still, it would make things ever so much simpler -- but I agree we'll
probably not get away with it on the performance side of things.
> > Right. In short:
> >
> > Having HANDOFF set:
> > - implies WAITERS set
> > - disables all fastpaths (spinning or otherwise)
> > - dis-allows anybody except first waiter to obtain lock
> >
> > Therefore, having the window between clearing owner and prodding first
> > waiter is 'harmless'.
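
As a sketch of those three properties in one place (C11 atomics; the bit
layout and the unlocked read of the wait-list head are illustrative
simplifications, not the kernel's code):

#include <stdatomic.h>
#include <stdbool.h>

#define RWSEM_WRITER_LOCKED	(1UL << 0)	/* illustrative bit layout */
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_MASK	(~0UL << 8)

struct waiter_model;

struct rwsem_model {
	atomic_ulong count;
	struct waiter_model *first;	/* head of the wait list */
};

static bool write_trylock_model(struct rwsem_model *sem, struct waiter_model *w)
{
	unsigned long c = atomic_load(&sem->count);

	do {
		if (c & (RWSEM_WRITER_LOCKED | RWSEM_READER_MASK))
			return false;	/* still owned */
		/*
		 * HANDOFF implies WAITERS and reserves the lock for the
		 * first waiter: every other path (fastpath, spinner,
		 * stealer) backs off right here.
		 */
		if ((c & RWSEM_FLAG_HANDOFF) && w != sem->first)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&sem->count, &c,
			c | RWSEM_WRITER_LOCKED,
			memory_order_acquire, memory_order_relaxed));
	return true;
}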
>
> As said above, we need to confirm that the HANDOFF bit is set with wait_lock
> held. Now, the HANDOFF bit may be set at unlock time, or it may not be.
>
> We can pass the count value fetched at unlock time down to rwsem_wake() to
> confirm that the HANDOFF bit was set at unlock time. However, it is also
> possible that the original waiter that set HANDOFF has bailed out, then a
> reader acquires the lock and another waiter sets HANDOFF before the unlocker
> acquires the wait_lock. The rwsem is really reader-owned in this case, so we
> can't perform the handoff. That is why I also check whether there is an
> active lock (mostly a read lock) at rwsem_wake(). However, that check can be
> a false negative because an incoming reader may have just added a
> READER_BIAS which is about to be removed again. That is the reason why I
> have a secondary handoff check in the reader slowpath.
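
A sketch of that double check, with pthread_mutex_t standing in for
wait_lock and the grant/wake helpers stubbed out; all names and bit
values are illustrative, not the actual patch:

#include <pthread.h>
#include <stdatomic.h>

#define RWSEM_WRITER_LOCKED	(1UL << 0)	/* illustrative bit layout */
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_MASK	(~0UL << 8)
#define RWSEM_LOCK_MASK		(RWSEM_WRITER_LOCKED | RWSEM_READER_MASK)

struct rwsem_model {
	atomic_ulong count;
	pthread_mutex_t wait_lock;	/* models the kernel's wait_lock */
};

static void grant_to_first_waiter(struct rwsem_model *sem) { (void)sem; }
static void wake_first_waiter(struct rwsem_model *sem) { (void)sem; }

/* unlock_count is the sem->count value fetched at unlock time */
static void rwsem_wake_model(struct rwsem_model *sem, unsigned long unlock_count)
{
	pthread_mutex_lock(&sem->wait_lock);

	/*
	 * Even if HANDOFF was set at unlock time, the waiter that set it
	 * may have bailed out and a reader may own the lock by now, so
	 * re-check for an active lock under wait_lock before handing off.
	 */
	if ((unlock_count & RWSEM_FLAG_HANDOFF) &&
	    !(atomic_load(&sem->count) & RWSEM_LOCK_MASK))
		grant_to_first_waiter(sem);	/* true handoff */
	else
		wake_first_waiter(sem);		/* an in-flight READER_BIAS can
						 * make this a false negative;
						 * the reader slowpath re-checks */

	pthread_mutex_unlock(&sem->wait_lock);
}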
>
> >
> > > With true lock handoff, there is no need to spin on a NULL owner
> > > anymore, as a wakeup will be performed whenever handoff is possible.
> > > So it is likely that the first waiter won't actually go to sleep even
> > > when schedule() is called in this case.
> > Right, removing that NULL spinning was the whole purpose -- except I
> > seem to have forgotten why it was a problem :-)
> >
> > OK, lemme go read the actual patch.
> >
> > Hmmm... you made it a wee bit more complicated, instead of my 3rd clause
> > above, you added a whole intermediate GRANTED state. Why?
> >
> > Since we fundamentally must deal with the release->wait_lock hole, why
> > do we need to do the whole rwsem_wake()->GRANTED->*_slowpath() dance?
> > Why can't we skip the whole rwsem_wake()->GRANTED part and only do
> > handoff in the slowpath?
>
> First of all, the owner value of a reader-owned rwsem is mostly advisory, as
> it holds just one of the possible owners. So it may be a bit risky to use it
> as an indication that a handoff has happened, as it may be screwed up in
> some rare cases. That is why I use the repurposed handoff_state value in the
> waiter structure instead. Also, reading this value is less costly than
> reading the rwsem count cacheline, which can be heavily contended.
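
A sketch of that per-waiter field, assuming an illustrative set of
handoff_state values (the actual patch defines its own):

#include <stdatomic.h>
#include <stdbool.h>

enum handoff_state {
	HANDOFF_NONE,
	HANDOFF_REQUESTED,
	HANDOFF_GRANTED,	/* the intermediate GRANTED state */
};

struct rwsem_waiter_model {
	/* written by the waker under wait_lock, read by the waiter */
	_Atomic enum handoff_state handoff_state;
};

static bool lock_was_handed_off(struct rwsem_waiter_model *w)
{
	/*
	 * Touches only the waiter's own cache line, not the contended
	 * sem->count line, and does not depend on the advisory owner
	 * value of a reader-owned rwsem.
	 */
	return atomic_load(&w->handoff_state) == HANDOFF_GRANTED;
}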
>
> I will update the patch description to highlight the points that I discussed
> in this email.
Maybe I'm being dense, but I'm not seeing it. If we make HANDOFF block
all the fastpaths, all the spinning, all the stealing, everything, then
all that is left is the slowpath that is holding wait_lock.
Then in both slowpaths, ensure only the first waiter can go on and we're
done.
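
That is, something like this sketch, with pthread primitives standing in
for wait_lock and schedule(), and try_lock_model() stubbing the
HANDOFF-gated trylock from the earlier sketch; all names are
illustrative:

#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

struct waiter_model { int dummy; };

struct sem_model {
	pthread_mutex_t wait_lock;
	struct waiter_model *first;	/* head of the wait list */
};

static bool try_lock_model(struct sem_model *sem, struct waiter_model *w)
{
	/* stub: the HANDOFF-gated trylock */
	(void)sem; (void)w;
	return true;
}

static void slowpath_model(struct sem_model *sem, struct waiter_model *w)
{
	for (;;) {
		pthread_mutex_lock(&sem->wait_lock);
		/* only the first waiter ever attempts the acquisition */
		if (w == sem->first && try_lock_model(sem, w)) {
			pthread_mutex_unlock(&sem->wait_lock);
			return;
		}
		pthread_mutex_unlock(&sem->wait_lock);
		sched_yield();	/* stands in for schedule() */
	}
}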
What am I missing? Why does it need to be so complicated?
That is, afaict something like the below would actually work, no? Yes,
simply deleting that spinning in write_slowpath isn't ideal, but I
suspect we can do something to rwsem_try_write_lock() to make up for
that if we think about it.
Again, please explain, explicitly and in small steps, why you think you
need all that complexity.
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -40,7 +40,7 @@
  *
  * When the rwsem is reader-owned and a spinning writer has timed out,
  * the nonspinnable bit will be set to disable optimistic spinning.
-
+ *
  * When a writer acquires a rwsem, it puts its task_struct pointer
  * into the owner field. It is cleared after an unlock.
  *
@@ -430,6 +430,10 @@ static void rwsem_mark_wake(struct rw_se
 			 * Mark writer at the front of the queue for wakeup.
 			 * Until the task is actually later awoken later by
 			 * the caller, other writers are able to steal it.
+			 *
+			 * *Unless* HANDOFF is set, in which case only the
+			 * first waiter is allowed to take it.
+			 *
 			 * Readers, on the other hand, will block as they
 			 * will notice the queued writer.
 			 */
@@ -463,6 +467,9 @@ static void rwsem_mark_wake(struct rw_se
 			 * force the issue.
 			 */
 			if (time_after(jiffies, waiter->timeout)) {
+				/*
+				 * Setting HANDOFF blocks fastpaths and stealing.
+				 */
 				if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
 					adjustment -= RWSEM_FLAG_HANDOFF;
 					lockevent_inc(rwsem_rlock_handoff);
@@ -471,6 +478,13 @@ static void rwsem_mark_wake(struct rw_se
 			}
 
 			atomic_long_add(-adjustment, &sem->count);
+
+			if (waiter->handoff_set) {
+				/*
+				 * With HANDOFF set we must terminate all spinning.
+				 */
+				rwsem_set_nonspinnable(sem);
+			}
 			return;
 		}
 		/*
@@ -844,7 +858,6 @@ static bool rwsem_optimistic_spin(struct
 		/*
 		 * Try to acquire the lock
 		 */
 		taken = rwsem_try_write_lock_unqueued(sem);
-
 		if (taken)
 			break;
@@ -1159,22 +1172,6 @@ rwsem_down_write_slowpath(struct rw_sema
 		if (signal_pending_state(state, current))
 			goto out_nolock;
 
-		/*
-		 * After setting the handoff bit and failing to acquire
-		 * the lock, attempt to spin on owner to accelerate lock
-		 * transfer. If the previous owner is a on-cpu writer and it
-		 * has just released the lock, OWNER_NULL will be returned.
-		 * In this case, we attempt to acquire the lock again
-		 * without sleeping.
-		 */
-		if (waiter.handoff_set) {
-			enum owner_state owner_state;
-
-			owner_state = rwsem_spin_on_owner(sem);
-			if (owner_state == OWNER_NULL)
-				goto trylock_again;
-		}
-
 		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);