Message-ID: <Y8/WBFj4uLz4N2ZH@hirez.programming.kicks-ass.net>
Date: Tue, 24 Jan 2023 13:58:44 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Waiman Long <longman@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel@...r.kernel.org, john.p.donnelly@...cle.com,
Hillf Danton <hdanton@...a.com>,
Mukesh Ojha <quic_mojha@...cinc.com>,
Ting11 Wang 王婷 <wangting11@...omi.com>
Subject: Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff
On Mon, Jan 23, 2023 at 05:07:08PM -0500, Waiman Long wrote:
> On 1/23/23 12:30, Waiman Long wrote:
> > I will update the patch description to highlight the points that I
> > discussed in this email.
>
> I am planning to update the patch description to as follows:
>
> The lock handoff provided in rwsem isn't a true handoff like that in
> the mutex. Instead, it is more like a quiescent state where optimistic
> spinning and lock stealing are disabled to make it easier for the first
> waiter to acquire the lock.
>
> For mutex, lock handoff is done at unlock time as the owner value and
> the handoff bit is in the same lock word and can be updated atomically.
>
> That is not the case for rwsem, which has a separate count value for
> locking and an owner value. The only way to update them in a
> quasi-atomic way is to use the wait_lock for synchronization, as the
> handoff bit can only be updated while holding the wait_lock. So for
> rwsem, the new lock handoff mechanism is done mostly at rwsem_wake()
> time, when the wait_lock has to be acquired anyway, to minimize
> additional overhead.
So for first==reader, sure, and you don't need anything special, since
rwsem_mark_wake() already does the right thing.
But for first==writer, I don't follow; *WHY* do you have to complicate
this path so. The write_slowpath already takes wait_lock for
rwsem_try_write_lock() and that already knows about handoff.
> It is also likely that the active lock in this case may be a transient
> RWSEM_READER_BIAS that will be removed soon. So we have a secondary
> handoff done in the reader slow path to handle this particular case.
Only because you made it so damn complicated. If instead you rely on the
wait_lock in write_slowpath you can keep it all in one place AFAICT.
> For a reader-owned rwsem, the owner value other than the
> RWSEM_READER_OWNED bit is mostly for debugging purposes only. So it is
> not safe to use the owner value to confirm that a handoff to a reader
> has happened. On the
What ?!? Where do we care about the owner value? There's
RWSEM_FLAG_HANDOFF which lives in sem->count and there's
waiter->handoff_set. Nowhere do we care about sem->owner in this.
> other hand, we can do that when handing off to a writer. However, it
> is simpler to use the same mechanism to signal that a handoff has
> happened for both readers and writers. So a new HANDOFF_GRANTED state
> is added
I really can't follow whatever logic jump here.
> to enum rwsem_handoff_state to signify that. This new value will be
> written to the handoff_state field of the first waiter.
>
> With true lock handoff, there is no need to do NULL-owner spinning
> anymore as a wakeup will be performed if the handoff is successful. So
> it is likely that the first waiter won't actually go to sleep even
> when schedule() is called in this case.
So this spinning, this is purely for writer->writer handoff (which is
exceedingly rare since it is readers that set handoff), right?
Why is that so important?
Also, why can't we add something like:

	owner = rwsem_owner_flags(sem, &flags);
	if (owner && !(flags & RWSEM_READER_OWNED))
		atomic_long_cond_read_relaxed(&sem->count, !(VAL & RWSEM_WRITER_LOCKED));

to the start of that? If it's really needed.