[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc6bb7b9-a229-4afd-d7c6-a302460fde81@redhat.com>
Date: Mon, 13 Feb 2023 21:03:30 -0500
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel@...r.kernel.org, john.p.donnelly@...cle.com,
Hillf Danton <hdanton@...a.com>,
Mukesh Ojha <quic_mojha@...cinc.com>,
Ting11 Wang 王婷 <wangting11@...omi.com>
Subject: Re: [PATCH v7 4/4] locking/rwsem: Enable direct rwsem lock handoff
On 2/13/23 16:52, Peter Zijlstra wrote:
> On Mon, Feb 13, 2023 at 12:14:59PM -0500, Waiman Long wrote:
>
>>> I am once again confused...
>>>
>>> *WHY* are you changing the writer wake-up path? The comments added here
>>> don't clarify anything.
>>>
>>> If we set handoff, we terminate/disallow the spinning/stealing. The
>>> direct consequence is that the slowpath/wait-list becomes the only way
>>> forward.
>> Yes, that is true.
>>> Since we don't take wait_lock on up, we fundamentally have a race
>>> condition. But *WHY* do you insist on handling that in rwsem_wake()?
>>> Delaying all that until rwsem_try_write_lock()? Doing so would render
>>> pretty much all of the above pointless, no?
>> There is an advantage in doing the handover earlier, if possible. A reader
>> that comes in between can spoils the takeover of the rwsem in
> How ?!? since all the stealing and spinning is out, the wait-list reigns
> supreme. A new reader will queue.
>
> Are you worried about the spurious elevation of ->count due to that
> uncondition increment on down_read() ? This is always going to be the
> case.
Yes, that is what I am talking about.
>
>> rwsem_try_write_lock() and cause it to sleep again. Since we will have to
>> take the wait lock anyway in rwsem_wake(), there isn't much additional cost
>> to do some additional check.
> There is a complexity cost -- and so far I've not seen a single line of
> justification for the added complexity.
>
>> Note that the kernel test robot had detected a 19.3% improvement of
>> will-it-scale.per_thread_ops [1] due to this commit. That indicates this
>> commit is good to have. I am planning to update the commit log to include
>> that information as well as additional reasoning as discussed here.
> Seen that; but who's saying a simpler implementation will not also have
> those gains. And if not, then we have a clear separation into
> functionality and optimization and justifications for it.
>
> But now, we have neither. I'm not saying the patch is wrong -- I'm
> saying it is needlessly complicated without justification.
>
> All this stuff is hard enough -- we should really try to keep is as
> simple as possible, having two distinct ways to wake up a writer is not
> 'as simple as possible'.
OK, a simpler implementation will be to also check for handoff readiness
in rwsem_wake() and set a flag in waiter structure. When the writer
wakes up and calls rwsem_try_write_lock(), it can ignore any spurious
READER_BIAS and take over the lock directly if the flag is set. Will you
be OK with that?
I believe the one of the reasons for the performance improvement
reported by the test robot is that the check for handoff readiness in
both rwsem_wake() and rwsem_try_write_lock() will increase the chance
that a handoff will be successful in the first attempt. Another possible
reason is that writer may not need to take the wait_lock anymore if all
the handoff work is done by the unlocker at rwsem_wake(). That will be
lost if we do a simpler solution as outlined above.
>
>>> After all, rwsem_mark_wake() already wakes the writer if it is first,
>>> no? Why invent yet another special way to wake up the writer.
>> As I said before, waking up the writer does not mean it can always get the
>> rwsem on the first rwsem_try_write_lock(). Doing early handoff in
>> rwsem_wake() can remove that ambiguity.
> But what if it is rwsem_down_read_slowpath() that drops ->count to 0 and
> does rwsem_mark_wake() and misses your fancy new path?
As said above, the current patch allows handoff to be done at both
rwsem_wake() and rwsem_try_write_lock(). I haven't removed the code to
do handoff in rwsem_try_write_lock().
>
> That is, I'm thinking there's an argument to be had that rwsem_wake() is
> fundamentally the wrong place to do anything.
>
> OTOH, you are right in that rwsem_mark_wake() is in a special position;
> it *KNOWS* the reader count has hit 0 and can ignore future spurious
> increments because by knowing it was 0 it knows they *WILL* fail and
> queue (provided WAITERS||HANDOFF) -- but I've not seen you
> articulate this point.
My bad, I sometimes miss stating assumptions and rationale that can
properly justify the patch. I usually need more back and fro like what
we are doing now to fully write out all the missing pieces. So I really
appreciate your time in reviewing the patch and your critiques :-)
>
> (note how rwsem_cond_wake_waiter() relies on exactly this to select
> WAKE_ANY)
>
> You are also right in that having these different means of waking
> readers and writers is 'confusing'.
>
> But you failed to take things to their natural conclusion -- change the
> way we do writer wakeups, all writer wakeups.
>
> So change rwsem_mark_wake()'s writer wakeup and
> rwsem_down_write_slowpath() to always so that waiter->task thing that
> the readers already do.
>
> It means putting all the logic for waking writers into
> rwsem_mark_wake(), but then we can make full use of this 'we hit zero
> future readers are temporary noise'. Althought I suspect you might need
> to pass count into it, so we can observe the flags at the time 0 was
> observed.
>
> Is all that making sense? -- it has been a long day :-)
I guess it is possible to do that, but it may reduce the performance
improvement due to optimistic spinning and it will become more like the
old way of strictly following the wait queue order in granting lock. I
need to run some tests to verify the performance impact.
>
>>> Also; and I asked this last time around; why do we care about the
>>> handoff to writer *at*all* ? It is the readers that set HANDOFF.
>> HANDOFF can happen for both readers and writers. Handoff to writer is
>> actually more important than to readers.
> Hmm, clearly I missed something. This is rwsem_try_write_lock() setting
> it, right?
Yes, handoff can be set and granted in rwsem_try_write_lock().
Cheers,
Longman
Powered by blists - more mailing lists