Message-ID: <edbb017b-2677-cd30-2091-ff347b00183d@quicinc.com>
Date: Wed, 12 Oct 2022 19:12:59 +0530
From: Mukesh Ojha <quic_mojha@...cinc.com>
To: Hillf Danton <hdanton@...a.com>
CC: <linux-kernel@...r.kernel.org>, <john.p.donnelly@...cle.com>,
<linux-mm@...ck.org>, Waiman Long <longman@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in
down_write() slowpath
Hi,
On 10/12/2022 9:34 AM, Hillf Danton wrote:
> On 11 Oct 2022 18:46:20 +0530 Mukesh Ojha <quic_mojha@...cinc.com>
>> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@...cinc.com>
>>>> Hi Waiman,
>>>>
>>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>>> A non-first waiter can potentially spin in the for loop of
>>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>>
>>>>>>   Non-first waiter            First waiter                Lock holder
>>>>>>   ----------------            ------------                -----------
>>>>>>   Acquire wait_lock
>>>>>>   rwsem_try_write_lock():
>>>>>>     Set handoff bit if RT or
>>>>>>       wait too long
>>>>>>     Set waiter->handoff_set
>>>>>>   Release wait_lock
>>>>>>                               Acquire wait_lock
>>>>>>                               Inherit waiter->handoff_set
>>>>>>                               Release wait_lock
>>>>>>                                                           Clear owner
>>>>>>                                                           Release lock
>>>>>>   if (waiter.handoff_set) {
>>>>>>     rwsem_spin_on_owner();
>>>>>>     if (OWNER_NULL)
>>>>>>       goto trylock_again;
>>>>>>   }
>>>>>>   trylock_again:
>>>>>>   Acquire wait_lock
>>>>>>   rwsem_try_write_lock():
>>>>>>      if (first->handoff_set && (waiter != first))
>>>>>>              return false;
>>>>>>   Release wait_lock
>>>>>>
>>>>>> It is especially problematic if the non-first waiter is an RT task and
>>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>>> live lock.
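
(A toy userspace model of the spin/trylock shape described above, for anyone
following along. The helper names and the bounded spin count are my own
invention, not kernel code; it only illustrates how an inherited handoff_set
can keep a non-first waiter looping without sleeping.)

#include <stdbool.h>
#include <stdio.h>

struct waiter {
	bool handoff_set;	/* "the handoff was promised to me" */
	bool first;		/* am I at the head of the wait list? */
};

/* Stand-in for rwsem_try_write_lock(): a non-first waiter backs off
 * whenever the first waiter has claimed the handoff. */
static bool try_write_lock(struct waiter *w, bool first_has_handoff)
{
	if (first_has_handoff && !w->first)
		return false;
	return true;		/* pretend the lock was taken */
}

int main(void)
{
	struct waiter me = { .handoff_set = true, .first = false };
	int spins = 0;

	/* Mirrors the "trylock_again" pattern: because handoff_set was
	 * inherited, the waiter keeps spinning instead of sleeping, yet
	 * try_write_lock() keeps refusing it.  The spin count is bounded
	 * here only so that the demo terminates. */
	while (!try_write_lock(&me, true)) {
		if (me.handoff_set && ++spins < 5)
			continue;	/* rwsem_spin_on_owner()-like busy wait elided */
		break;			/* a real waiter would go to sleep here */
	}
	printf("spun %d times without acquiring the lock\n", spins);
	return 0;
}
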
>>>>>>
>>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>>> consistent")
>>>>>> Signed-off-by: Waiman Long <longman@...hat.com>
>>>>>> ---
>>>>>> kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>>>>
>>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>>
>>>>
>>>> It looks like there is still a window for a race.
>>>>
>>>> A reader that came first adds its READER_BIAS and heads for the
>>>> slowpath, but before it gets added to the wait list it is preempted
>>>> by an RT task. The RT task also takes the slowpath and, being the
>>>> first waiter, gets its handoff bit set, yet it cannot take the lock
>>>> because of the following condition in rwsem_try_write_lock():
>>
>> []
>>
>>>>
>>>>         if (count & RWSEM_LOCK_MASK) {    ==> reader has set its bias
>>>>                 ..
>>>>                 ...
>>>>
>>>>                 new |= RWSEM_FLAG_HANDOFF;
>>>>         } else {
>>>>                 new |= RWSEM_WRITER_LOCKED;
>>>>
>>>> ---------------------->----------------------->-------------------------
>>>>
>>>> First reader (1)            Writer (2) (RT task)          Lock holder (3)
>>>>
>>>> It sets
>>>> RWSEM_READER_BIAS.
>>>> While it is going to
>>>> the slowpath (as the
>>>> lock was held by (3)),
>>>> and before it got added
>>>> to the waiters list,
>>>> it got preempted
>>>> by (2).
>>>>                             The RT task also takes
>>>>                             the slowpath and adds         Releases the
>>>>                             itself to the wait list;      rwsem lock,
>>>>                             since it is the first         clears the
>>>>                             waiter, it is the next        owner.
>>>>                             one to get the lock, but
>>>>                             it cannot get it because
>>>>                             (count & RWSEM_LOCK_MASK)
>>>>                             is set: (1) has added its
>>>>                             bias but has not been able
>>>>                             to remove its adjustment.
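
(To make the count arithmetic above concrete, here is a rough userspace
model. The RWSEM_* values follow the layout in kernel/locking/rwsem.c as I
understand it, but everything else, the flow and the printouts, is just an
illustration of why the stale reader bias blocks the RT writer.)

#include <stdio.h>

#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_SHIFT	8
#define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
#define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))
#define RWSEM_LOCK_MASK		(RWSEM_WRITER_LOCKED | RWSEM_READER_MASK)

int main(void)
{
	unsigned long count = 0;	/* (3) has released the lock */

	/* (1) First reader: the fast path adds its bias, then the task is
	 * preempted before it can either take the lock or queue itself and
	 * back the bias out again. */
	count += RWSEM_READER_BIAS;

	/* (2) RT writer in rwsem_try_write_lock(): the leftover reader bias
	 * makes the lock look held, so it can only set the handoff bit. */
	if (count & RWSEM_LOCK_MASK) {
		count |= RWSEM_FLAG_HANDOFF;
		printf("writer blocked, count=0x%lx (stale reader bias)\n", count);
	} else {
		count |= RWSEM_WRITER_LOCKED;
		printf("writer acquired the lock, count=0x%lx\n", count);
	}
	return 0;
}
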
>>
>> []
>>
>>>>
>>> Hey Mukesh,
>>>
>>> Can you test the diff below, if it makes sense to you?
>>>
>>> It simply stops the first waiter from spinning any longer once it detects
>>> that spinning without a lock owner makes barely any progress.
>>>
>>> Hillf
>>>
>>> --- mainline/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>>> long count, new;
>>>
>>> lockdep_assert_held(&sem->wait_lock);
>>> + waiter->handoff_set = false;
>>>
>>> count = atomic_long_read(&sem->count);
>>> do {
>>> bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>>
>>> if (has_handoff) {
>>> - /*
>>> - * Honor handoff bit and yield only when the first
>>> - * waiter is the one that set it. Otherwisee, we
>>> - * still try to acquire the rwsem.
>>> - */
>>> - if (first->handoff_set && (waiter != first))
>>> + if (waiter != first)
>>> return false;
>>
>> You mean you want to check and change waiter->handoff_set on every run of
>> rwsem_try_write_lock()?
>>
> Yes. With RWSEM_FLAG_HANDOFF set, it is too late for non-first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> counted in RWSEM_LOCK_MASK have the upper hand over the first waiter wrt
> acquiring the lock, so it is not a bad option for the first waiter to back
> off.
>
> if (count & RWSEM_LOCK_MASK) {
> if (has_handoff || (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))
> return false;
>
> new |= RWSEM_FLAG_HANDOFF;
> } else {
>
>> But does it break optimistic spinning? @Waiman?
>
> Waiters spin in order to acquire the lock, not to lock up, and your report
> shows that spinning too much causes trouble. The key is to stop spinning
> neither too late nor too early. My proposal is a simple one, with as few
> heuristics added as possible.
At a high level, it looks like it will work.
Let me check and get back to you on this.

-Mukesh
>
> Hillf