Message-ID: <fbb83171-4069-09d3-a119-68055c86797a@huaweicloud.com>
Date: Mon, 26 Sep 2022 22:08:52 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Jan Kara <jack@...e.cz>, Hugh Dickins <hughd@...gle.com>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Yu Kuai <yukuai1@...weicloud.com>,
Liu Song <liusong@...ux.alibaba.com>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH next] sbitmap: fix lockup while swapping
Hi,
On 2022/09/26 19:44, Jan Kara wrote:
> On Fri 23-09-22 16:15:29, Hugh Dickins wrote:
>> On Fri, 23 Sep 2022, Hugh Dickins wrote:
>>> On Fri, 23 Sep 2022, Keith Busch wrote:
>>>
>>>> Does the following fix the observation? Rationale being that there's no reason
>>>> to spin on the current wait state that is already under handling; let
>>>> subsequent clearings proceed to the next inevitable wait state immediately.
>>>
>>> It's running fine without lockup so far; but doesn't this change merely
>>> narrow the window? If this is interrupted in between atomic_try_cmpxchg()
>>> setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index,
>>> don't we run the same risk as before, of sbitmap_queue_wake_up() from
>>> the interrupt handler getting stuck on that wait_cnt 0?
>>
>> Yes, it ran successfully for 50 minutes, then an interrupt came in
>> immediately after the cmpxchg, and it locked up just as before.
>>
>> Easily dealt with by disabling interrupts, no doubt, but I assume it's a
>> badge of honour not to disable interrupts here (except perhaps in waking).
>
> I don't think any magic with sbq_index_atomic_inc() is going to reliably
> fix this. After all, the current waitqueue may be the only one that has active
> waiters so sbq_wake_ptr() will always end up returning this waitqueue
> regardless of the current value of sbq->wake_index.
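To make that spin concrete, here is a self-contained user-space model
of the scenario (everything below is a simplified stand-in for the real
sbitmap code, not the actual implementation):

#include <stdbool.h>
#include <stdio.h>

#define NR_WS 8

struct ws {
        int wait_cnt;
        bool has_waiters;
};

struct sbq {
        struct ws ws[NR_WS];
        int wake_index;
};

/* Like sbq_wake_ptr(): first waitqueue with waiters, starting at wake_index. */
static struct ws *wake_ptr(struct sbq *sbq)
{
        for (int i = 0; i < NR_WS; i++) {
                struct ws *ws = &sbq->ws[(sbq->wake_index + i) % NR_WS];

                if (ws->has_waiters)
                        return ws;
        }
        return NULL;
}

/* Like __sbq_wake_up(): returning true asks the caller to retry. */
static bool wake_up_one(struct sbq *sbq)
{
        struct ws *ws = wake_ptr(sbq);

        if (!ws)
                return false;
        if (ws->wait_cnt == 0)
                return true;    /* another context owns this batch: retry */
        ws->wait_cnt--;         /* ... wake waiters, advance index, etc ... */
        return false;
}

int main(void)
{
        struct sbq sbq = { .wake_index = 3 };
        unsigned long spins = 0;

        /*
         * State at the moment of the interrupt: ws[3] is the only waitqueue
         * with waiters, its wait_cnt was just swapped to 0, and wake_index
         * has not been advanced yet.
         */
        sbq.ws[3].has_waiters = true;
        sbq.ws[3].wait_cnt = 0;

        /* The interrupting waker keeps retrying and never makes progress. */
        while (wake_up_one(&sbq)) {
                if (++spins == 10000000) {      /* cut the demo short */
                        printf("still spinning after %lu retries\n", spins);
                        break;
                }
        }
        return 0;
}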
>
> Honestly, this whole code needs a serious redesign. I have some
> simplifications in mind but it will take some thinking and benchmarking so
> we need some fix for the interim. I was pondering for quite some time about
> a band-aid for the problem you've found but didn't find anything
> satisfactory.
>
> In the end I see two options:
>
> 1) Take your patch (as wrong as it is ;). Yes, it can lead to lost wakeups
> but we were living with those for a relatively long time so probably we can
> live with them for a while longer.
>
> 2) Revert Yu Kuai's original fix 040b83fcecfb8 ("sbitmap: fix possible io
> hung due to lost wakeup") and my fixup 48c033314f37 ("sbitmap: Avoid leaving
> waitqueue in invalid state in __sbq_wake_up()"). But then Keith would have
> to redo his batched accounting patches on top.
>
>> Some clever way to make the wait_cnt and wake_index adjustments atomic?
I'm thinking about a hacky way to make the updates of wait_cnt and
wake_index atomic; however, a redesign of sbitmap_queue is probably
better. 🤣

There are only 8 wait queues and wake_batch is at most 8, so
3 * 9 = 27 bits are enough and a single atomic value suffices:
- bits 0-2 represent ws[0].wait_cnt
- bits 3-5 represent ws[1].wait_cnt
- ...
- bits 21-23 represent ws[7].wait_cnt
- bits 24-26 represent sbq->wake_index

For example, assume the atomic value is:
0b 111 111 111 111 111 111 111 111 000,
which means wake_index is 7 and ws[0].wait_cnt is 0.
If we want to increment wake_index and reset that wait_cnt together:
atomic_add(..., 0b 001 000 000 000 000 000 000 000 111)
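
A rough user-space sketch of that packing (C11 stdatomic rather than
kernel atomics, and the helper names are made up just for illustration):

#include <stdatomic.h>
#include <stdio.h>

#define CNT_BITS        3
#define CNT_MASK        ((1u << CNT_BITS) - 1)
#define NR_WS           8
#define INDEX_SHIFT     (NR_WS * CNT_BITS)      /* wake_index at bits 24-26 */

static atomic_uint state;

static unsigned int get_wait_cnt(unsigned int v, unsigned int ws)
{
        return (v >> (ws * CNT_BITS)) & CNT_MASK;
}

static unsigned int get_wake_index(unsigned int v)
{
        return (v >> INDEX_SHIFT) & CNT_MASK;
}

/*
 * Advance wake_index and refill ws's wait_cnt (currently 0) back to 'batch'
 * in one atomic operation, so a concurrent waker can never observe
 * "wait_cnt == 0 but wake_index not advanced yet". The carry out of the
 * wake_index field on wraparound just accumulates in the unused high bits.
 */
static void inc_index_and_reset_cnt(unsigned int ws, unsigned int batch)
{
        atomic_fetch_add(&state,
                         (1u << INDEX_SHIFT) | (batch << (ws * CNT_BITS)));
}

int main(void)
{
        unsigned int v;

        /* The example above: wake_index = 7, ws[0].wait_cnt = 0, the rest 7. */
        atomic_store(&state, (7u << INDEX_SHIFT) | 0xfffff8u);

        inc_index_and_reset_cnt(0, 7);

        v = atomic_load(&state);
        printf("wake_index=%u ws[0].wait_cnt=%u\n",
               get_wake_index(v), get_wait_cnt(v, 0));  /* prints 0 and 7 */
        return 0;
}

The decrement side would of course need similar care so that a field at
0 never borrows into its neighbour, which is part of why this is hacky.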
Thanks,
Kuai
>>
>> Or is this sbitmap_queue_wake_up() interrupting sbitmap_queue_wake_up()
>> just supposed never to happen, the counts preventing it: but some
>> misaccounting letting it happen by mistake?
>
> No, I think that is in principle a situation that we have to accommodate.
>
> Honza
>