Message-ID: <d00297d7-a77a-770a-1cd7-1632f8ae77e0@huaweicloud.com>
Date: Mon, 26 Dec 2022 15:50:58 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Jan Kara <jack@...e.cz>, Kemeng Shi <shikemeng@...weicloud.com>
Cc: axboe@...nel.dk, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, kbusch@...nel.org,
Laibin Qiu <qiulaibin@...wei.com>,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH RESEND v2 5/5] sbitmap: correct wake_batch recalculation
to avoid potential IO hung
Hi,
On 2022/12/22 21:41, Jan Kara wrote:
> On Thu 22-12-22 22:33:53, Kemeng Shi wrote:
>> Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened")
>> mentioned that in case of shared tags, there could be just one real
>> active hctx(queue) because of lazy detection of tag idle. Then driver tag
>> allocation may wait forever on this real active hctx(queue) if wake_batch
>> is > hctx_max_depth, where hctx_max_depth is the available tag depth for
>> the active hctx(queue). However, the condition wake_batch > hctx_max_depth
>> is not strong enough to avoid IO hang, as sbitmap_queue_wake_up will only
>> wake up one wait queue for each wake_batch, even if there is only one
>> waiter in the woken wait queue. After this, there is only one tag to free
>> and wake_batch may never be reached again. Commit 180dccb0dba4f ("blk-mq:
>> fix tag_get wait task can't be awakened") mentioned that driver tag
>> allocation may wait forever. Actually, the inactive hctx(queue) will be
>> truly idle after at most 30 seconds and will then call
>> blk_mq_tag_wakeup_all to wake one waiter per wait queue to break the
>> hang. But an IO hang of 30 seconds is also not acceptable. To fix this
>> potential IO hang, set the batch size small enough that the depth of the
>> shared hctx(queue) is sufficient to wake up all of the wait queues, as
>> sbq_calc_wake_batch does.
>>
>> Although hctx_max_depth will be clamped to at least 4 while the wake_batch
>> recalculation does not do that clamp, the wake_batch will always be
>> recalculated to 1 when hctx_max_depth <= 4.
>>
>> Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
>> Signed-off-by: Kemeng Shi <shikemeng@...weicloud.com>
>
> So the condition in sbitmap_queue_recalculate_wake_batch() also seemed
> strange to me, and the changelogs of commits 180dccb0dba4 and 10825410b95
> ("blk-mq: Fix wrong wakeup batch configuration which will cause hang")
> didn't add much confidence about the magic batch setting of 4. Let me add
> the original author of this code to CC in case he has any thoughts on why
> using a wake batch of 4 is safe for cards with, say, 32 tags when
> active_users is currently 32, because I don't see why that is correct
> either.
>
If I remember this correctly, the reason to use 4 here in the first
place is to avoid performance degradation. As for why this is safe:
because 4 * 8 = 32. Someone waiting for a tag means all 32 tags are
grabbed, and a wake batch of 4 will make sure that at least 8 wait
queues (all of them, since SBQ_WAIT_QUEUES is 8) will be woken. It's
true that some wait queues might only have one waiter, but I don't
think this will cause an IO hang.
Thanks,
Kuai
> Honza
>
>> ---
>> lib/sbitmap.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
>> index b6d3bb1c3675..804fe99783e4 100644
>> --- a/lib/sbitmap.c
>> +++ b/lib/sbitmap.c
>> @@ -458,13 +458,10 @@ void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
>> unsigned int users)
>> {
>> unsigned int wake_batch;
>> - unsigned int min_batch;
>> unsigned int depth = (sbq->sb.depth + users - 1) / users;
>>
>> - min_batch = sbq->sb.depth >= (4 * SBQ_WAIT_QUEUES) ? 4 : 1;
>> -
>> wake_batch = clamp_val(depth / SBQ_WAIT_QUEUES,
>> - min_batch, SBQ_WAKE_BATCH);
>> + 1, SBQ_WAKE_BATCH);
>>
>> WRITE_ONCE(sbq->wake_batch, wake_batch);
>> }
>> --
>> 2.30.0
>>