linux-kernel - Re: stalling IO regression since linux 5.12, through 5.18

Message-ID: <ee48bdc1-1020-78ca-a90e-ef958171a05f@huaweicloud.com>
Date:   Thu, 1 Sep 2022 16:19:16 +0800
From:   Yu Kuai <yukuai1@...weicloud.com>
To:     Jan Kara <jack@...e.cz>, Yu Kuai <yukuai1@...weicloud.com>
Cc:     Ming Lei <ming.lei@...hat.com>,
        Chris Murphy <lists@...orremedies.com>,
        Nikolay Borisov <nborisov@...e.com>,
        Jens Axboe <axboe@...nel.dk>,
        Paolo Valente <paolo.valente@...aro.org>,
        Btrfs BTRFS <linux-btrfs@...r.kernel.org>,
        Linux-RAID <linux-raid@...r.kernel.org>,
        linux-block <linux-block@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Josef Bacik <josef@...icpanda.com>,
        "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: stalling IO regression since linux 5.12, through 5.18

On 2022/09/01 16:03, Jan Kara wrote:
> On Thu 01-09-22 15:02:03, Yu Kuai wrote:
>> Hi, Chris
>>
>> On 2022/08/20 15:00, Ming Lei wrote:
>>> On Fri, Aug 19, 2022 at 03:20:25PM -0400, Chris Murphy wrote:
>>>>
>>>>
>>>> On Thu, Aug 18, 2022, at 1:24 AM, Ming Lei wrote:
>>>>> On Thu, Aug 18, 2022 at 12:27:04AM -0400, Chris Murphy wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 18, 2022, at 12:18 AM, Chris Murphy wrote:
>>>>>>> On Thu, Aug 18, 2022, at 12:12 AM, Chris Murphy wrote:
>>>>>>>> On Wed, Aug 17, 2022, at 11:41 PM, Ming Lei wrote:
>>>>>>>>
>>>>>>>>> OK, can you post the blk-mq debugfs log after you trigger it on v5.17?
>>>>>>
>>>>>> Same boot, 3rd log. But the load is above 300 so I kinda need to sysrq+b soon.
>>>>>>
>>>>>> https://drive.google.com/file/d/1375H558kqPTdng439rvG6LuXXWPXLToo/view?usp=sharing
>>>>>>
>>>>>
>>>>> Also please test the following one too:
>>>>>
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 5ee62b95f3e5..d01c64be08e2 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -1991,7 +1991,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>>>>>    		if (!needs_restart ||
>>>>>    		    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>>>>>    			blk_mq_run_hw_queue(hctx, true);
>>>>> -		else if (needs_restart && needs_resource)
>>>>> +		else if (needs_restart && (needs_resource ||
>>>>> +					blk_mq_is_shared_tags(hctx->flags)))
>>>>>    			blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
>>>>>
>>>>>    		blk_mq_update_dispatch_busy(hctx, true);
>>>>>
>>>>
>>>>
>>>> With just this patch on top of 5.17.0, it still hangs. I've captured block debugfs log:
>>>> https://drive.google.com/file/d/1ic4YHxoL9RrCdy_5FNdGfh_q_J3d_Ft0/view?usp=sharing
>>>
>>> The log is similar to the previous one; the only difference is that
>>> RESTART is not set.
>>>
>>> Here is another patch, merged in v5.18, that also fixes an io stall; feel free to test it:
>>>
>>> 8f5fea65b06d blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues
>>
>> Have you tried this patch?
>>
>> We hit a similar problem in our tests, and I'm fairly sure about what
>> happens when it triggers.
>>
>> Our test environment: nvme with the bfq io scheduler.
>>
>> How io is stalled:
>>
>> 1. hctx1 dispatches a rq from bfq's in-service queue, and that bfqq
>> becomes empty; the dispatch somehow fails, so the rq is inserted into
>> hctx1->dispatch and new run work is queued.
>>
>> 2. another hctx tries to dispatch a rq; however, the in-service bfqq is
>> empty, so bfq_dispatch_request returns NULL, and thus
>> blk_mq_delay_run_hw_queues is called.
>>
>> 3. because of the problem described in the above patch, the run work
>> from "hctx1" can be stalled.
>>
>> The above patch should fix this io stall; however, it seems to me that
>> bfq does have a problem: the in-service bfqq doesn't expire in the
>> following situation:
>>
>> 1. dispatched rqs don't complete
>> 2. no new rq is issued to bfq
> 
> And I guess:
> 3. there are requests queued in other bfqqs
> ?

Yes, of course, other bfqqs still have requests. However, the current
implementation has a flaw: even if the other bfqqs don't have
requests, bfq_asymmetric_scenario() can still return true because
num_groups_with_pending_reqs > 0. We tried to fix this, but there
seems to have been some misunderstanding with Paolo, and the fix has
not been applied to mainline yet...
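
To make the point about bfq_asymmetric_scenario() concrete, here is a
minimal user-space sketch of the check. It is not the kernel source:
struct toy_bfqd and toy_asymmetric_scenario() are hypothetical names, and
the weight and priority-class terms are assumptions added for context.
Only the num_groups_with_pending_reqs > 0 term is taken from the
description above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the scheduler state. */
struct toy_bfqd {
	/* Assumption: set when busy queues have differing weights. */
	bool varied_queue_weights;
	/* Assumption: number of busy queues per I/O priority class. */
	unsigned int busy_queues[3];
	/* Counter named in the message above. */
	unsigned int num_groups_with_pending_reqs;
};

/*
 * Toy version of the check: the scenario counts as asymmetric if any
 * of the three terms holds. The last term is the one discussed above:
 * a non-zero group counter alone makes the result true, whether or not
 * any other queue still has requests to dispatch.
 */
static bool toy_asymmetric_scenario(const struct toy_bfqd *d)
{
	bool multiple_classes_busy =
		(d->busy_queues[0] && d->busy_queues[1]) ||
		(d->busy_queues[0] && d->busy_queues[2]) ||
		(d->busy_queues[1] && d->busy_queues[2]);

	return d->varied_queue_weights || multiple_classes_busy ||
	       d->num_groups_with_pending_reqs > 0;
}
```

In this model, a state with a single busy class, uniform weights and
num_groups_with_pending_reqs == 1 is still reported as asymmetric, which
is the behaviour described above.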

Thanks,
Kuai
> 
> Otherwise I don't see a point in expiring current bfqq because there's
> nothing bfq could do anyway. But under normal circumstances the request
> completion should not take so long so I don't think it would be really
> worth it to implement some special mechanism for this in bfq.
> 
> 								Honza
>