linux-kernel - Re: [PATCH v2 0/3] Fix some starvation problems in block layer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1FCC991B-1D0A-4127-B95A-8021ABAF2FA8@linux.dev>
Date: Tue, 10 Sep 2024 10:49:32 +0800
From: Muchun Song <muchun.song@...ux.dev>
To: Muchun Song <songmuchun@...edance.com>
Cc: Jens Axboe <axboe@...nel.dk>,
 Ming Lei <ming.lei@...hat.com>,
 Yu Kuai <yukuai1@...weicloud.com>,
 linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/3] Fix some starvation problems in block layer



> On Sep 3, 2024, at 16:16, Muchun Song <songmuchun@...edance.com> wrote:
> 
> We encounter a problem on our servers where hundreds of UNINTERRUPTED
> processes are all waiting in the WBT wait queue. And the IO hung detector
> logged so many messages about "blocked for more than 122 seconds". The
> call trace is as follows:
> 
>    Call Trace:
>        __schedule+0x959/0xee0
>        schedule+0x40/0xb0
>        io_schedule+0x12/0x40
>        rq_qos_wait+0xaf/0x140
>        wbt_wait+0x92/0xc0
>        __rq_qos_throttle+0x20/0x30
>        blk_mq_make_request+0x12a/0x5c0
>        generic_make_request_nocheck+0x172/0x3f0
>        submit_bio+0x42/0x1c0
>        ...
> 
> The WBT module is used to throttle buffered writeback, which will block
> any buffered writeback IO request until the previous inflight IOs have
> been completed. So I checked the inflight IO counter. That was one meaning
> one IO request was submitted to the downstream interface like block core
> layer or device driver (virtio_blk driver in our case). We need to figure
> out why the inflight IO is not completed in time. I confirmed that all
> the virtio ring buffers of virtio_blk are empty and the hardware dispatch
> list had one IO request, so the root cause is not related to the block
> device or the virtio_blk driver since the driver has never received that
> IO request.
> 
> We know that block core layer could submit IO requests to the driver through
> kworker (the callback function is blk_mq_run_work_fn). I thought maybe the
> kworker was blocked by some other resources causing the callback to not be
> evoked in time. So I checked all the kworkers and workqueues and confirmed
> there was no pending work on any kworker or workqueue.
> 
> Integrate all the investigation information, the problem should be in the
> block core layer missing a chance to submit that IO request. After
> some investigation of code, I found some scenarios which could cause the
> problem.

Hi Jens Axboe,

May I ask if you have any suggestions for those fixes? Or if they could
be merged?

Muchun,
Thanks.

> 
> Changes in v2:
>  - Collect RB tag from Ming Lei.
>  - Use barrier-less approach to fix QUEUE_FLAG_QUIESCED ordering problem
>    suggested by Ming Lei.
>  - Apply new approach to fix BLK_MQ_S_STOPPED ordering for easier maintenance.
>  - Add Fixes tag to each patch.
> 
> Muchun Song (3):
>  block: fix missing dispatching request when queue is started or
>    unquiesced
>  block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding
>    requests
>  block: fix ordering between checking BLK_MQ_S_STOPPED and adding
>    requests
> 
> block/blk-mq.c | 55 ++++++++++++++++++++++++++++++++++++++------------
> block/blk-mq.h | 13 ++++++++++++
> 2 files changed, 55 insertions(+), 13 deletions(-)
> 
> -- 
> 2.20.1
>