Message-ID: <27e4f72f-3b6e-4dc8-a722-0ace9aa8c066@oracle.com>
Date: Thu, 26 Jun 2025 10:34:26 +0100
From: John Garry <john.g.garry@...cle.com>
To: Yu Kuai <yukuai1@...weicloud.com>, axboe@...nel.dk, hare@...e.de,
hch@...radead.org, yukuai3@...wei.com
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
yi.zhang@...hat.com, calvin@...nvd.org, david@...morbit.com,
yi.zhang@...wei.com, yangerkun@...wei.com, johnny.chenyi@...wei.com
Subject: Re: [PATCH] block: fix false warning in bdev_count_inflight_rw()
On 26/06/2025 09:39, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@...wei.com>
>
> While bdev_count_inflight is iterating over all cpus, some IOs may be
> issued from an already traversed cpu and then completed from a cpu that
> is not traversed yet:
>
> cpu0
>                 cpu1
> bdev_count_inflight
>  //for_each_possible_cpu
>  // cpu0 is 0
>  inflight += 0
>                 // issue an io
>                 blk_account_io_start
>                 // cpu0 inflight ++
>
>                                 cpu2
>                                 // the io is done
>                                 blk_account_io_done
>                                 // cpu2 inflight --
>  // cpu 1 is 0
>  inflight += 0
>  // cpu2 is -1
>  inflight += -1
>  ...
>
> In this case, the total inflight will be -1, causing lots of false
> warnings. Fix the problem by removing the warning.
Is it even safe to use this function for anything other than
informative purposes? For example, md code uses it to check for idle
state - could that check return an invalid result (and cause harm)?
>
> Note that there is still a valid warning for nvme-mpath (from Yi) that
> is not fixed yet.
>
> Fixes: f5482ee5edb9 ("block: WARN if bdev inflight counter is negative")
> Reported-by: Yi Zhang <yi.zhang@...hat.com>
> Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#mae89155a5006463d0a21a4a2c35ae0034b26a339
> Reported-and-tested-by: Calvin Owens <calvin@...nvd.org>
> Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#m1d935a00070bf95055d0ac84e6075158b08acaef
> Reported-by: Dave Chinner <david@...morbit.com>
> Closes: https://lore.kernel.org/linux-block/aFuypjqCXo9-5_En@dread.disaster.area/
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> ---
> block/genhd.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 8171a6bc3210..680fa717082f 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -141,9 +141,14 @@ static void bdev_count_inflight_rw(struct block_device *part,
>  		}
>  	}
>
> -	if (WARN_ON_ONCE((int)inflight[READ] < 0))
> +	/*
> +	 * While iterating all cpus, some IOs might be issued from a traversed
> +	 * cpu and then completed from a cpu that is not traversed yet, causing
> +	 * the inflight number to be negative.
> +	 */
> +	if ((int)inflight[READ] < 0)
>  		inflight[READ] = 0;
> -	if (WARN_ON_ONCE((int)inflight[WRITE] < 0))
> +	if ((int)inflight[WRITE] < 0)
>  		inflight[WRITE] = 0;
>  }
>