[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <218f0cd0-61bf-4afa-afb0-a559cd085d4a@nvidia.com>
Date: Wed, 26 Nov 2025 08:07:21 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Yongpeng Yang <yangyongpeng.storage@...il.com>, Chaitanya Kulkarni
<ckulkarnilinux@...il.com>, "axboe@...nel.dk" <axboe@...nel.dk>,
"agk@...hat.com" <agk@...hat.com>, "snitzer@...nel.org" <snitzer@...nel.org>,
"mpatocka@...hat.com" <mpatocka@...hat.com>, "song@...nel.org"
<song@...nel.org>, "yukuai@...as.com" <yukuai@...as.com>, "hch@....de"
<hch@....de>, "sagi@...mberg.me" <sagi@...mberg.me>, Chaitanya Kulkarni
<chaitanyak@...dia.com>, "jaegeuk@...nel.org" <jaegeuk@...nel.org>,
"chao@...nel.org" <chao@...nel.org>, "cem@...nel.org" <cem@...nel.org>
CC: "dm-devel@...ts.linux.dev" <dm-devel@...ts.linux.dev>,
"linux-raid@...r.kernel.org" <linux-raid@...r.kernel.org>, Johannes Thumshirn
<johannes.thumshirn@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-nvme@...ts.infradead.org"
<linux-nvme@...ts.infradead.org>, "linux-f2fs-devel@...ts.sourceforge.net"
<linux-f2fs-devel@...ts.sourceforge.net>, "linux-block@...r.kernel.org"
<linux-block@...r.kernel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>, Yongpeng Yang
<yangyongpeng@...omi.com>
Subject: Re: [f2fs-dev] [PATCH V3 6/6] xfs: ignore discard return value
On 11/25/25 18:37, Yongpeng Yang wrote:
>> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
>> index 6917de832191..b6ffe4807a11 100644
>> --- a/fs/xfs/xfs_discard.c
>> +++ b/fs/xfs/xfs_discard.c
>> @@ -108,7 +108,7 @@ xfs_discard_endio(
>> * list. We plug and chain the bios so that we only need a single
>> completion
>> * call to clear all the busy extents once the discards are complete.
>> */
>> -int
>> +void
>> xfs_discard_extents(
>> struct xfs_mount *mp,
>> struct xfs_busy_extents *extents)
>> @@ -116,7 +116,6 @@ xfs_discard_extents(
>> struct xfs_extent_busy *busyp;
>> struct bio *bio = NULL;
>> struct blk_plug plug;
>> - int error = 0;
>> blk_start_plug(&plug);
>> list_for_each_entry(busyp, &extents->extent_list, list) {
>> @@ -126,18 +125,10 @@ xfs_discard_extents(
>> trace_xfs_discard_extent(xg, busyp->bno, busyp->length);
>> - error = __blkdev_issue_discard(btp->bt_bdev,
>> + __blkdev_issue_discard(btp->bt_bdev,
>> xfs_gbno_to_daddr(xg, busyp->bno),
>> XFS_FSB_TO_BB(mp, busyp->length),
>> GFP_KERNEL, &bio);
>
> If blk_alloc_discard_bio() fails to allocate a bio inside
> __blkdev_issue_discard(), this may lead to an invalid loop in
> list_for_each_entry{}. Instead of using __blkdev_issue_discard(), how
> about allocate and submit the discard bios explicitly in
> list_for_each_entry{}?
>
> Yongpeng,
Calling __blkdev_issue_discard() keeps managing all the bio with the
appropriate GFP mask, so the semantics stay the same. You are just
moving memory allocation to the caller and potentially looking at
implementing retry on bio allocation failure.
The retry for discard bio memory allocation is not desired I think,
since it's only a hint to the controller.
This patch is simply cleaning up dead error-handling branches at the
callers no behavioral changes intended.
What maybe useful is to stop iterating once we fail to allocate the
bio [1].
-ck
[1] Potential addition on the top of this to exit early in discard loop
on bio allocation failure.
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index b6ffe4807a11..1519f708bb79 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -129,6 +129,13 @@ xfs_discard_extents(
xfs_gbno_to_daddr(xg, busyp->bno),
XFS_FSB_TO_BB(mp, busyp->length),
GFP_KERNEL, &bio);
+ /*
+ * We failed to allocate bio instead of continuing the loop
+ * so it will lead to inconsistent discards to the disk
+ * exit early and jump into xfs_discard_busy_clear().
+ */
+ if (!bio)
+ break;
}
if (bio) {
If we keep looping after the first bio == NULL, the rest of the range is
guaranteed to be inconsistent anyways, because every subsequent iteration
will fall into one of three cases:
- The allocator keeps returning NULL, so none of the remaining LBAs receive
discard.
- Rest of the allocator succeeds, but we’ve already skipped a chunk, leaving
a hole in the discard range.
- We get intermittent successes, which produces alternating chunks of
discarded and undiscarded blocks.
In each of those scenarios, the disk ends up with a partially discarded
range, so the correct fix is to break out of the loop immediately and
proceed to xfs_discard_busy_clear() once the very first allocation fails.
Powered by blists - more mailing lists