[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240815174229.GI865349@frogsfrogsfrogs>
Date: Thu, 15 Aug 2024 10:42:29 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: Zizhi Wo <wozizhi@...wei.com>
Cc: chandan.babu@...cle.com, dchinner@...hat.com, osandov@...com,
john.g.garry@...cle.com, linux-xfs@...r.kernel.org,
linux-kernel@...r.kernel.org, yangerkun@...wei.com
Subject: Re: [PATCH V3 2/2] xfs: Fix missing interval for missing_owner in
xfs fsmap
On Mon, Aug 12, 2024 at 09:15:05AM +0800, Zizhi Wo wrote:
> In the fsmap query of xfs, there is an interval missing problem:
> [root@...ora ~]# xfs_io -c 'fsmap -vvvv' /mnt
> EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL
> 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8
> 1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16
> 2: 253:16 [24..39]: inode btree 0 (24..39) 16
> 3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8
> 4: 253:16 [48..55]: refcount btree 0 (48..55) 8
> 5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48
> 6: 253:16 [104..127]: free space 0 (104..127) 24
> ......
>
> BUG:
> [root@...ora ~]# xfs_io -c 'fsmap -vvvv -d 104 107' /mnt
> [root@...ora ~]#
> Normally, we should be able to get [104, 107), but we got nothing.
>
> The problem is caused by shifting. The query for the problem-triggered
> scenario is for the missing_owner interval (e.g. freespace in rmapbt/
> unknown space in bnobt), which is obtained by subtraction (gap). For this
> scenario, the interval is obtained by info->last. However, rec_daddr is
> calculated based on the start_block recorded in key[1], which is converted
> by calling XFS_BB_TO_FSBT. Then if rec_daddr does not exceed
> info->next_daddr, which means keys[1].fmr_physical >> (mp)->m_blkbb_log
> <= info->next_daddr, no records will be displayed. In the above example,
> 104 >> (mp)->m_blkbb_log = 12 and 107 >> (mp)->m_blkbb_log = 12, so the two
> are reduced to 0 and the gap is ignored:
>
> before calculate ----------------> after shifting
> 104(st) 107(ed) 12(st/ed)
> |---------| |
> sector size block size
>
> Resolve this issue by introducing the "end_daddr" field in
> xfs_getfsmap_info. This records |key[1].fmr_physical + key[1].length| at
> the granularity of sector. If the current query is the last, the rec_daddr
> is end_daddr to prevent missing interval problems caused by shifting. We
> only need to focus on the last query, because xfs disks are internally
> aligned with disk blocksize that are powers of two and minimum 512, so
> there is no problem with shifting in previous queries.
>
> After applying this patch, the above problem have been solved:
> [root@...ora ~]# xfs_io -c 'fsmap -vvvv -d 104 107' /mnt
> EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL
> 0: 253:16 [104..106]: free space 0 (104..106) 3
>
> Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
> Signed-off-by: Zizhi Wo <wozizhi@...wei.com>
> ---
> fs/xfs/xfs_fsmap.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
> index d346acff7725..4ae273b54129 100644
> --- a/fs/xfs/xfs_fsmap.c
> +++ b/fs/xfs/xfs_fsmap.c
> @@ -162,6 +162,7 @@ struct xfs_getfsmap_info {
> xfs_daddr_t next_daddr; /* next daddr we expect */
> /* daddr of low fsmap key when we're using the rtbitmap */
> xfs_daddr_t low_daddr;
> + xfs_daddr_t end_daddr; /* daddr of high fsmap key */
This ought to be initialized to an obviously impossible value (e.g.
-1ULL) in xfs_getfsmap before we start walking btrees.
> u64 missing_owner; /* owner of holes */
> u32 dev; /* device id */
> /*
> @@ -294,6 +295,13 @@ xfs_getfsmap_helper(
> return 0;
> }
>
> + /*
> + * To prevent missing intervals in the last query, consider using
> + * sectors as the granularity.
> + */
> + if (info->last && info->end_daddr)
> + rec_daddr = info->end_daddr;
I think this needs a better comment. How about:
/*
* For an info->last query, we're looking for a gap between the
* last mapping emitted and the high key specified by userspace.
* If the user's query spans less than 1 fsblock, then
* info->high and info->low will have the same rm_startblock,
* which causes rec_daddr and next_daddr to be the same.
* Therefore, use the end_daddr that we calculated from
* userspace's high key to synthesize the record. Note that if
* the btree query found a mapping, there won't be a gap.
*/
> +
> /* Are we just counting mappings? */
> if (info->head->fmh_count == 0) {
> if (info->head->fmh_entries == UINT_MAX)
> @@ -973,8 +981,10 @@ xfs_getfsmap(
> * low key, zero out the low key so that we get
> * everything from the beginning.
> */
> - if (handlers[i].dev == head->fmh_keys[1].fmr_device)
> + if (handlers[i].dev == head->fmh_keys[1].fmr_device) {
> dkeys[1] = head->fmh_keys[1];
> + info.end_daddr = dkeys[1].fmr_physical + dkeys[1].fmr_length;
dkeys[1].fmr_length is never used by anything in the fsmap code --
__xfs_getfsmap_datadev sets end_fsb using only dkeys[1].fmr_physical.
You shouldn't add it to end_daddr here because then they won't be
describing the same thing.
Anyway I'll figure out a reproducer for fstests and send the whole pile
back to the mailing list once it passes QA. Thanks for finding the bug
and attempting a fix. :)
--D
> + }
> if (handlers[i].dev > head->fmh_keys[0].fmr_device)
> memset(&dkeys[0], 0, sizeof(struct xfs_fsmap));
>
> --
> 2.39.2
>
>
Powered by blists - more mailing lists