[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241108173006.GA168069@frogsfrogsfrogs>
Date: Fri, 8 Nov 2024 09:30:06 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Zizhi Wo <wozizhi@...wei.com>
Cc: chandan.babu@...cle.com, dchinner@...hat.com, osandov@...com,
john.g.garry@...cle.com, linux-xfs@...r.kernel.org,
linux-kernel@...r.kernel.org, yangerkun@...wei.com
Subject: Re: [PATCH 1/2] xfs: Fix missing block calculations in xfs datadev
fsmap
On Fri, Nov 08, 2024 at 10:29:08AM +0800, Zizhi Wo wrote:
>
>
> 在 2024/11/8 7:43, Darrick J. Wong 写道:
> > On Mon, Aug 26, 2024 at 11:10:04AM +0800, Zizhi Wo wrote:
> > > In xfs datadev fsmap query, I noticed a missing block calculation problem:
> > > [root@...ora ~]# xfs_db -r -c "sb 0" -c "p" /dev/vdb
> > > magicnum = 0x58465342
> > > blocksize = 4096
> > > dblocks = 5242880
> > > ......
> > > [root@...ora ~]# xfs_io -c 'fsmap -vvvv' /mnt
> > > ...
> > > 30: 253:16 [31457384..41943031]: free space 3 (104..10485751) 10485648
> > >
> > > (41943031 + 1) / 8 = 5242879 != 5242880
> > > We missed one block in our fsmap calculation!
> >
> > Eek.
> >
> > > The root cause of the problem lies in __xfs_getfsmap_datadev(), where the
> > > calculation of "end_fsb" requires a classification discussion. If "end_fsb"
> > > is calculated based on "eofs", we need to add an extra sentinel node for
> > > subsequent length calculations. Otherwise, one block will be missed. If
> > > "end_fsb" is calculated based on "keys[1]", then there is no need to add an
> > > extra node. Because "keys[1]" itself is unreachable, it cancels out one of
> > > the additions. The diagram below illustrates this:
> > >
> > > |0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|-----eofs
> > > |---------------|---------------------|
> > > a n b n+1 c
> > >
> > > Assume that eofs is 16, the start address of the previous query is block n,
> > > sector 0, and the length is 1, so the "info->next" is at point b, sector 8.
> > > In the last query, suppose the "rm_startblock" calculated based on
> > > "eofs - 1" is the last block n+1 at point b. All we get is the starting
> > > address of the block, not the end. Therefore, an additional sentinel node
> > > needs to be added to move it to point c. After that, subtracting one from
> > > the other will yield the remaining 1.
> > >
> > > Although we can now calculate the exact last query using "info->end_daddr",
> > > we will still get an incorrect value if the device at this point is not the
> > > boundary device specified by "keys[1]", as "end_daddr" is still the initial
> > > value. Therefore, the eofs situation here needs to be corrected. The issue
> > > is resolved by adding a sentinel node.
> >
> > Why don't we set end_daddr unconditionally, then?
> >
> > Hmm, looking at the end_daddr usage in fsmap.c, I think it's wrong. If
> > end_daddr is set at all, it's set either to the last sector for which
> > the user wants a mapping; or it's set to the last sector for the device.
> > But then look at how we use it:
> >
> > if (info->last...)
> > frec->start_daddr = info->end_daddr;
> >
> > ...
> >
> > /* "report the gap..."
> > if (frec->start_daddr > info->next_daddr) {
> > fmr.fmr_length = frec->start_daddr - info->next_daddr;
> > }
> >
> > This is wrong -- we're using start_daddr to compute the distance from
> > the last mapping that we output up to the end of the range that we want.
> > The "end of the range" is modeled with a phony rmap record that starts
> > at the first fsblock after that range.
> >
>
> In the current code, we set "rec_daddr = end_daddr" only when
> (info->last && info->end_daddr != NULL), which should ensure that this
> is the last query?
Right.
> Because end_daddr is set to the last device, and
> info->last is set to the last query. Therefore, assigning it to
> start_daddr should not cause issues in the next query?
Right, the code currently sets end_daddr only for the last device, so
there won't be any issues with the next query.
That said, we reset the xfs_getfsmap_info state between each device, so
it's safe to set end_daddr for every device, not just the last time
through that loop.
> Did I misunderstand something? Or is it because the latest code
> constantly updates end_daddr, which is why this issue arises?
The 6.13 metadir/rtgroups patches didn't change when end_daddr gets set,
but my fixpatch *does* make it set end_daddr for all devices. Will send
a patch + fstests update shortly to demonstrate. :)
> > IOWs, that assignment should have been
> > frec->start_daddr = info->end_daddr + 1.
> >
> > Granted in August the codebase was less clear about the difference
> > between rec_daddr and rmap->rm_startblock. For 6.13, hch cleaned all
> > that up -- rec_daddr is now called start_daddr and the fsmap code passes
> > rmap records with space numbers in units of daddrs via a new struct
> > xfs_fsmap_rec. Unfortunately, that's all buried in the giant pile of
> > pull requests I sent a couple of days ago which hasn't shown up on
> > for-next yet.
> >
> > https://lore.kernel.org/linux-xfs/173084396955.1871025.18156568347365549855.stgit@frogsfrogsfrogs/
> >
> > So I think I know how to fix this against the 6.13 codebase, but I'm
> > going to take a slightly different approach than yours...
> >
> > > Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
> > > Signed-off-by: Zizhi Wo <wozizhi@...wei.com>
> > > ---
> > > fs/xfs/xfs_fsmap.c | 19 +++++++++++++++++--
> > > 1 file changed, 17 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
> > > index 85dbb46452ca..8a2dfe96dae7 100644
> > > --- a/fs/xfs/xfs_fsmap.c
> > > +++ b/fs/xfs/xfs_fsmap.c
> > > @@ -596,12 +596,27 @@ __xfs_getfsmap_datadev(
> > > xfs_agnumber_t end_ag;
> > > uint64_t eofs;
> > > int error = 0;
> > > + int sentinel = 0;
> > > eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
> > > if (keys[0].fmr_physical >= eofs)
> > > return 0;
> > > start_fsb = XFS_DADDR_TO_FSB(mp, keys[0].fmr_physical);
> > > - end_fsb = XFS_DADDR_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
> > > + /*
> > > + * For the case of eofs, we need to add a sentinel node;
> > > + * otherwise, one block will be missed when calculating the length
> > > + * in the last query.
> > > + * For the case of key[1], there is no need to add a sentinel node
> > > + * because it already represents a value that cannot be reached.
> > > + * For the case where key[1] after shifting is within the same
> > > + * block as the starting address, it is resolved using end_daddr.
> > > + */
> > > + if (keys[1].fmr_physical > eofs - 1) {
> > > + sentinel = 1;
> > > + end_fsb = XFS_DADDR_TO_FSB(mp, eofs - 1);
> > > + } else {
> > > + end_fsb = XFS_DADDR_TO_FSB(mp, keys[1].fmr_physical);
> > > + }
> >
> > ...because running against djwong-wtf, I actually see the same symptoms
> > for the realtime device. So I think a better solution is to change
> > xfs_getfsmap to set end_daddr always, and then fix the off by one error.
> >
>
> Yes, my second patch looks at this rt problem...
> Thank you for your reply
<nod>
--D
> Thanks,
> Zizhi Wo
>
>
> > I also don't really like "sentinel" values because they're not
> > intuitive.
> >
> > I will also go update xfs/273 to check that there are no gaps in the
> > mappings returned, and that they go to where the filesystem thinks is
> > the end of the device. Thanks for reporting this, sorry I was too busy
> > trying to get metadir/rtgroups done to look at this until now. :(
> >
> > --D
> >
>
>
>
> > > /*
> > > * Convert the fsmap low/high keys to AG based keys. Initialize
> > > @@ -649,7 +664,7 @@ __xfs_getfsmap_datadev(
> > > info->pag = pag;
> > > if (pag->pag_agno == end_ag) {
> > > info->high.rm_startblock = XFS_FSB_TO_AGBNO(mp,
> > > - end_fsb);
> > > + end_fsb) + sentinel;
> > > info->high.rm_offset = XFS_BB_TO_FSBT(mp,
> > > keys[1].fmr_offset);
> > > error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]);
> > > --
> > > 2.39.2
> > >
> > >
>
Powered by blists - more mailing lists