lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1549f04a-8431-405d-adfc-23e5988abe51@huawei.com>
Date: Fri, 8 Nov 2024 10:29:08 +0800
From: Zizhi Wo <wozizhi@...wei.com>
To: "Darrick J. Wong" <djwong@...nel.org>
CC: <chandan.babu@...cle.com>, <dchinner@...hat.com>, <osandov@...com>,
	<john.g.garry@...cle.com>, <linux-xfs@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <yangerkun@...wei.com>
Subject: Re: [PATCH 1/2] xfs: Fix missing block calculations in xfs datadev
 fsmap



在 2024/11/8 7:43, Darrick J. Wong 写道:
> On Mon, Aug 26, 2024 at 11:10:04AM +0800, Zizhi Wo wrote:
>> In xfs datadev fsmap query, I noticed a missing block calculation problem:
>> [root@...ora ~]# xfs_db -r -c "sb 0" -c "p" /dev/vdb
>> magicnum = 0x58465342
>> blocksize = 4096
>> dblocks = 5242880
>> ......
>> [root@...ora ~]# xfs_io -c 'fsmap -vvvv' /mnt
>> ...
>> 30: 253:16 [31457384..41943031]: free space            3  (104..10485751)    10485648
>>
>> (41943031 + 1) / 8 = 5242879 != 5242880
>> We missed one block in our fsmap calculation!
> 
> Eek.
> 
>> The root cause of the problem lies in __xfs_getfsmap_datadev(), where the
>> calculation of "end_fsb" requires a classification discussion. If "end_fsb"
>> is calculated based on "eofs", we need to add an extra sentinel node for
>> subsequent length calculations. Otherwise, one block will be missed. If
>> "end_fsb" is calculated based on "keys[1]", then there is no need to add an
>> extra node. Because "keys[1]" itself is unreachable, it cancels out one of
>> the additions. The diagram below illustrates this:
>>
>> |0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|-----eofs
>> |---------------|---------------------|
>> a       n       b         n+1         c
>>
>> Assume that eofs is 16, the start address of the previous query is block n,
>> sector 0, and the length is 1, so the "info->next" is at point b, sector 8.
>> In the last query, suppose the "rm_startblock" calculated based on
>> "eofs - 1" is the last block n+1 at point b. All we get is the starting
>> address of the block, not the end. Therefore, an additional sentinel node
>> needs to be added to move it to point c. After that, subtracting one from
>> the other will yield the remaining 1.
>>
>> Although we can now calculate the exact last query using "info->end_daddr",
>> we will still get an incorrect value if the device at this point is not the
>> boundary device specified by "keys[1]", as "end_daddr" is still the initial
>> value. Therefore, the eofs situation here needs to be corrected. The issue
>> is resolved by adding a sentinel node.
> 
> Why don't we set end_daddr unconditionally, then?
> 
> Hmm, looking at the end_daddr usage in fsmap.c, I think it's wrong.  If
> end_daddr is set at all, it's set either to the last sector for which
> the user wants a mapping; or it's set to the last sector for the device.
> But then look at how we use it:
> 
> 	if (info->last...)
> 		frec->start_daddr = info->end_daddr;
> 
> 	...
> 
> 	/* "report the gap..."
> 	if (frec->start_daddr > info->next_daddr) {
> 		fmr.fmr_length = frec->start_daddr - info->next_daddr;
> 	}
> 
> This is wrong -- we're using start_daddr to compute the distance from
> the last mapping that we output up to the end of the range that we want.
> The "end of the range" is modeled with a phony rmap record that starts
> at the first fsblock after that range.
> 

In the current code, we set "rec_daddr = end_daddr" only when
(info->last && info->end_daddr != NULL), which should ensure that this
is the last query? Because end_daddr is set to the last device, and
info->last is set to the last query. Therefore, assigning it to
start_daddr should not cause issues in the next query?

Did I misunderstand something? Or is it because the latest code
constantly updates end_daddr, which is why this issue arises?

> IOWs, that assignment should have been
> frec->start_daddr = info->end_daddr + 1.
> 
> Granted in August the codebase was less clear about the difference
> between rec_daddr and rmap->rm_startblock.  For 6.13, hch cleaned all
> that up -- rec_daddr is now called start_daddr and the fsmap code passes
> rmap records with space numbers in units of daddrs via a new struct
> xfs_fsmap_rec.  Unfortunately, that's all buried in the giant pile of
> pull requests I sent a couple of days ago which hasn't shown up on
> for-next yet.
> 
> https://lore.kernel.org/linux-xfs/173084396955.1871025.18156568347365549855.stgit@frogsfrogsfrogs/
> 
> So I think I know how to fix this against the 6.13 codebase, but I'm
> going to take a slightly different approach than yours...
> 
>> Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
>> Signed-off-by: Zizhi Wo <wozizhi@...wei.com>
>> ---
>>   fs/xfs/xfs_fsmap.c | 19 +++++++++++++++++--
>>   1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
>> index 85dbb46452ca..8a2dfe96dae7 100644
>> --- a/fs/xfs/xfs_fsmap.c
>> +++ b/fs/xfs/xfs_fsmap.c
>> @@ -596,12 +596,27 @@ __xfs_getfsmap_datadev(
>>   	xfs_agnumber_t			end_ag;
>>   	uint64_t			eofs;
>>   	int				error = 0;
>> +	int				sentinel = 0;
>>   
>>   	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
>>   	if (keys[0].fmr_physical >= eofs)
>>   		return 0;
>>   	start_fsb = XFS_DADDR_TO_FSB(mp, keys[0].fmr_physical);
>> -	end_fsb = XFS_DADDR_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
>> +	/*
>> +	 * For the case of eofs, we need to add a sentinel node;
>> +	 * otherwise, one block will be missed when calculating the length
>> +	 * in the last query.
>> +	 * For the case of key[1], there is no need to add a sentinel node
>> +	 * because it already represents a value that cannot be reached.
>> +	 * For the case where key[1] after shifting is within the same
>> +	 * block as the starting address, it is resolved using end_daddr.
>> +	 */
>> +	if (keys[1].fmr_physical > eofs - 1) {
>> +		sentinel = 1;
>> +		end_fsb = XFS_DADDR_TO_FSB(mp, eofs - 1);
>> +	} else {
>> +		end_fsb = XFS_DADDR_TO_FSB(mp, keys[1].fmr_physical);
>> +	}
> 
> ...because running against djwong-wtf, I actually see the same symptoms
> for the realtime device.  So I think a better solution is to change
> xfs_getfsmap to set end_daddr always, and then fix the off by one error.
> 

Yes, my second patch looks at this rt problem...
Thank you for your reply

Thanks,
Zizhi Wo


> I also don't really like "sentinel" values because they're not
> intuitive.
> 
> I will also go update xfs/273 to check that there are no gaps in the
> mappings returned, and that they go to where the filesystem thinks is
> the end of the device.  Thanks for reporting this, sorry I was too busy
> trying to get metadir/rtgroups done to look at this until now. :(
> 
> --D
> 



>>   
>>   	/*
>>   	 * Convert the fsmap low/high keys to AG based keys.  Initialize
>> @@ -649,7 +664,7 @@ __xfs_getfsmap_datadev(
>>   		info->pag = pag;
>>   		if (pag->pag_agno == end_ag) {
>>   			info->high.rm_startblock = XFS_FSB_TO_AGBNO(mp,
>> -					end_fsb);
>> +					end_fsb) + sentinel;
>>   			info->high.rm_offset = XFS_BB_TO_FSBT(mp,
>>   					keys[1].fmr_offset);
>>   			error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]);
>> -- 
>> 2.39.2
>>
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ