Message-ID: <39b76473-fe00-0f1b-62e3-ae349a9f80d3@bytedance.com>
Date:   Tue, 28 Nov 2023 17:39:50 +0800
From:   Jiachen Zhang <zhangjiachen.jaycee@...edance.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     Chandan Babu R <chandan.babu@...cle.com>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Dave Chinner <dchinner@...hat.com>,
        Allison Henderson <allison.henderson@...cle.com>,
        Zhang Tianci <zhangtianci.1997@...edance.com>,
        Brian Foster <bfoster@...hat.com>, Ben Myers <bpm@....com>,
        linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        xieyongji@...edance.com, me@...x.top
Subject: [PATCH 2/2] xfs: update dir3 leaf block metadata after swap

On 2023/11/28 16:39, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
>> From: Zhang Tianci <zhangtianci.1997@...edance.com>
>>
>> xfs_da3_swap_lastblock() copies the last block's content to the dead block
>> but does not update the metadata in it. Some block types need their
>> metadata updated: for example, a dir3 leafn block records its own blkno,
>> which must be updated to the dead block's blkno. Otherwise,
>> before the xfs_buf is written to disk, verify_write() fails on
>> blk_hdr->blkno != xfs_buf->b_bn, and XFS shuts down.
> 
> Do you have a reproducer for this?  It would be very helpful to add it
> to xfstests.

Hi Christoph,

Thanks for the review!

The issue is hard to reproduce. Currently we can only reproduce it with
a kernel code change that forces xfs_remove to reserve 0 blocks
(t_blk_res == 0). The change below is against kernel version 4.19:

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f2d06e1e4906..c8f84b95a0ec 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2551,13 +2551,8 @@ xfs_remove(
          * insert tries to happen, instead trimming the LAST
          * block from the directory.
          */
-       resblks = XFS_REMOVE_SPACE_RES(mp);
-       error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0, 0, &tp);
-       if (error == -ENOSPC) {
-               resblks = 0;
-               error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0,
-                               &tp);
-       }
+       resblks = 0;
+       error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, &tp);
         if (error) {
                 ASSERT(error != -ENOSPC);
                 goto std_return;


After loading the modified xfs.ko with insmod, run the following script.
It reproduces the problem consistently at the final `umount mnt`:

fallocate -l 1G xfs.img
mkfs.xfs -f xfs.img
mkdir -p mnt
losetup /dev/loop0 xfs.img
mount -t xfs /dev/loop0 mnt
pushd mnt
mkdir dir3
prefix="a_"
for j in $(seq 0 13); do
     for i in $(seq 0 2800); do
             touch dir3/${prefix}_${i}_${j}
     done
     for i in $(seq 0 2500); do
             rm -f dir3/${prefix}_${i}_${j}
             if [ "$i" == "2094" ] && [ "$j" == "13" ]; then
                     echo "should reproduce now, so break here!"
                     break;
             fi
     done
done
popd
umount mnt


We are still trying to make a reproducer without any kernel changes. Do
you have any suggestions on this?


> 
>>
>> We will get this warning:
>>
>>    XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
>>    XFS (dm-0): Unmount and run xfs_repair
>>    XFS (dm-0): First 128 bytes of corrupted metadata buffer:
>>    00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00  ........=.......
>>    000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00  ................
>>    000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf  .D...dDA..`.PC..
>>    00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00  .........s......
>>    00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00  .).8.....).@....
>>    000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00  .).H.....I......
>>    00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe  .I....E%.I....H.
>>    0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97  .I....Lk.I. ..M.
>>    XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c.  Return address = 00000000c0ff63c1
>>    XFS (dm-0): Corruption of in-memory data detected.  Shutting down filesystem
>>    XFS (dm-0): Please umount the filesystem and rectify the problem(s)
>>
>> From the log above, we know xfs_buf->b_bn is 0x178, but the block's header
>> records its blkno as 0x1a0.
>>
>> Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
>> Signed-off-by: Zhang Tianci <zhangtianci.1997@...edance.com>
>> ---
>>   fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
>> index e576560b46e9..35f70e4c6447 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.c
>> +++ b/fs/xfs/libxfs/xfs_da_btree.c
>> @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
>>   	 * Copy the last block into the dead buffer and log it.
>>   	 */
>>   	memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
>> -	xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
>>   	dead_info = dead_buf->b_addr;
>> +	/*
>> +	 * Update the moved block's blkno if it's a dir3 leaf block
>> +	 */
>> +	if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
>> +	    dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
>> +	    dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
>> +		struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
>> +
>> +		dap->blkno = cpu_to_be64(dead_buf->b_bn);
>> +	}
>> +	xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> 
> The fix here looks correct to me, but also a little ugly and ad-hoc.
> 
> At least we should be using container_of and not casts for getting from a
> xfs_da_blkinfo to a xfs_da3_blkinfo (even if there is bad precedent
> for the cast in existing code).
> 

Thanks, we will switch to container_of in the next version of the patchset.
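
To make sure we understand the suggestion, here is a minimal userspace
sketch of the failure and of the container_of variant of the fix. It is
not kernel code: every demo_* name is hypothetical, the structs are
simplified stand-ins for xfs_da_blkinfo/xfs_da3_blkinfo, fields are
host-endian for brevity, and only the leafn magic (0x3dff, as seen in the
hex dump quoted below) is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_DIR3_LEAFN_MAGIC 0x3dff	/* matches "3d ff" in the hex dump */

/* Simplified stand-in for the common (v4) block header. */
struct demo_da_blkinfo {
	uint32_t forw;
	uint32_t back;
	uint16_t magic;
	uint16_t pad;
};

/* Simplified stand-in for the v5 header, which embeds the v4 one. */
struct demo_da3_blkinfo {
	struct demo_da_blkinfo hdr;
	uint32_t crc;
	uint64_t blkno;		/* self-describing block address */
};

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for an xfs_buf: its address plus the block contents. */
struct demo_buf {
	uint64_t bn;
	struct demo_da3_blkinfo blk;
};

/* Mimics the write verifier check: header blkno must equal buffer address. */
static int demo_verify_write(const struct demo_buf *bp)
{
	return bp->blk.blkno == bp->bn;
}

/* Unpatched behaviour: the header is copied verbatim, stale blkno included. */
static void demo_swap_unpatched(struct demo_buf *dead, const struct demo_buf *last)
{
	assert(dead != last);
	memcpy(&dead->blk, &last->blk, sizeof(dead->blk));
}

/* Patched behaviour: after the copy, stamp the new address into the header. */
static void demo_swap_patched(struct demo_buf *dead, const struct demo_buf *last)
{
	struct demo_da_blkinfo *info;

	assert(dead != last);
	memcpy(&dead->blk, &last->blk, sizeof(dead->blk));
	info = &dead->blk.hdr;
	if (info->magic == DEMO_DIR3_LEAFN_MAGIC) {
		struct demo_da3_blkinfo *dap =
			demo_container_of(info, struct demo_da3_blkinfo, hdr);

		dap->blkno = dead->bn;
	}
}
```

With a last block at 0x1a0 copied into a dead block at 0x178,
demo_verify_write() fails after demo_swap_unpatched() and passes after
demo_swap_patched(), which is the mismatch the quoted log shows.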

> But I think it would be useful to add a helper that stamps in the blkno
> for a caller that only has a xfs_da_blkinfo but no xfs_da3_blkinfo,
> and use it in all the places that currently do this in an open-coded
> fashion, e.g. xfs_da3_root_join, xfs_da3_root_split, xfs_attr3_leaf_to_node.
> 
> That should probably be done on top of the small backportable fix.
> 

I think adding a helper is a great idea, and we can do it after this
fix is merged.
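
As a starting point for discussion, the helper might look roughly like
the userspace sketch below. This is only an assumption about its shape,
not a proposed patch: the ex_* names are hypothetical, the structs are
simplified host-endian stand-ins, and the magic values are written from
memory of xfs_da_format.h and should be checked against the real header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* v5 magics as recalled from xfs_da_format.h (assumption, please verify). */
#define EX_DA3_NODE_MAGIC	0x3ebe
#define EX_ATTR3_LEAF_MAGIC	0x3bee
#define EX_DIR3_LEAF1_MAGIC	0x3df1
#define EX_DIR3_LEAFN_MAGIC	0x3dff

/* Simplified stand-in for the common (v4) block header. */
struct ex_da_blkinfo {
	uint32_t forw;
	uint32_t back;
	uint16_t magic;
	uint16_t pad;
};

/* Simplified stand-in for the v5 header, which embeds the v4 one. */
struct ex_da3_blkinfo {
	struct ex_da_blkinfo hdr;
	uint32_t crc;
	uint64_t blkno;
};

#define ex_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Stamp bn into the block header if, and only if, the magic says the
 * header really is a v5 one. Returns 1 if the blkno was written, 0 if
 * the block has no blkno field and was left untouched.
 */
static int ex_da_stamp_blkno(struct ex_da_blkinfo *info, uint64_t bn)
{
	assert(info != NULL);

	switch (info->magic) {
	case EX_DA3_NODE_MAGIC:
	case EX_ATTR3_LEAF_MAGIC:
	case EX_DIR3_LEAF1_MAGIC:
	case EX_DIR3_LEAFN_MAGIC:
		ex_container_of(info, struct ex_da3_blkinfo, hdr)->blkno = bn;
		return 1;
	default:
		return 0;
	}
}
```

Checking the magic before calling container_of matters here: for a v4
block there is no enclosing v5 header, so the computed pointer would
refer to memory that is not a blkno field.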


Thanks,
Jiachen
