[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <39b76473-fe00-0f1b-62e3-ae349a9f80d3@bytedance.com>
Date: Tue, 28 Nov 2023 17:39:50 +0800
From: Jiachen Zhang <zhangjiachen.jaycee@...edance.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: Chandan Babu R <chandan.babu@...cle.com>,
"Darrick J. Wong" <djwong@...nel.org>,
Dave Chinner <dchinner@...hat.com>,
Allison Henderson <allison.henderson@...cle.com>,
Zhang Tianci <zhangtianci.1997@...edance.com>,
Brian Foster <bfoster@...hat.com>, Ben Myers <bpm@....com>,
linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
xieyongji@...edance.com, me@...x.top
Subject: [PATCH 2/2] xfs: update dir3 leaf block metadata after swap
On 2023/11/28 16:39, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
>> From: Zhang Tianci <zhangtianci.1997@...edance.com>
>>
>> xfs_da3_swap_lastblock() copy the last block content to the dead block,
>> but do not update the metadata in it. We need update some metadata
>> for some kinds of type block, such as dir3 leafn block records its
>> blkno, we shall update it to the dead block blkno. Otherwise,
>> before write the xfs_buf to disk, the verify_write() will fail in
>> blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
>
> Do you have a reproducer for this? It would be very helpful to add it
> to xfstests.
Hi Christoph,
Thanks for the review!
It's hard to reproduce the issue. Currently we can reproduce it with
some kernel code changes. We forcely reserve 0 t_blk_res for xfs_remove
on kernel version 4.19:
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f2d06e1e4906..c8f84b95a0ec 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2551,13 +2551,8 @@ xfs_remove(
* insert tries to happen, instead trimming the LAST
* block from the directory.
*/
- resblks = XFS_REMOVE_SPACE_RES(mp);
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0,
0, &tp);
- if (error == -ENOSPC) {
- resblks = 0;
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0,
- &tp);
- }
+ resblks = 0;
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, &tp);
if (error) {
ASSERT(error != -ENOSPC);
goto std_return
After insmod the new modified xfs.ko, run the following scripts, and it
can reproduce the problem consistently on the final `umount mnt`:
fallocate -l 1G xfs.img
mkfs.xfs -f xfs.img
mkdir -p mnt
losetup /dev/loop0 xfs.img
mount -t xfs /dev/loop0 mnt
pushd mnt
mkdir dir3
prefix="a_"
for j in $(seq 0 13); do
for i in $(seq 0 2800); do
touch dir3/${prefix}_${i}_${j}
done
for i in $(seq 0 2500); do
rm -f dir3/${prefix}_${i}_${j}
if [ "$i" == "2094" ] && [ "$j" == "13" ]; then
echo "should reproduce now, so break here!"
break;
fi
done
done
popd
umount mnt
We are still trying to make a reproducer without any kernel changes. Do
you have any suggestions on this?
>
>>
>> We will get this warning:
>>
>> XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
>> XFS (dm-0): Unmount and run xfs_repair
>> XFS (dm-0): First 128 bytes of corrupted metadata buffer:
>> 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
>> 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
>> 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
>> 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
>> 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
>> 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
>> 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
>> 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
>> XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
>> XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
>> XFS (dm-0): Please umount the filesystem and rectify the problem(s)
>>
>> >From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
>> its blkno is 0x1a0.
>>
>> Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
>> Signed-off-by: Zhang Tianci <zhangtianci.1997@...edance.com>
>> ---
>> fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
>> index e576560b46e9..35f70e4c6447 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.c
>> +++ b/fs/xfs/libxfs/xfs_da_btree.c
>> @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
>> * Copy the last block into the dead buffer and log it.
>> */
>> memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
>> - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
>> dead_info = dead_buf->b_addr;
>> + /*
>> + * Update the moved block's blkno if it's a dir3 leaf block
>> + */
>> + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
>> + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
>> + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
>> + struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
>> +
>> + dap->blkno = cpu_to_be64(dead_buf->b_bn);
>> + }
>> + xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
>
> The fix here looks correct to me, but also a little ugly and ad-hoc.
>
> At last we should be using container_of and not casts for getting from a
> xfs_da_blkinfo to a xfs_da3_blkinfo (even if there is bad precedence
> for the cast in existing code).
>
Thanks, we will optimize the code in the next version of the patchset.
> But I think it would be useful to add a helper that stamps in the blkno
> in for a caller that only has as xfs_da_blkinfo but no xfs_da3_blkinfo
> and use in all the places that do it currently in an open coded fashion
> e.g. xfs_da3_root_join, xfs_da3_root_split, xfs_attr3_leaf_to_node.
>
> That should probably be done on top of the small backportable fix.
>
I think the idea to add helper is great, and we can do it after this
fixes patch is merged.
Thanks,
Jiachen
Powered by blists - more mailing lists