[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9b526ae9-cba6-35dd-0424-61e8fa5ab016@linux.alibaba.com>
Date: Tue, 19 May 2020 11:29:17 +0800
From: JeffleXu <jefflexu@...ux.alibaba.com>
To: Eric Whitney <enwlinux@...il.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu,
joseph.qi@...ux.alibaba.com
Subject: Re: [PATCH RFC] ext4: fix partial cluster initialization when
splitting extent
On 5/19/20 6:08 AM, Eric Whitney wrote:
> Hi, Jeffle:
>
> What kernel were you running when you observed your failures? Does your
> patch resolve all observed failures, or do any remain? Do you have a
> simple test script that reproduces the bug?
>
> I've made almost 1000 runs of shared/298 on various bigalloc configurations
> using Ted's test appliance on 5.7-rc5 and have not observed a failure.
> Several auto group runs have also passed without failures. Ideally, I'd
> like to be able to reproduce your failure to be sure we fully understand
> what's going on. It's still the case that the "2" is wrong, but I think
> that code in rm_leaf may be involved in an unexpected way.
>
> Thanks,
> Eric
Hi Eric,
Following on is my test environment.
kernel: 5.7-rc4-git-eb24fdd8e6f5c6bb95129748a1801c6476492aba
e2fsprog: latest release version 1.45.6 (20-Mar-2020)
xfstests: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git, master
branch, latest commit
1. Test device
I run the test in a VM and the VM is setup by qemu. The size of vdb is 1G,
```
#lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdb 254:16 0 1G 0 disk
```
and is initialized by:
```
qemu-img create -f qcow2 /XX/disk1.qcow2 1G
qemu-kvm -drive file=/XX/disk1.qcow2,if=virtio,format=qcow2 ...
```
2. Test script
local.config of xfstests is like:
export TEST_DEV=/dev/vdb
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdc
export SCRATCH_MNT=/mnt/scratch
Following on is an example script to reproduce the failure:
```sh
#!/bin/bash
for i in `seq 100`; do
echo y | mkfs.ext4 -O bigalloc -C 16K /dev/vdb
./check shared/298
status=$?
if [[ $status == 1 ]]; then
echo "$i exit"
exit
fi
done
```
Indeed the failure occurs occasionally. Sometimes the script stops at
iteration 4, or sometimes
at iteration 2, 7, 24.
The failure occurs with the following dmesg report:
```
[ 387.471876] EXT4-fs error (device vdb): mb_free_blocks:1457: group 1,
block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
[ 387.473729] EXT4-fs error (device vdb): ext4_mb_generate_buddy:747:
group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551
free clusters
```
3. About the applied patch
The applied patch does fix the failure in my test environment. At least
the failure doesn't occur after running the full 100 iterations.
Thanks
Jeffle
Powered by blists - more mailing lists