Open Source and information security mailing list archives
Date: Fri, 22 May 2020 11:09:37 +0800
From: JeffleXu <jefflexu@...ux.alibaba.com>
To: Eric Whitney <enwlinux@...il.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu, joseph.qi@...ux.alibaba.com
Subject: Re: [PATCH RFC] ext4: fix partial cluster initialization when splitting extent

Thanks for reviewing. I will send a formal patch later ;)

Thanks,
Jeffle

On 5/22/20 5:26 AM, Eric Whitney wrote:
> * JeffleXu <jefflexu@...ux.alibaba.com>:
>> On 5/19/20 6:08 AM, Eric Whitney wrote:
>>> Hi, Jeffle:
>>>
>>> What kernel were you running when you observed your failures? Does your
>>> patch resolve all observed failures, or do any remain? Do you have a
>>> simple test script that reproduces the bug?
>>>
>>> I've made almost 1000 runs of shared/298 on various bigalloc configurations
>>> using Ted's test appliance on 5.7-rc5 and have not observed a failure.
>>> Several auto group runs have also passed without failures. Ideally, I'd
>>> like to be able to reproduce your failure to be sure we fully understand
>>> what's going on. It's still the case that the "2" is wrong, but I think
>>> that code in rm_leaf may be involved in an unexpected way.
>>>
>>> Thanks,
>>> Eric
>> Hi Eric,
>>
>> Following on is my test environment.
>>
>> kernel: 5.7-rc4-git-eb24fdd8e6f5c6bb95129748a1801c6476492aba
>>
>> e2fsprogs: latest release version 1.45.6 (20-Mar-2020)
>>
>> xfstests: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git, master
>> branch, latest commit
>>
>>
>> 1. Test device
>>
>> I run the test in a VM and the VM is set up by qemu. The size of vdb is 1G,
>>
>> ```
>> #lsblk
>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> vdb  254:16   0   1G  0 disk
>> ```
>>
>> and is initialized by:
>>
>> ```
>> qemu-img create -f qcow2 /XX/disk1.qcow2 1G
>> qemu-kvm -drive file=/XX/disk1.qcow2,if=virtio,format=qcow2 ...
>> ```
>>
>>
>> 2. Test script
>>
>> local.config of xfstests is like:
>>
>> export TEST_DEV=/dev/vdb
>> export TEST_DIR=/mnt/test
>> export SCRATCH_DEV=/dev/vdc
>> export SCRATCH_MNT=/mnt/scratch
>>
>> Following on is an example script to reproduce the failure:
>>
>> ```sh
>> #!/bin/bash
>>
>> for i in `seq 100`; do
>>     echo y | mkfs.ext4 -O bigalloc -C 16K /dev/vdb
>>
>>     ./check shared/298
>>     status=$?
>>
>>     if [[ $status == 1 ]]; then
>>         echo "$i exit"
>>         exit
>>     fi
>> done
>> ```
>>
>> Indeed the failure occurs occasionally. Sometimes the script stops at
>> iteration 4, or sometimes at iteration 2, 7, 24.
>>
>> The failure occurs with the following dmesg report:
>>
>> ```
>> [ 387.471876] EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
>> [ 387.473729] EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters
>> ```
>>
>>
>> 3. About the applied patch
>>
>> The applied patch does fix the failure in my test environment. At least the
>> failure doesn't occur after running the full 100 iterations.
>>
>> Thanks
>>
>> Jeffle
>>
> Hi, Jeffle:
>
> Thanks for that information. I'm still unable to reproduce your failure,
> but by inspection your patch clearly fixes a bug, and of course, you're seeing
> that. I suspect the code in rm_leaf that also sets the partial cluster nofree
> state is masking the bug in my testing. In your case, my best guess is that
> your testing is occasionally getting into the retry loop for EAGAIN in
> remove_space. This would effectively expose the bug again and could lead to
> the failure you've described.
>
> Your patch has survived all the heavy testing I've thrown at it. So, please
> repost your RFC patch as a fix, and feel free to add:
> Reviewed-by: Eric Whitney <enwlinux@...il.com>
>
> This points out that the cluster freeing code really needs to be cleaned up,
> so I'm working on a patch series that does that.
>
> Thanks for your patience,
> Eric