lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9b526ae9-cba6-35dd-0424-61e8fa5ab016@linux.alibaba.com>
Date:   Tue, 19 May 2020 11:29:17 +0800
From:   JeffleXu <jefflexu@...ux.alibaba.com>
To:     Eric Whitney <enwlinux@...il.com>
Cc:     linux-ext4@...r.kernel.org, tytso@....edu,
        joseph.qi@...ux.alibaba.com
Subject: Re: [PATCH RFC] ext4: fix partial cluster initialization when
 splitting extent


On 5/19/20 6:08 AM, Eric Whitney wrote:
> Hi, Jeffle:
>
> What kernel were you running when you observed your failures?  Does your
> patch resolve all observed failures, or do any remain?  Do you have a
> simple test script that reproduces the bug?
>
> I've made almost 1000 runs of shared/298 on various bigalloc configurations
> using Ted's test appliance on 5.7-rc5 and have not observed a failure.
> Several auto group runs have also passed without failures.  Ideally, I'd
> like to be able to reproduce your failure to be sure we fully understand
> what's going on.  It's still the case that the "2" is wrong, but I think
> that code in rm_leaf may be involved in an unexpected way.
>
> Thanks,
> Eric

Hi Eric,

Following on is my test environment.


kernel: 5.7-rc4-git-eb24fdd8e6f5c6bb95129748a1801c6476492aba

e2fsprog: latest release version 1.45.6 (20-Mar-2020)

xfstests: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git, master 
branch, latest commit


1. Test device

I run the test in a VM and the VM is setup by qemu. The size of vdb is 1G,

```

#lsblk

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdb    254:16   0   1G  0 disk

```


and is initialized by:

```

qemu-img create -f qcow2 /XX/disk1.qcow2 1G

qemu-kvm -drive file=/XX/disk1.qcow2,if=virtio,format=qcow2 ...

```


2. Test script


local.config of xfstests is like:

export TEST_DEV=/dev/vdb
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdc
export SCRATCH_MNT=/mnt/scratch


Following on is an example script to reproduce the failure:

```sh

#!/bin/bash

for i in `seq 100`; do
         echo y | mkfs.ext4 -O bigalloc -C 16K /dev/vdb

         ./check shared/298
         status=$?

         if [[ $status == 1 ]]; then
                 echo "$i exit"
                 exit
         fi
done

```


Indeed the failure occurs occasionally. Sometimes the script stops at 
iteration 4, or sometimes

at iteration 2, 7, 24.


The failure occurs with the following dmesg report:

```

[  387.471876] EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, 
block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
[  387.473729] EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: 
group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 
free clusters

```


3. About the applied patch

The applied patch does fix the failure in my test environment. At least 
the failure doesn't occur after running the full 100 iterations.


Thanks

Jeffle



Powered by blists - more mailing lists