linux-ext4 - Re: [PATCH RFC] ext4: fix partial cluster initialization when splitting extent

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <9b526ae9-cba6-35dd-0424-61e8fa5ab016@linux.alibaba.com>
Date:   Tue, 19 May 2020 11:29:17 +0800
From:   JeffleXu <jefflexu@...ux.alibaba.com>
To:     Eric Whitney <enwlinux@...il.com>
Cc:     linux-ext4@...r.kernel.org, tytso@....edu,
        joseph.qi@...ux.alibaba.com
Subject: Re: [PATCH RFC] ext4: fix partial cluster initialization when
 splitting extent

On 5/19/20 6:08 AM, Eric Whitney wrote:
> Hi, Jeffle:
>
> What kernel were you running when you observed your failures?  Does your
> patch resolve all observed failures, or do any remain?  Do you have a
> simple test script that reproduces the bug?
>
> I've made almost 1000 runs of shared/298 on various bigalloc configurations
> using Ted's test appliance on 5.7-rc5 and have not observed a failure.
> Several auto group runs have also passed without failures.  Ideally, I'd
> like to be able to reproduce your failure to be sure we fully understand
> what's going on.  It's still the case that the "2" is wrong, but I think
> that code in rm_leaf may be involved in an unexpected way.
>
> Thanks,
> Eric

Hi Eric,

Following on is my test environment.

kernel: 5.7-rc4-git-eb24fdd8e6f5c6bb95129748a1801c6476492aba

e2fsprog: latest release version 1.45.6 (20-Mar-2020)

xfstests: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git, master 
branch, latest commit

1. Test device

I run the test in a VM and the VM is setup by qemu. The size of vdb is 1G,

```

#lsblk

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdb    254:16   0   1G  0 disk

```

and is initialized by:

```

qemu-img create -f qcow2 /XX/disk1.qcow2 1G

qemu-kvm -drive file=/XX/disk1.qcow2,if=virtio,format=qcow2 ...

```

2. Test script

local.config of xfstests is like:

export TEST_DEV=/dev/vdb
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdc
export SCRATCH_MNT=/mnt/scratch

Following on is an example script to reproduce the failure:

```sh

#!/bin/bash

for i in `seq 100`; do
         echo y | mkfs.ext4 -O bigalloc -C 16K /dev/vdb

         ./check shared/298
         status=$?

         if [[ $status == 1 ]]; then
                 echo "$i exit"
                 exit
         fi
done

```

Indeed the failure occurs occasionally. Sometimes the script stops at 
iteration 4, or sometimes

at iteration 2, 7, 24.

The failure occurs with the following dmesg report:

```

[  387.471876] EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, 
block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
[  387.473729] EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: 
group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 
free clusters

```

3. About the applied patch

The applied patch does fix the failure in my test environment. At least 
the failure doesn't occur after running the full 100 iterations.

Thanks

Jeffle