Message-Id: <20200320053451.B7AD0AE04D@d06av26.portsmouth.uk.ibm.com>
Date:   Fri, 20 Mar 2020 11:04:50 +0530
From:   Ritesh Harjani <riteshh@...ux.ibm.com>
To:     linux-ext4@...r.kernel.org, "Theodore Y. Ts'o" <tytso@....edu>
Cc:     "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Jan Kara <jack@...e.cz>
Subject: Re: Ext4 corruption with VM images as 3 > drop_caches



On 3/19/20 6:54 PM, Ritesh Harjani wrote:
> 
> 
> On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
>> Hi,
>>
>> With a new VM install I am finding corruption in the VM image if I
>> follow up the install with echo 3 > /proc/sys/vm/drop_caches
>>
>> The file system reports the error below.
>>
>> Begin: Running /scripts/local-bottom ... done.
>> Begin: Running /scripts/init-bottom ...
>> [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #787185: comm sh: iget: checksum invalid
>> done.
>> [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #917954: comm init: iget: checksum invalid
>> [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode 
>> #917954: comm init: iget: checksum invalid
>> /sbin/init: error while loading shared libraries: libc.so.6: cannot 
>> open shared object file: Error 74
>> [    5.271207] Kernel panic - not syncing: Attempted to kill init! 
>> exitcode=0x00007f00
>>
>> And debugfs reports
>>
>> debugfs:  stat <917954>
>> Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
>> Generation: 0    Version: 0x00000000
>> User:     0   Group:     0   Size: 0
>> File ACL: 0
>> Links: 0   Blockcount: 0
>> Fragment:  Address: 0    Number: 0    Size: 0
>> ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
>> Size of extra inode fields: 0
>> Inode checksum: 0x00000000
>> BLOCKS:
>> debugfs:
>>
>> Bisecting this identifies
>> commit 244adf6426ee31a83f397b700d964cff12a247d3 ("ext4: make 
>> dioread_nolock the default")
>> as bad. If I revert it on top of Linus' 
>> upstream (fb33c6510d5595144d585aa194d377cf74d31911),
>> I don't hit the corruption anymore.
> 
> I tried replicating this and could easily reproduce it on a Power box.
> I tried on x86 too, but could not reproduce it there.
> One difference on Power could be that the page size is 64K while the fs
> block size is 4K.
> 
> The issue looks like the guest qemu image file is not properly written
> back, after host does echo 3 > drop_caches. (correct me if this is not
> the case).

OK. I retried this with the "cache=directsync" parameter passed for the
drive file. This parameter tells QEMU to bypass the host-side page
cache. With it, I no longer see the issue on the Power box.

-ritesh


> 
> I tried replicating it via the test below, but could not reproduce it.
> 
> Any idea what kind of unit test could be written for this?
> I am not sure how exactly qemu is writing to its image file.
> 
> 
> 1. Create 2 files. "mmap-file", "mmap-data".
> 2. "mmap-file" is a 2GB sparse file. Then at some random offsets (tried 
> with both 64KB-aligned and 4KB-aligned offsets), write a
> pagesize/blocksize amount of a known data pattern.
> 3. These offsets (which are pagesize/blocksize aligned) are recorded into
> the "mmap-data" file via normal read/write calls.
> 4. After writing to both files, munmap the "mmap-file" and
> close both files.
> 5. Then do echo 3 > drop_caches.
> 6. Then in the verify phase, using the offsets written in the "mmap-data"
> file, read the "mmap-file" to verify whether its contents are correct.
>
> With that I could not reproduce this issue.
> 
> 
> -ritesh
> 
>
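
For anyone wanting to retry the six-step mmap test quoted above, here is a
minimal sketch in Python. It is not the program that was actually run: the
function names, the data pattern (the offset itself, repeated), the fixed
random seed and the one-offset-per-line format of "mmap-data" are all
assumptions made for illustration. Only the file names, the 2GB sparse file,
the 64K/4K write sizes and the order of the steps come from the description.

```python
import mmap
import random
import struct

FILE_SIZE = 2 * 1024**3   # 2GB sparse "mmap-file" (step 2)
CHUNK = 64 * 1024         # 64K page-sized writes; pass chunk=4096 for block-sized


def pattern(offset, size):
    """Assumed known pattern: the 8-byte little-endian offset, repeated."""
    return struct.pack("<Q", offset) * (size // 8)


def write_phase(map_path, data_path, count,
                chunk=CHUNK, file_size=FILE_SIZE, seed=0):
    """Steps 1-4: write patterns through mmap, record the offsets, unmap, close."""
    rng = random.Random(seed)
    # Pick distinct chunk-aligned offsets inside the file.
    offsets = sorted(i * chunk for i in rng.sample(range(file_size // chunk), count))

    with open(map_path, "wb") as f:
        f.truncate(file_size)              # creates a sparse file

    with open(map_path, "r+b") as f:
        with mmap.mmap(f.fileno(), file_size) as m:   # shared, writable mapping
            for off in offsets:
                m[off:off + chunk] = pattern(off, chunk)
        # Leaving the inner "with" unmaps; leaving the outer one closes the fd.

    with open(data_path, "w") as f:        # plain write() path (step 3)
        f.write("\n".join(str(off) for off in offsets))
    return offsets


def drop_caches():
    """Step 5: same effect as `echo 3 > /proc/sys/vm/drop_caches` (needs root)."""
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def verify_phase(map_path, data_path, chunk=CHUNK):
    """Step 6: re-read every recorded offset; return the offsets that mismatch."""
    with open(data_path) as f:
        offsets = [int(tok) for tok in f.read().split()]
    bad = []
    with open(map_path, "rb") as f:
        for off in offsets:
            f.seek(off)
            if f.read(chunk) != pattern(off, chunk):
                bad.append(off)
    return bad
```

Run as root on the host: call write_phase("mmap-file", "mmap-data", count=1000),
then drop_caches(), then verify_phase("mmap-file", "mmap-data"). An empty list
from verify_phase means no mismatch was seen, which matches the result reported
above.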