Message-Id: <1620526887.tg1zx7w5np.none@localhost>
Date:   Sat, 08 May 2021 22:29:57 -0400
From:   "Alex Xu (Hello71)" <alex_y_xu@...oo.ca>
To:     linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
        dm-crypt@...ut.de, linux-nvme@...ts.infradead.org,
        linux-block@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
        Changheun Lee <nanich.lee@...sung.com>, bvanassche@....org,
        yi.zhang@...hat.com, ming.lei@...hat.com, bgoncalv@...hat.com,
        hch@....de, jaegeuk@...nel.org
Subject: Re: regression: data corruption with ext4 on LUKS on nvme with
 torvalds master

Excerpts from Alex Xu (Hello71)'s message of May 8, 2021 1:54 pm:
> Hi all,
> 
> Using torvalds master, I recently encountered data corruption on my ext4 
> volume on LUKS on NVMe. Specifically, during heavy writes, the system 
> partially hangs; SysRq-W shows that processes are blocked in the kernel 
> on I/O. After forcibly rebooting, chunks of files are replaced with 
> other, unrelated data. I'm not sure exactly what the data is; some of it 
> is unknown binary data, but in at least one case, a list of file paths 
> was inserted into a file, indicating that the data is misdirected after 
> encryption.
> 
> This issue appears to affect files receiving writes in the temporal 
> vicinity of the hang, but affects both new and old data: for example, my 
> shell history file was corrupted up to many months before.
> 
> The drive reports no SMART issues.
> 
> I believe this is a regression in the kernel related to something merged 
> in the last few days, as it consistently occurs with my most recent 
> kernel versions, but disappears when reverting to an older kernel.
> 
> I haven't investigated further, such as by bisecting. I hope this is 
> sufficient information to give someone a lead on the issue, and if it is 
> a bug, nail it down before anybody else loses data.
> 
> Regards,
> Alex.
> 

I found the following test, which reproduces a hang that I suspect is the 
underlying cause:

host$ cd /tmp
host$ truncate -s 10G drive
host$ qemu-system-x86_64 -drive format=raw,file=drive,if=none,id=drive -device nvme,drive=drive,serial=1 [... more VM setup options]
guest$ cryptsetup luksFormat /dev/nvme0n1
[accept warning, use any password]
guest$ cryptsetup open /dev/nvme0n1 test
[enter password]
guest$ mkfs.ext4 /dev/mapper/test
[normal output...]
Creating journal (16384 blocks): [hangs forever]
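
Since the failure mode above is a hang rather than an error, scripting it 
needs a time limit. The following is a minimal sketch, not part of the 
original test: hang_check is a hypothetical helper name, and the exit codes 
are chosen to match what "git bisect run" expects (0 = good, 1 = bad, 
125 = skip). It assumes GNU coreutils timeout, which exits with status 124 
when the limit expires.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command under a
# time limit and translate the outcome into "git bisect run" exit codes.
#   usage: hang_check SECONDS COMMAND [ARGS...]
hang_check() {
    limit=$1
    shift
    rc=0
    # GNU timeout exits with 124 if the command was still running at the limit.
    timeout "$limit" "$@" || rc=$?
    case $rc in
        0)   return 0 ;;    # finished in time: mark this kernel good
        124) return 1 ;;    # hit the limit: treat as the hang, mark bad
        *)   return 125 ;;  # unrelated failure: ask bisect to skip
    esac
}
```

A process blocked in the kernel on I/O may not respond to signals, so a 
wrapper like this is probably best applied on the host around the whole VM 
run (with the guest powering off after a successful mkfs.ext4) rather than 
inside the guest around mkfs.ext4 itself.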

I bisected this issue to:

cd2c7545ae1beac3b6aae033c7f31193b3255946 is the first bad commit
commit cd2c7545ae1beac3b6aae033c7f31193b3255946
Author: Changheun Lee <nanich.lee@...sung.com>
Date:   Mon May 3 18:52:03 2021 +0900

    bio: limit bio max size

I didn't try reverting this commit or further reducing the test case. 
Let me know if you need my kernel config or other information.

Regards,
Alex.
