Message-Id: <1620526887.tg1zx7w5np.none@localhost>
Date: Sat, 08 May 2021 22:29:57 -0400
From: "Alex Xu (Hello71)" <alex_y_xu@...oo.ca>
To: linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
dm-crypt@...ut.de, linux-nvme@...ts.infradead.org,
linux-block@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
Changheun Lee <nanich.lee@...sung.com>, bvanassche@....org,
yi.zhang@...hat.com, ming.lei@...hat.com, bgoncalv@...hat.com,
hch@....de, jaegeuk@...nel.org
Subject: Re: regression: data corruption with ext4 on LUKS on nvme with
torvalds master
Excerpts from Alex Xu (Hello71)'s message of May 8, 2021 1:54 pm:
> Hi all,
>
> Using torvalds master, I recently encountered data corruption on my ext4
> volume on LUKS on NVMe. Specifically, during heavy writes, the system
> partially hangs; SysRq-W shows that processes are blocked in the kernel
> on I/O. After forcibly rebooting, chunks of files are replaced with
> other, unrelated data. I'm not sure exactly what the data is; some of it
> is unknown binary data, but in at least one case, a list of file paths
> was inserted into a file, indicating that the data is misdirected after
> encryption.
>
> This issue appears to affect files receiving writes in the temporal
> vicinity of the hang, but affects both new and old data: for example, my
> shell history file was corrupted up to many months before.
>
> The drive reports no SMART issues.
>
> I believe this is a regression in the kernel related to something merged
> in the last few days, as it consistently occurs with my most recent
> kernel versions, but disappears when reverting to an older kernel.
>
> I haven't investigated further, such as by bisecting. I hope this is
> sufficient information to give someone a lead on the issue, and if it is
> a bug, nail it down before anybody else loses data.
>
> Regards,
> Alex.
>
I found the following test, which reproduces a hang that I suspect is
the cause:

host$ cd /tmp
host$ truncate -s 10G drive
host$ qemu-system-x86_64 -drive format=raw,file=drive,if=none,id=drive -device nvme,drive=drive,serial=1 [... more VM setup options]
guest$ cryptsetup luksFormat /dev/nvme0n1
[accept warning, use any password]
guest$ cryptsetup open /dev/nvme0n1 test
[enter password]
guest$ mkfs.ext4 /dev/mapper/test
[normal output...]
Creating journal (16384 blocks): [hangs forever]
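
The guest-side steps above could be scripted roughly as follows, so that
a bisect run can tell a hang from a pass without watching the console.
This is only a sketch and was not part of the original test: the
variable names, the throwaway key file, the 120-second deadline and the
run_with_deadline helper are all assumptions made for illustration. The
helper polls instead of using timeout(1), since a task blocked in the
kernel on I/O may never react to a signal.

```shell
#!/bin/sh
# Sketch of the guest-side reproduction as a non-interactive script.
# Assumptions (not from the original report): the DEV/NAME/KEYFILE/DEADLINE
# variables, the throwaway key file, and the 120-second default deadline.

DEV=${DEV:-/dev/nvme0n1}
NAME=${NAME:-test}
KEYFILE=${KEYFILE:-/tmp/repro.key}
DEADLINE=${DEADLINE:-120}

# Hypothetical helper: run a command in the background and poll it once
# per second. Returns the command's exit status, or 124 if it is still
# running after $1 seconds. After a timeout the child is deliberately
# left alone, because a task stuck in uninterruptible I/O cannot be
# killed or reaped.
run_with_deadline() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}

# Format, open and mkfs the device; 124 means mkfs.ext4 never finished.
reproduce() {
    printf 'any password' > "$KEYFILE"
    cryptsetup --batch-mode luksFormat --key-file "$KEYFILE" "$DEV" || return 1
    cryptsetup open --key-file "$KEYFILE" "$DEV" "$NAME" || return 1
    run_with_deadline "$DEADLINE" mkfs.ext4 "/dev/mapper/$NAME"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "mkfs.ext4 still running after ${DEADLINE}s: likely hung" >&2
    fi
    return "$status"
}

# Destructive: wipes $DEV, so only run it inside the disposable guest.
if [ "${1:-}" = "--run" ]; then
    reproduce
fi
```

Saved under a name of your choosing and invoked with --run inside the
guest, an exit status of 124 would indicate the hang and 0 a successful
mkfs.
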

I bisected this issue to:

cd2c7545ae1beac3b6aae033c7f31193b3255946 is the first bad commit
commit cd2c7545ae1beac3b6aae033c7f31193b3255946
Author: Changheun Lee <nanich.lee@...sung.com>
Date:   Mon May 3 18:52:03 2021 +0900

    bio: limit bio max size

I didn't try reverting this commit or further reducing the test case.
Let me know if you need my kernel config or other information.

Regards,
Alex.