Message-ID: <f0c90fc0-c239-df68-371d-a5c74c8f32eb@kernel.dk>
Date:   Sat, 8 May 2021 21:51:36 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     "Alex Xu (Hello71)" <alex_y_xu@...oo.ca>,
        linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
        dm-crypt@...ut.de, linux-nvme@...ts.infradead.org,
        linux-block@...r.kernel.org,
        Changheun Lee <nanich.lee@...sung.com>, bvanassche@....org,
        yi.zhang@...hat.com, ming.lei@...hat.com, bgoncalv@...hat.com,
        hch@....de, jaegeuk@...nel.org
Subject: Re: regression: data corruption with ext4 on LUKS on nvme with
 torvalds master

On 5/8/21 8:29 PM, Alex Xu (Hello71) wrote:
> Excerpts from Alex Xu (Hello71)'s message of May 8, 2021 1:54 pm:
>> Hi all,
>>
>> Using torvalds master, I recently encountered data corruption on my ext4 
>> volume on LUKS on NVMe. Specifically, during heavy writes, the system 
>> partially hangs; SysRq-W shows that processes are blocked in the kernel 
>> on I/O. After forcibly rebooting, chunks of files are replaced with 
>> other, unrelated data. I'm not sure exactly what the data is; some of it 
>> is unknown binary data, but in at least one case, a list of file paths 
>> was inserted into a file, indicating that the data is misdirected after 
>> encryption.
>>
>> This issue appears to affect files receiving writes in the temporal 
>> vicinity of the hang, but affects both new and old data: for example, my 
>> shell history file was corrupted up to many months before.
>>
>> The drive reports no SMART issues.
>>
>> I believe this is a regression in the kernel related to something merged 
>> in the last few days, as it consistently occurs with my most recent 
>> kernel versions, but disappears when reverting to an older kernel.
>>
>> I haven't investigated further, such as by bisecting. I hope this is 
>> sufficient information to give someone a lead on the issue, and if it is 
>> a bug, nail it down before anybody else loses data.
>>
>> Regards,
>> Alex.
>>
> 
> I found the following test to reproduce a hang, which I guess may be the 
> cause:
> 
> host$ cd /tmp
> host$ truncate -s 10G drive
> host$ qemu-system-x86_64 -drive format=raw,file=drive,if=none,id=drive -device nvme,drive=drive,serial=1 [... more VM setup options]
> guest$ cryptsetup luksFormat /dev/nvme0n1
> [accept warning, use any password]
> guest$ cryptsetup open /dev/nvme0n1 test
> [enter password]
> guest$ mkfs.ext4 /dev/mapper/test
> [normal output...]
> Creating journal (16384 blocks): [hangs forever]
> 
> I bisected this issue to:
> 
> cd2c7545ae1beac3b6aae033c7f31193b3255946 is the first bad commit
> commit cd2c7545ae1beac3b6aae033c7f31193b3255946
> Author: Changheun Lee <nanich.lee@...sung.com>
> Date:   Mon May 3 18:52:03 2021 +0900
> 
>     bio: limit bio max size
> 
> I didn't try reverting this commit or further reducing the test case. 
> Let me know if you need my kernel config or other information.

If you have time, please do test with that reverted. I'd be anxious to
get this revert queued up for 5.13-rc1.
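
For reference, testing with the commit reverted could look roughly like the sketch below. It assumes a git checkout of torvalds master; the `revert_commit` helper and the `./linux` path are hypothetical names used only for illustration, and the commit hash is the one reported by the bisect above. Rebuilding and booting the kernel are left out, since they depend on the local setup.

```shell
#!/bin/sh
# Sketch only: revert the bisected commit in a kernel tree before rebuilding.
# BAD_COMMIT comes from the bisect result quoted above; everything else
# (function name, tree path) is an assumption for illustration.
BAD_COMMIT=cd2c7545ae1beac3b6aae033c7f31193b3255946

revert_commit() {
    # $1: path to the git checkout, $2: commit to revert
    tree=$1
    commit=$2
    # Stop early if the commit is not part of the current history.
    if ! git -C "$tree" merge-base --is-ancestor "$commit" HEAD; then
        echo "commit $commit is not an ancestor of HEAD in $tree" >&2
        return 1
    fi
    # Create a new commit that undoes the given one, keeping the default message.
    git -C "$tree" revert --no-edit "$commit"
}

# Example (not run here):
#   revert_commit ./linux "$BAD_COMMIT"
# then rebuild, boot the guest, and repeat the luksFormat / mkfs.ext4 steps.
```

If mkfs.ext4 no longer hangs at "Creating journal" on the rebuilt kernel, that would point at the commit as the cause.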

-- 
Jens Axboe
