linux-kernel - zram corruption

Message-Id: <202202010229.27088.luke@dashjr.org>
Date:   Tue, 1 Feb 2022 02:29:26 +0000
From:   Luke Dashjr <luke@...hjr.org>
To:     Minchan Kim <minchan@...nel.org>, Nitin Gupta <ngupta@...are.org>
Cc:     linux-kernel@...r.kernel.org
Subject: zram corruption

I use ext4 on zram for my temp directories, and on rare occasions things get 
corrupted. Using ext4 on a normal disk works fine in the same scenarios.

I haven't managed to figure out what exactly is going on, but I do have a
157 GB strace log of it happening.

One scenario that reproduces it fairly reliably is building 3 copies of 
binutils in parallel. About half the 
time, /var/tmp/portage/cross-i686-w64-mingw32/binutils-2.37_p1-r2/work/build/binutils/.deps/stabs.Po 
ends up truncated, and one of the builds fails.
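
For anyone who wants something smaller than a full binutils build to poke at, 
here is a minimal sketch of a parallel write-and-verify loop. It is not the 
reproduction described above and is not known to trigger the bug; the function 
names, the worker count of 3 (chosen only to mirror the 3 parallel builds), 
the file size, and the use of threads rather than separate processes are all 
assumptions made for illustration. Note that the read-back may be served from 
the page cache, so a clean run proves little.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_and_verify(path: str, size: int) -> bool:
    """Write `size` random bytes to `path`, read them back, and compare."""
    data = os.urandom(size)
    with open(path, "wb") as f:
        f.write(data)
    with open(path, "rb") as f:
        back = f.read()
    # A truncated or corrupted file shows up as a mismatch here.
    return back == data


def stress(directory: str, workers: int = 3,
           files_per_worker: int = 200, size: int = 64 * 1024) -> list:
    """Run `workers` concurrent writers in `directory`.

    Returns the list of paths whose contents did not read back intact.
    All parameters are hypothetical defaults, not values from the report.
    """
    def worker(w: int) -> list:
        bad = []
        for i in range(files_per_worker):
            path = os.path.join(directory, f"w{w}_{i}.bin")
            if not write_and_verify(path, size):
                bad.append(path)
        return bad

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(worker, range(workers)))
    return [p for bad in results for p in bad]


if __name__ == "__main__":
    import sys
    # Pass a directory on the zram-backed filesystem as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    failures = stress(target)
    print(f"{len(failures)} mismatched file(s)")
    for p in failures:
        print("  ", p)
```

Running it against a directory on the zram-backed ext4 and again against one 
on a normal disk would give a like-for-like comparison.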

The only other scenario I've seen it happen in (much less reproducibly) is 
running Bitcoin functional tests. In this case, however, the ext4 structure 
itself got corrupted, and Linux was unable to recover (the affected 
directories became unusable until reboot).

I suspect it's probably a threading-related issue, but it's plausible it could 
be page-size related (I *think* I'm using 64k pages), though in the latter 
case I would expect it to be much more common.
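
The page size does not have to stay a guess. As a minimal sketch (the helper 
name is made up for illustration), the value the kernel reports to userspace 
can be read with the standard sysconf interface; 64k pages would show up as 
65536:

```python
import os


def page_size_bytes() -> int:
    """Return the page size reported to userspace, in bytes."""
    return os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    size = page_size_bytes()
    print(f"page size: {size} bytes ({size // 1024} KiB)")
```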

https://bugzilla.kernel.org/show_bug.cgi?id=215557