Message-ID: <f87dcf1e-5a8a-36c2-a864-88099a66d220@keff.org>
Date:   Mon, 29 Jun 2020 01:55:40 +0700
From:   Sebastian Hyrwall <sh@...f.org>
To:     linux-kernel@...r.kernel.org
Subject: BTRFS/EXT4 Data Corruption

Hi

Sorry if this is not the right place for this email, but I can't think
of another place (it might be linux-fsdevel). Someone here ought to be
an expert in this.

It all started with file corruption inside VMs, which led to a lot of
testing and eventually to reproducible results on the backend NAS.

Tests were done by generating 100 1GB files from /dev/urandom on
"volume1" (both BTRFS and EXT4 tested), MD5 hashing the files and then
copying them to "volume2". 2-4% of the files would fail the hash match
every time the test was run.
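
A rough sketch of the test described above, not the exact commands that
were run. The function names, file names and the use of os.urandom in
place of reading /dev/urandom directly are assumptions made for
illustration; the copy itself is delegated to the system "cp" so that
extra options can be passed through.

```python
import hashlib
import os
import subprocess
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_copy_test(src_dir, dst_dir, count=100, size=1 << 30, cp_args=()):
    """Write `count` random files to src_dir, copy each to dst_dir with cp,
    and return the names of the files whose MD5 changed during the copy."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    expected = {}
    for i in range(count):
        src = src_dir / f"rand_{i:03d}.bin"  # hypothetical naming scheme
        with open(src, "wb") as f:
            remaining = size
            while remaining > 0:
                block = os.urandom(min(remaining, 1 << 20))
                f.write(block)
                remaining -= len(block)
        expected[src.name] = md5_of(src)

    failed = []
    for name, digest in expected.items():
        # cp_args lets the caller add options such as "--sparse=never".
        subprocess.run(
            ["cp", *cp_args, str(src_dir / name), str(dst_dir / name)],
            check=True,
        )
        if md5_of(dst_dir / name) != digest:
            failed.append(name)
    return failed
```

Called as run_copy_test("/path/on/volume1", "/path/on/volume2") (both
paths are placeholders), this returns the list of files that failed the
hash match; an empty list means every copy verified.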

After a lot of fiddling around it turned out that the problem goes away
when using "cp --sparse=never" to copy the files. To me this would
exclude any hardware errors and feels more like something deeper inside
the kernel.
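
Since the only difference is the sparse handling, one thing that could
be checked is whether any of the source files appear sparse at all. By
default cp only makes the destination sparse when its heuristic thinks
the source is, and files filled from /dev/urandom would not normally be
expected to contain holes. The sketch below is only a heuristic under
stated assumptions: it compares the apparent size with the allocated
size (st_blocks is counted in 512-byte units on Linux), and the helper
names are made up for illustration. Compression or delayed allocation
can skew the numbers.

```python
import os


def allocation_report(path):
    """Return (apparent_size, allocated_bytes) for a file.

    Assumes Linux semantics, where st_blocks counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def looks_sparse(path):
    """Heuristic: fewer bytes allocated than the apparent size."""
    size, allocated = allocation_report(path)
    return allocated < size
```

Running looks_sparse() over the source files on "volume1" and over the
copies on "volume2" would show whether holes are being reported or
created anywhere in the chain.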

The box runs kernel 3.10.105. Versions >4 seem unaffected (not 100%
confirmed, too few test boxes).

Here is a diff of hexdumps for a failed file:

43861581c43861581
< 29d464c0: aca0 d68f 0ff4 0bad fa4M-5 1339 8148 30e8 .........E.9.H0.
---
> 29d464c0: aca0 d68f 0ff4 0bad fa45 1339 8148 30e8 .........E.9.H0.
55989446c55989446
< 35654c50: 31f4 f7b5 40be 2188 c539 043b 35b4 abb5 1...@....9.;5...
---
> 35654c50: 3174 f7b5 40be 2188 c539 043b 35b4 abb5 1t..@....9.;5...

As you can see it's a single flipped bit (31f4, 3174). I'm not sure 
about "fa4M-5". Never seen "M-" before.
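
The single-bit claim can be checked by XORing the two words. For the
"M-" part, one possible reading (an assumption, nothing above confirms
it) is the notation that tools such as "cat -v" use for a byte with the
high bit set: "M-5" would then stand for byte 0xb5, i.e. the character
"5" (0x35) with bit 7 flipped, inside the hexdump text itself. The
helper names below are made up for illustration.

```python
def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if diff >> i & 1]


def meta_notation(byte):
    """Render one byte in the style of `cat -v`:
    ^X for control characters, M- prefix for bytes with the high bit set."""
    if byte >= 0x80:
        return "M-" + meta_notation(byte - 0x80)
    if byte == 0x7F:
        return "^?"
    if byte < 0x20:
        return "^" + chr(byte + 0x40)
    return chr(byte)


print(flipped_bits(0x31F4, 0x3174))    # [7]
print(meta_notation(ord("5") | 0x80))  # M-5
```

If that reading is right, both differences in the diff are the same
kind of event: one bit, bit 7 of a byte, changed.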


Details,

Linux 3.10.105,
Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz,
Volume on top of LVM and md-raid,
md2 : active raid5 sda3[0] sdj3[5] sdi3[4] sdf3[3] sde3[2] sdb3[1]
       39046022720 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU],
cp (GNU coreutils) 8.24

BTRFS and EXT4 default mount options.



// Sebastian H
