Date:   Thu, 27 Feb 2020 08:28:43 -0800
From:   "Pallissard, Matthew" <matt@...lissard.net>
To:     linux-kernel@...r.kernel.org
Subject: possible nfsv3 write corruption


Forgive me if this is the wrong list.

OK, I'm seeing some very infrequent data corruption on write that seems to be limited to NFSv3 async mounts.  I have not tested NFSv4 yet.  I _think_ I've narrowed it down to kernels in the range 5.5.0 > X >= 5.1.4 (maybe earlier).  Some users reported random data corruption; a bit of testing shows that it's reproducible and the corruption is nearly identical every time.

I'd like to get to the bottom of this so I can guarantee that a kernel upgrade will resolve the issue.

What happens is that roughly every several hundred GiB of writes, the first half of a 64-bit segment ends up corrupted.  My test writes a few GiB, alternating between 64 bits of `0`s and 64 bits of `1`s, then reads the file back in and checks the contents.  Re-reading the file shows that it's corrupted on write, not read.  Here is some example output from a test:

> 2020-02-14 11:04:34 crit   found mis-match on word segment 11911168 / 33554432!
> 2020-02-14 11:04:34 crit   found mis-match on byte 7, 188 != 255
> 2020-02-14 11:04:34 crit   found mis-match on byte 6, 0 != 255
> 2020-02-14 11:04:34 crit   found mis-match on byte 5, 16 != 255
> 2020-02-14 11:04:34 crit   found mis-match on byte 4, 128 != 255
> 2020-02-14 11:04:34 crit   1011110000000000000100001000000011111111111111111111111111111111

> 2020-02-14 13:38:11 crit   found mis-match on word segment 1982464 / 33554432!
> 2020-02-14 13:38:11 crit   found mis-match on byte 7, 188 != 255
> 2020-02-14 13:38:11 crit   found mis-match on byte 6, 0 != 255
> 2020-02-14 13:38:11 crit   found mis-match on byte 5, 16 != 255
> 2020-02-14 13:38:11 crit   found mis-match on byte 4, 128 != 255
> 2020-02-14 13:38:11 crit   1011110000000000000100001000000011111111111111111111111111111111

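For reference, the test boils down to something like the following sketch (not the exact harness; the path, file size, and byte numbering are illustrative):

/*
 * Sketch of the alternating-pattern write/verify test.
 * Not the exact harness: path, size, and byte numbering are illustrative.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NWORDS 33554432ULL              /* 64-bit words; size is illustrative */
#define PATH   "/mnt/nfs/pattern.dat"   /* illustrative NFSv3 async mount     */

int main(void)
{
	size_t len = NWORDS * sizeof(uint64_t);
	uint64_t *out = malloc(len), *in = malloc(len);
	if (!out || !in) { perror("malloc"); return 1; }

	/* Alternate 64 bits of 0s and 64 bits of 1s. */
	for (uint64_t i = 0; i < NWORDS; i++)
		out[i] = (i & 1) ? ~0ULL : 0ULL;

	unlink(PATH);                   /* same file every run, unlinked first */
	int fd = open(PATH, O_CREAT | O_WRONLY, 0644);
	if (fd < 0) { perror("open(write)"); return 1; }
	if (write(fd, out, len) != (ssize_t)len) { perror("write"); return 1; }
	fsync(fd);                      /* no effect on the results either way */
	close(fd);

	fd = open(PATH, O_RDONLY);
	if (fd < 0) { perror("open(read)"); return 1; }
	if (read(fd, in, len) != (ssize_t)len) { perror("read"); return 1; }
	close(fd);

	/* Compare word by word; on a mismatch, report which bytes differ. */
	for (uint64_t i = 0; i < NWORDS; i++) {
		if (in[i] == out[i])
			continue;
		printf("found mis-match on word segment %llu / %llu!\n",
		       (unsigned long long)i, (unsigned long long)NWORDS);
		for (int b = 7; b >= 0; b--) {
			unsigned got = (unsigned)((in[i]  >> (b * 8)) & 0xff);
			unsigned exp = (unsigned)((out[i] >> (b * 8)) & 0xff);
			if (got != exp)
				printf("found mis-match on byte %d, %u != %u\n",
				       b, got, exp);
		}
	}

	free(out);
	free(in);
	return 0;
}

In practice I loop this, unlinking and rewriting the same file, until a mismatch shows up.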

Knowns:

	* does not appear to happen on the CentOS/EL 3.10 series kernels

	* does not appear to happen on a 5.5 series kernel
		* I'm re-running all my tests now to confirm this.

	* not hardware dependent

	* not processor dependent
		* I tested 3 different Intel processors

	* appears to only happen on NFS v3 async mounts
		* local disk and `-o sync` NFS v3 mounts have been tested

	* It happens on random 64-bit segments

	* It's *always* the same 4 bytes that are corrupted

	* While often identical, the corrupted byte values are not always the same
		* the same corruption pattern can show up on separate machines.

	* It's *always* on words that are written with `1`'s <- this is the part I find most interesting

	* whether or not I explicitly call `fflush` and `sync` has no effect on the results.

	* usually takes ~80-2000 GiB of writes to reproduce; occasionally more or less, but that's infrequent.
		* I've been writing 2 GiB files
		* on some runs I never hit the corruption at all.

	* I've yet to see more than one corrupted segment in a file.


A little bit about the build/run environments:

	OS / toolchain:
		CentOS 7
		CentOS glibc 2.17
		clang 9 / lld

	hardware:
		Dell PowerEdge R620
		Dell PowerEdge C6320
		Dell PowerEdge C6420
		Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
		Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
		Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz

* I compiled locally on every box, and I tested every compiled binary on every box; which binary ran where didn't seem to affect the results.
* I don't have a tcpdump of this yet.  I'm hoping to get that started before the end of the week.
* I read and write to the same file every time, unlinking it before writing again
* I have not tried dropping the cache between any of the steps (a sketch of how I'd do that follows this list).
* I have engaged our storage vendor to see what they have to say.  They're pretty good at getting useful metrics and insight so if there is anything I should have them gather server-side please let me know.
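
If dropping the page cache between the write and read steps is worth trying, this is roughly what I'd do (the standard /proc/sys/vm/drop_caches interface; needs root):

/*
 * Sketch of dropping the page cache between the write and read steps,
 * via the standard /proc/sys/vm/drop_caches interface (requires root).
 * "1" drops the page cache; "3" also drops dentries and inodes.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int drop_caches(void)
{
	sync();                         /* write out dirty pages first */
	int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
	if (fd < 0) { perror("open drop_caches"); return -1; }
	if (write(fd, "3", 1) != 1) { perror("write drop_caches"); close(fd); return -1; }
	close(fd);
	return 0;
}

int main(void)
{
	return drop_caches() ? 1 : 0;
}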


If anyone has any insight or additional testing I can perform, I would *greatly* appreciate it.  I would be thrilled if this turned out to be some dumb configuration option or some other operational mistake on my end.


Thank you for your time.

Matt Pallissard
