linux-kernel - Strange network related data corruption


Message-ID: <20071007184741.3b1951c1@highlander.home.lan>
Date:	Sun, 7 Oct 2007 18:47:41 +0200
From:	Malte Schröder <maltesch@....de>
To:	linux-kernel@...r.kernel.org
Subject: Strange network related data corruption

Hello,
I am encountering some strange data corruption when transferring
data from one of my PCs that I use as a file-server.

on the server:
FILE=<large file>; sha1sum $FILE | cut -d" " -f1 | nc -lp5000 -q0;
while nc -lp5000 -q0 < $FILE; do : ; done

on the client:
H=<server>; SUM=$(nc -q0 $H 5000);sleep 1s; while nc -q0 $H 5000 |
sha1sum | (grep -v $SUM || echo -n .); do sleep 1s ;done

(output looks somewhat like this:
..............6dd5fb1ce29d270acdfbb02d00921bf75d141773  -
...
)

I would expect the sha1sum to be the same in every pass (assuming the
source file does not change). But every few passes (with no apparent
pattern) there is a different sum returned. I first noticed this when
transferring large files (backups) with SMB and NFS (v3 and v4), but
to rule those out I tried netcat in the way noted above.

When I have the server do the sha1sum of the file locally the problem
is not reproducible. When I do this with a small file that easily fits
into the cache the problem stays reproducible.

Another thing I did was to use dd to transfer data in 1GiB chunks from
/dev/zero and generate the sha1sum on the client. There I was not able
to reproduce the problem.
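
The exact commands for the /dev/zero test are not given here. A minimal sketch of how such a test could look, assuming the same nc setup on port 5000 as above and GNU dd; zero_stream is a hypothetical helper name and HOST is a placeholder:

```shell
# Sketch only: zero_stream is a hypothetical helper, HOST is a placeholder.

# Emit COUNT MiB of zero bytes on stdout (GNU dd accepts bs=1M).
zero_stream() {
    dd if=/dev/zero bs=1M count="$1" 2>/dev/null
}

# Server side: serve 1 GiB of zeros per connection on port 5000.
#   while zero_stream 1024 | nc -lp5000 -q0; do : ; done
#
# Client side: hash each received chunk; every pass should print the same digest.
#   while nc -q0 HOST 5000 | sha1sum; do sleep 1; done
```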

The server is an Athlon64 3400+ (good old Clawhammer) with 1GiB RAM. I
use 4 SATA drives in a software RAID5 configuration, attached to a
Promise TX4 300 SATA-II controller. The filesystem is ext3 without
special mount-options. The dist is Debian/Sid for AMD64 with
self-compiled kernel 2.6.23-rc9 (.config attached).

The clients I tried are a Core2Duo 6600 with 3GiB of RAM, also
Debian/Sid AMD64 (kernel 2.6.23-rc9) and a Centrino notebook with
Pentium M and 1GiB of RAM (Debian/Sid i386, kernel 2.6.23-rc7).

All PCs mentioned have gigabit ethernet and are connected via a gigabit
switch.

I tried these tests between the clients and could not reproduce the
problem there.

I had the server run memtest86+ with 20 passes without problems.

I tried several kernel versions on the server (from .18 to .23-rc9), all
showed the problem. I suspect a hardware problem, but I cannot isolate
the part responsible. I tried another ethernet adapter (the 3Com 905C in
the lspci output) and I also tried the onboard SATA controller(s) (2 ports
VIA and 2 ports Promise TX2).

I don't know if this is a kernel problem or just me and my setup, but
maybe someone on this list has an idea where I could look next.

Thanks and regards
Malte
-- 
---------------------------------------
Malte Schröder
MalteSch@....de
ICQ# 68121508
---------------------------------------

View attachment "config-2.6.23-rc9" of type "text/plain" (244 bytes)