Message-ID: <20071007184741.3b1951c1@highlander.home.lan>
Date: Sun, 7 Oct 2007 18:47:41 +0200
From: Malte Schröder <maltesch@....de>
To: linux-kernel@...r.kernel.org
Subject: Strange network related data corruption
Hello,
I am encountering some strange data corruption when transferring
data from one of my PCs that I use as a file-server.
on the server:
FILE=<large file>; sha1sum $FILE | cut -d" " -f1 | nc -lp5000 -q0;
while nc -lp5000 -q0 < $FILE; do : ; done
on the client:
H=<server>; SUM=$(nc -q0 $H 5000);sleep 1s; while nc -q0 $H 5000 |
sha1sum | (grep -v $SUM || echo -n .); do sleep 1s ;done
(output looks somewhat like this:
..............6dd5fb1ce29d270acdfbb02d00921bf75d141773 -
...
)
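
A minimal sketch of a variant of the client loop that keeps every mismatching pass on disk, so the corrupted data can be inspected afterwards. The helper name check_pass and the bad-<sum> file names are hypothetical and not part of the test above; the sketch assumes sha1sum and mktemp are available on the client.

```shell
#!/bin/sh
# Sketch only: check_pass is a hypothetical helper, not part of the original test.
# Usage (assumed, mirroring the client loop above):
#   mkdir -p /tmp/passes
#   while :; do nc -q0 $H 5000 | check_pass "$SUM" /tmp/passes; sleep 1s; done

# Read one transfer from stdin, compare its SHA-1 with the expected sum ($1),
# and keep the received data in directory $2 if the sums differ.
check_pass() {
    expected=$1
    keepdir=$2
    tmp=$(mktemp "$keepdir/pass.XXXXXX") || return 2
    cat > "$tmp"
    actual=$(sha1sum < "$tmp" | cut -d" " -f1)
    if [ "$actual" = "$expected" ]; then
        # Good pass: discard the data and print a progress dot.
        rm -f "$tmp"
        printf .
        return 0
    fi
    # Bad pass: keep the data under a name that records the wrong sum.
    mv "$tmp" "$keepdir/bad-$actual"
    printf '\n%s saved as %s\n' "$actual" "$keepdir/bad-$actual"
    return 1
}
```

A saved bad pass could then be compared against a known-good copy of the file with cmp -l, which lists the offsets and values of the differing bytes. Whether the damage is single bits or whole blocks may help narrow down the component at fault.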
I would expect the sha1sum to be the same in every pass (assuming the
source file does not change). But every few passes (with no apparent
pattern) there is a different sum returned. I first noticed this when
transferring large files (backups) with SMB and NFS (v3 and v4), but
to rule those protocols out I tried netcat in the way noted above.
When I have the server do the sha1sum of the file locally the problem
is not reproducible. When I do this with a small file that easily fits
into the cache the problem stays reproducible.
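
The local check on the server could be repeated many times along these lines. This is only a sketch: the function name count_distinct_sums is hypothetical, and the commands actually used for the local check are not shown in this message.

```shell
#!/bin/sh
# Sketch only: count_distinct_sums is a hypothetical helper.

# count_distinct_sums FILE N: hash FILE N times locally and print how many
# distinct SHA-1 sums were seen (1 means every pass agreed).
count_distinct_sums() {
    file=$1
    passes=$2
    i=0
    while [ "$i" -lt "$passes" ]; do
        sha1sum < "$file" | cut -d" " -f1
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```

For example, count_distinct_sums "$FILE" 50 prints 1 if all 50 local passes agree. Repeated local reads of a file that fits in RAM are likely served from the page cache rather than from the disks, so such a run may not exercise the storage path at all.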
Another thing I did was to use dd to transfer data in 1GiB chunks from
/dev/zero and generate the sha1sum on the client. There I was not able
to reproduce the problem.
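
A sketch of what such a /dev/zero test might look like, assuming the same netcat setup as above. The exact dd invocation is not given in this message, so the block size, the count, and the helper name zero_sum are assumptions.

```shell
#!/bin/sh
# Sketch only: zero_sum is a hypothetical helper; bs and count are assumed values.
# Assumed server side, sending 1 GiB of zeros per connection:
#   while dd if=/dev/zero bs=1048576 count=1024 2>/dev/null | nc -lp5000 -q0; do : ; done
# Assumed client side, printing only sums that differ from the expected one:
#   EXPECTED=$(zero_sum 1024)
#   while nc -q0 $H 5000 | sha1sum | (grep -v $EXPECTED || echo -n .); do sleep 1s; done

# zero_sum MIB: print the SHA-1 of MIB mebibytes of zero bytes, computed locally.
zero_sum() {
    dd if=/dev/zero bs=1048576 count="$1" 2>/dev/null | sha1sum | cut -d" " -f1
}
```

Because the expected sum is computed on the client itself, this variant needs no file on the server and so does not touch the server's disks.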
The server is an Athlon64 3400+ (good old Clawhammer) with 1GiB RAM. I
use 4 SATA drives in a software RAID5 configuration, attached to a
Promise TX4 300 SATA-II controller. The filesystem is ext3 without
special mount-options. The dist is Debian/Sid for AMD64 with
self-compiled kernel 2.6.23-rc9 (.config attached).
The clients I tried are a Core2Duo 6600 with 3GiB of RAM, also
Debian/Sid AMD64 (kernel 2.6.23-rc9) and a Centrino notebook with
Pentium M and 1GiB of RAM (Debian/Sid i386, kernel 2.6.23-rc7).
All PCs mentioned have gigabit ethernet and are connected via a gigabit
switch.
I tried these tests between the clients and could not reproduce the
problem there.
I had the server run memtest86+ for 20 passes without problems.
I tried several kernel versions on the server (from .18 to .23-rc9), all
showed the problem. I suspect a hardware problem, but I cannot isolate
the part responsible. I tried another ethernet adapter (the 3com 905c in
the lspci output) and I also tried the onboard SATA controller(s) (2 ports
VIA and 2 ports Promise TX2).
I don't know if this is a kernel problem or just me and my setup, but
maybe someone on this list has an idea where I could look next.
Thanks and regards
Malte
--
---------------------------------------
Malte Schröder
MalteSch@....de
ICQ# 68121508
---------------------------------------
View attachment "config-2.6.23-rc9" of type "text/plain" (244 bytes)