Date:	Tue, 18 Sep 2007 04:47:02 -0400
From:	Bill Fink <billfink@...dspring.com>
To:	Urs Thuermann <urs@...ogud.escape.de>
Cc:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
	"L F" <lfabio.linux@...il.com>,
	"Kok, Auke-jan H" <auke-jan.h.kok@...el.com>,
	"James Chapman" <jchapman@...alix.com>, <netdev@...r.kernel.org>
Subject: Re: e1000 driver and samba

On 18 Sep 2007, Urs Thuermann wrote:

> Bill Fink <billfink@...dspring.com> writes:
> 
> > It may also be a useful test to disable hardware TSO support
> > via "ethtool -K ethX tso off".
> 
> All suggestions here on the list, i.e. checking for flow control,
> duplex, cable problems, etc. don't explain (at least to me) why LF
> sees file corruption.  How can a corrupted frame pass the TCP checksum
> check?  Does TCP use the hardware checksum of the NIC if available?
> AFAICS, this would be the only way for a corrupt frame to make it into
> the file.  But Bill already suggested this and LF reported that it
> didn't make a difference.
> 
> A few months ago I had hardware problems with an embedded device, where
> transmission from the NIC via the PCI bus to the CPU had some bits
> flipped.  But tcpdump clearly showed the TCP checksum errors and also
> TCP recognized the errors and the connection was stalled.  And, BTW,
> we also observed an increasing percentage of corrupted frames with
> increasing traffic on that interface, i.e. increasing load on the PCI
> bus.
> 
> So I would run tcpdump -s0 and watch for "incorrect checksum" messages.

I agree TSO is an unlikely candidate, since it should only affect
transmits and the problem, as I understand it, is with receives.
Still, one of the first things I try when dealing with weird
problems is disabling all hardware assists.

But I also agree with you that network errors should normally be
detected by the TCP checksum (unless hardware checksumming was
messed up), and from what I recall there were no receive checksum
errors being seen.  That and the fact that the problem was seen
with two different NICs would lead me to believe that the problem
is elsewhere in the system.
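
To make the checksum argument concrete, here is a minimal Python
sketch of the 16-bit ones' complement sum that TCP is built on (as
described in RFC 1071).  It is a simplification: the real TCP
checksum also covers a pseudo-header and the TCP header, and the
payload and helper names below are made up for illustration.  It
shows that a single flipped bit is always caught, while swapping two
aligned 16-bit words is not, so the checksum is good but not
airtight.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Hypothetical payload, used only for illustration.
payload = b"netdev: e1000 and samba test"

# One flipped bit, similar to the PCI bus corruption Urs describes.
flipped = bytearray(payload)
flipped[3] ^= 0x04
flipped = bytes(flipped)

# Two aligned 16-bit words swapped: the sum is order-independent.
swapped = payload[2:4] + payload[0:2] + payload[4:]

bit_flip_detected = internet_checksum(flipped) != internet_checksum(payload)
word_swap_detected = internet_checksum(swapped) != internet_checksum(payload)
```

Here bit_flip_detected comes out True and word_swap_detected comes
out False, which is why random bit flips on the wire or the bus
should normally show up as checksum errors in tcpdump.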

That leaves many possibilities.  It could be a memory problem,
although it was indicated that memory testing was performed
successfully (but we don't know how extensive the memory checking
enabled via the BIOS is).  It could be the writes over the PCI bus
back to the disk, or a problem with the disk/controller/fs writes
themselves (some kind of disk stress test might be useful).
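
One rough way to do such a test is a write/read-back comparison.
The Python sketch below is only an illustration under stated
assumptions: the function name, sizes, and path are hypothetical,
and the read-back may be served from the page cache, so the file
should be larger than RAM (or the test repeated after a remount)
before it says much about the disk and controller rather than
memory.

```python
import hashlib
import os


def write_and_verify(path: str, size_mb: int = 16, chunk: int = 1 << 20) -> bool:
    """Write size_mb chunks of random data to path, read them back, compare digests."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(chunk)
            written.update(block)  # digest of what we intended to write
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the device

    readback = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            readback.update(block)  # digest of what the system returns

    return written.hexdigest() == readback.hexdigest()
```

Running it in a loop on the filesystem that holds the Samba share,
with and without network load, and watching for a False result
would help separate a disk-path problem from a network one.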

						-Bill