Message-ID: <479E2247.8080109@cybernetics.com>
Date:	Mon, 28 Jan 2008 13:43:19 -0500
From:	Tony Battersby <tonyb@...ernetics.com>
To:	Stephen Hemminger <shemminger@...ux-foundation.org>,
	netdev@...r.kernel.org
Subject: sky2: tx hang on dual-port Yukon XL when rx csum disabled

I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in
2.6.24.  The problem is triggered by both ports transmitting at high
speed simultaneously, and it reproduces quickly every time.  Here is
the setup:

PC #1 with Intel PRO/1000 NIC:
e1000 IP address 192.168.1.1
running iperf -s

PC #2 with Intel PRO/1000 NIC:
e1000 IP address 192.168.2.1
running iperf -s

PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express)
sky2 IP address 192.168.1.2
sky2 IP address 192.168.2.2

So basically, I have two PCs with Intel PRO/1000 NICs running "iperf
-s".  Each of these Intel NICs is directly cabled to one of the two
ports of the SysKonnect NIC.

When I run:
(PC #3 tty1) iperf -c 192.168.1.1 -t 30
(wait for a second or two)
(PC #3 tty2) iperf -c 192.168.2.1 -t 30

"iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
finish.  The hung iperf has to be aborted with Ctrl-C.  Afterward, ping
to 192.168.1.1 gets no response.  Ping to 192.168.2.1 does get a
response, but each ping has almost exactly 1 second of latency (the
latency should be < 1 ms).

When I switch the order of the tests, whichever iperf -c was started
_first_ is the one that locks up with no ping afterward, and whichever
was started _second_ is the one that finishes, but with a 1-second ping
latency afterward.  So the problem follows the ordering of the tests
rather than a specific port.
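
The test sequence above can be sketched as a small shell function.  This
is only an illustration: the names DRY_RUN and repro are hypothetical
and are not part of the original report, and the 2-second default delay
is an assumption standing in for "wait for a second or two".

```shell
#!/bin/sh
# Illustrative sketch only: DRY_RUN and repro() are hypothetical names,
# not part of the original report.

# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# repro FIRST_SERVER SECOND_SERVER [DELAY_SECONDS]
# Starts a 30-second iperf client to FIRST_SERVER, waits, then starts one
# to SECOND_SERVER, so that both ports transmit at the same time.
repro() {
    first="$1"
    second="$2"
    delay="${3:-2}"   # assumed default for "a second or two"
    if [ "$DRY_RUN" = "1" ]; then
        echo "iperf -c $first -t 30 &"
        echo "sleep $delay"
        echo "iperf -c $second -t 30"
    else
        # The first client runs in the background.  If it hangs as
        # described, it is left running and has to be killed by hand.
        iperf -c "$first" -t 30 &
        sleep "$delay"
        iperf -c "$second" -t 30
    fi
}
```

Calling "repro 192.168.1.1 192.168.2.1" corresponds to the order shown
above, and swapping the two arguments reverses the order of the tests.
Setting DRY_RUN=0 runs the commands instead of printing them.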

Also, the trigger seems to be transmitting, not receiving.  If I run
"iperf -s" on the SysKonnect PC and "iperf -c" on the two Intel PRO/1000
PCs, then the tests pass.

When I do "ethtool -K eth0 rx on; ethtool -K eth1 rx on" to turn on rx
checksumming on both ports of the SysKonnect NIC, both tests pass
successfully.  Commit 8b31cfbcd1b54362ef06c85beb40e65a349169a2 "sky2:
disable rx checksum on Yukon XL" disabled rx checksumming by default on
this NIC to get rid of some "hw csum failure" messages
(http://marc.info/?l=linux-netdev&m=119497815523843&w=4).  However, this
seems to have exposed a different (and arguably worse) bug.

I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect
the problem.

As a temporary workaround, I will use ethtool to turn on rx checksumming
and live with the "hw csum failure" messages, since they are better than
network lockups.
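
A minimal sketch of applying that workaround to both ports, assuming the
interfaces are named eth0 and eth1 as in the ethtool commands above.
The names DRY_RUN and enable_rx_csum are hypothetical.  "ethtool -k"
(lower case) is included only to display the resulting offload settings.

```shell
#!/bin/sh
# Illustrative sketch only: DRY_RUN and enable_rx_csum() are hypothetical
# names, not part of the original report.

# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# enable_rx_csum IFACE...
# Turns rx checksumming on for each interface, then shows its offload
# settings so the change can be checked.
enable_rx_csum() {
    for ifc in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ethtool -K $ifc rx on"
            echo "ethtool -k $ifc"
        else
            ethtool -K "$ifc" rx on   # needs root privileges
            ethtool -k "$ifc"
        fi
    done
}
```

For the dual-port card described here the call would be
"enable_rx_csum eth0 eth1".  The setting does not persist across a
reboot or a driver reload, so it would have to be reapplied each time.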

Let me know if I can be of any further assistance in tracking down this
problem.

Tony Battersby
Cybernetics
