Date:	Mon, 25 Jan 2016 11:08:51 +0100
From:	Nikola Ciprich <nikola.ciprich@...uxbox.cz>
To:	netdev <netdev@...r.kernel.org>
Cc:	nik@...uxbox.cz, Stanislav Schattke <schattke@...uxbox.cz>
Subject: Supermicro AOC-STGN-i2S w intel 82599ES on Brocade ICX6610 - random
 link failures

Hello netdev readers,

I'd like to ask for advice on the following problem we're dealing with:

I have a cluster of three nodes connected to stacked Brocade ICX6610
switches using bonded AOC-STGN-i2S adapters (based on the Intel 82599ES
chipset).

The problem is that I see random link failures on practically all
interfaces. The link always goes down for a very short time, then the
adapter is reset and the link comes up again.

Here's a dmesg snippet:

[Jan22 22:09] ixgbe 0000:03:00.0 eth0: NIC Link is Down
[  +0.005610] ixgbe 0000:03:00.0 eth0: initiating reset to clear Tx work after link loss
[  +0.012792] bond0: link status definitely down for interface eth0, disabling it
[  +1.105826] ixgbe 0000:03:00.0 eth0: Reset adapter
[  +0.307518] ixgbe 0000:03:00.0 eth0: detected SFP+: 3
[  +0.145881] ixgbe 0000:03:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX

Since I'm using bonding, it doesn't disrupt traffic, but I'd still like to
resolve it. We're using 5 m passive SFP cables; we tried replacing one with
a 3 m cable, to no avail.
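
Since bonding masks the failures, one way to see how often each slave flaps is the per-slave "Link Failure Count" that the bonding driver reports in /proc/net/bonding/<bond>. A small sketch that pulls those counters out of the file's text (the helper name `link_failure_counts` is hypothetical):

```python
def link_failure_counts(text):
    """Map each bonding slave to its "Link Failure Count".

    `text` is the contents of /proc/net/bonding/<bond>, where each slave
    section starts with "Slave Interface: <name>" and later contains
    "Link Failure Count: <n>".
    """
    counts = {}
    current = None  # slave whose section we are currently inside
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or non "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Link Failure Count" and current is not None:
            counts[current] = int(value)
    return counts
```

For example, `link_failure_counts(open("/proc/net/bonding/bond0").read())` on each node; comparing the counts across all three boxes over time could help show whether the flaps are spread evenly or tied to particular ports or switch members.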

All three boxes are Supermicro X10DRW, running a vanilla x86_64 4.0.5 kernel (I'll upgrade to 4.1.16 soon).

We were using Broadcom adapters before and they worked without such problems
(except for one particular port, which showed mysterious packet drops every few
months; that's why we switched to Intel-based adapters), so I think the cables
and switches should be fine, but of course I'm not sure.

I think I've seen similar problems before that were power-management (PM) related, but I'm not sure.

Has anyone seen a similar problem?

Or do you have any tips on how I could debug it?

If I can provide more information, please let me know.

BR

nik

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@...uxbox.cz
-------------------------------------