Message-ID: <1727409.Ct3NHyBiXb@rofl>
Date:	Wed, 04 Nov 2015 12:31:46 +0100
From:	Patrick Schaaf <netdev@....de>
To:	NETDEV <netdev@...r.kernel.org>
Cc:	Greg KH <gregkh@...uxfoundation.org>, ariele@...adcom.com
Subject: kernel 3.14.53 + bnx2x loss of connectivity / parity errors / MCP SCPAD

Dear netdevs,

on a production server (HP DL380 Gen9 with HP 10GE dual port card - bnx2x 
driver), I just encountered a full loss of connectivity through the 10 GE 
ports. Kernel in use is vanilla 3.14.53.

On the console I could see this (timestamps omitted, have to type by hand, 
damn ILO console does not let me copy+paste text...)

MCP SCPAD
MCP SCPAD
bnx2x 0000:04:00.1 eth1: Parity errors detected in blocks:
MCP SCPAD
MCP SCPAD
bnx2x 0000:04:00.0 eth0: Parity errors detected in blocks:
bnx2x: [bnx2x_attn_int_deasserted3:4080(eth0)]LATCHED attention 0x80000000 (masked)
MCP SCPAD
...
systemd-journald[491]: /dev/kmsg buffer overrun, some messages lost.

Some googling around finds:

https://github.com/torvalds/linux/commit/ad6afbe9578d1fa26680faf78c846bd8c00d1d6e 

which might be related. If I read that correctly (and I have no real idea what 
I'm talking about, sorry...), that patch removes superfluous printks which 
might, e.g. in our case, hide the real cause. That is, even with that patch we 
would still have lost connectivity, but we might have a better idea why.

Maybe that changeset would be suitable for backporting to long term stable 
kernels?

Incidentally, how should these parity events be judged in general? Hope it's a 
one-time cosmic ray incident? Cry "faulty hardware, please repair" to the 
supplier? Anything else?

best regards
  Patrick