Date:	Thu, 03 May 2012 11:08:43 +0100
From:	Nix <nix@...eri.org.uk>
To:	Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Cc:	davem@...emloft.net, Chris Boot <bootc@...tc.net>,
	netdev@...r.kernel.org, gospo@...hat.com, sassmann@...hat.com,
	"Wyborny\, Carolyn" <carolyn.wyborny@...el.com>
Subject: Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574

On 3 May 2012, Jeff Kirsher spake thusly:

> From: Chris Boot <bootc@...tc.net>
>
> ASPM on the 82574 causes trouble. Currently the driver disables L0s for
> this NIC but only disables L1 if the MTU is >1500. This patch simply
> causes L1 to be disabled regardless of the MTU setting.
>
> Signed-off-by: Chris Boot <bootc@...tc.net>
> Cc: "Wyborny, Carolyn" <carolyn.wyborny@...el.com>
> Cc: Nix <nix@...eri.org.uk>
> Link: https://lkml.org/lkml/2012/3/19/362
> Tested-by: Jeff Pieper <jeffrey.e.pieper@...el.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@...el.com>

(reminder: this is known not to fix the instance of this problem I am
experiencing, where ASPM is being re-enabled by something even if turned
off via setpci during boot, though it does fix those instances seen by
others where that doesn't happen. I'd have done more printf()-scattering
debugging to see where it's turned back on if it weren't that this is
happening on an always-on server for which rebooting outside the dead of
night is a long-winded chore...)

FWIW I have also seen -- very rare -- lockups of the same nature on
82574L links in 100MbE mode using non-jumbo frames. However, they are far
more common on GbE jumbo-framed links, normally taking less than an hour
to take the link down with a wildly corrupted register set (as shown by
ethtool).

(It's annoying this firmware isn't flashable so we could just *fix* this
bug rather than working around it. :( )


I think I might cheat a bit next and printk_once() the state of ASPM L1
on the errant PCI device from inside the scheduler when it flips from L1
off to L1 on again. At 100 tests per second that should indicate at what
time the thing is turned back on fairly tightly: even if not providing a
direct clue as to which bit of the kernel is doing it, if I combine it
with a set -x in userspace it should at least indicate what bit of the
boot process is happening at the same time. It'll be the weekend before
I can try that though.
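
For illustration only, here is a rough userspace approximation of that
polling idea, not the in-kernel printk_once() approach described above. It
assumes the device's PCI config space is readable through sysfs (this needs
root, since unprivileged reads return only the first 64 bytes) and relies on
the standard layout: capability list pointer at 0x34, PCI Express capability
ID 0x10, Link Control register at offset 0x10 inside that capability, with
bit 1 being ASPM L1. The function names and the 10ms interval are made up
for this sketch.

```python
import time
from typing import Optional

PCI_STATUS = 0x06            # Status register (low byte)
PCI_STATUS_CAP_LIST = 0x10   # bit 4: device has a capability list
PCI_CAPABILITY_LIST = 0x34   # pointer to the first capability
PCI_CAP_ID_EXP = 0x10        # PCI Express capability ID
PCI_EXP_LNKCTL = 0x10        # Link Control offset inside the PCIe capability
ASPM_L1 = 0x02               # Link Control bit 1: ASPM L1 entry enabled


def find_pcie_cap(cfg: bytes) -> Optional[int]:
    """Walk the capability list; return the PCIe capability offset or None."""
    if len(cfg) < 0x40 or not (cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST):
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()                      # guard against a looping list
    while pos >= 0x40 and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC     # next-capability pointer
    return None


def aspm_l1_enabled(cfg: bytes) -> Optional[bool]:
    """True/False for the ASPM L1 bit, None if it cannot be determined."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_LNKCTL + 1 >= len(cfg):
        return None
    off = cap + PCI_EXP_LNKCTL
    lnkctl = cfg[off] | (cfg[off + 1] << 8)   # 16-bit little-endian register
    return bool(lnkctl & ASPM_L1)


def watch(config_path: str, interval: float = 0.01) -> float:
    """Poll until L1 flips from off to on; return the wall-clock time."""
    last = None
    while True:
        with open(config_path, "rb") as f:
            state = aspm_l1_enabled(f.read())
        if last is False and state is True:
            now = time.time()
            print(f"{now:.3f} ASPM L1 re-enabled")
            return now
        last = state
        time.sleep(interval)
```

One would call watch() on the device's sysfs config file, something like
/sys/bus/pci/devices/<domain:bus:dev.fn>/config, as early in boot as
possible and line the printed timestamp up against the set -x trace. Unlike
a check inside the scheduler, this cannot see anything that happens before
userspace starts, so it is at best a cheaper first pass.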

-- 
NULL && (void)
