Date:	Fri, 6 Apr 2012 14:48:39 +0100
From:	Chris Boot <bootc@...tc.net>
To:	Henrique de Moraes Holschuh <hmh@....eng.br>
Cc:	Bjorn Helgaas <bhelgaas@...gle.com>, Nix <nix@...eri.org.uk>,
	"Wyborny, Carolyn" <carolyn.wyborny@...el.com>,
	e1000-devel@...ts.sourceforge.net, netdev <netdev@...r.kernel.org>,
	lkml <linux-kernel@...r.kernel.org>, linux-pci@...r.kernel.org,
	Matthew Garrett <mjg@...hat.com>
Subject: Re: [E1000-devel] e1000e interface hang on 82574L

On 6 Apr 2012, at 14:41, Henrique de Moraes Holschuh <hmh@....eng.br> wrote:

> On Fri, 06 Apr 2012, Bjorn Helgaas wrote:
>> On Fri, Apr 6, 2012 at 4:17 AM, Chris Boot <bootc@...tc.net> wrote:
>>> On 19 Mar 2012, at 17:31, Nix wrote:
>>> 
>>>> On 19 Mar 2012, Carolyn Wyborny said:
>>>> 
>>>>>> you'll see that I tested that, and it doesn't work :( even if it
>>>>>> did work, it shouldn't be needed: the driver attempts to turn off
>>>>>> PCIe ASPM on affected NICs, and fails, apparently because
>>>>>> *something* turns it back on again.
>>>>>> 
>>>>> The driver attempts to disable L0s state, not the entire feature.
>>>>> It
>>>> 
>>>> It tries to disable L1 state as well (or it did when I tested this
>>>> last, although I suspect you're right and it may leave L1 turned on
>>>> these days: judging by the contents of e1000_82574_info, anyway.)
>>>> 
>>>>> is also required that the device upstream on the bus from the
>>>>> 82574L have this disabled. Yes, I agree there appears to be
>>>>> something in the OS that either re-enables or fails to disable
>>>>> the feature on the upstream device, as desired. Platforms/systems
>>>>> also appear to vary in this regard, so the solutions may vary a
>>>>> bit as well.
>>>>> 
>>>>> It's worth trying your solution as well if what I suggested doesn't
>>>>> work, but there is not one solution that fits all, unfortunately.
>>>> 
>>>> I don't *have* a solution. :( 'setpci by hand some unknown amount
>>>> of time after booting once the interface has stabilized' hardly
>>>> counts as a solution of any sort. It's, at best, a workaround that
>>>> lets me use my systems without hourly lockups until a real solution
>>>> is found.
>>>> 
>>>> (To clarify: manual setpci to force off the ASPM bits is the only
>>>> thing that works for me. The driver's automatic disabling of L0s
>>>> and L1 doesn't work: nor does booting with pcie_aspm=off. In both
>>>> cases, I end up with both L0s and L1 turned on, and a lockup some
>>>> time later, unless I setpci the bits off by hand.)
>>> 
>>> 
>>> Well, with that setpci incantation run against the NIC and its
>>> upstream device to disable ASPM L1s (setpci -s <dev>
>>> CAP_EXP+10.b=40), everything has been working very well indeed. Is
>>> there something the e1000e driver could do to disable L1s as well as
>>> L0s if we know there's a problem with them for these devices?
>>> 
>>> Adding Bjorn Helgaas and linux-pci to CCs to try to get the ball
>>> rolling some more, as this is crippling without the fixes.
>> 
>> [+cc Matthew Garrett for ASPM stuff]
>> 
>> If I understand correctly, e1000e attempts to disable ASPM to work
>> around an 82574L hardware erratum, but the PCI core either doesn't
>> disable ASPM or it gets re-enabled somehow.
> 
> You probably need to disable it upstream of the 82574L as well.  Here
> (SuperMicro C7X58) I managed to get it to be stable by telling the BIOS
> to disable L0s and L1 system-wide.
> 
> But not all BIOSes will have that option...


This is not something I can really do as ASPM makes a real difference to power consumption across the system, and I have a strict power budget to adhere to (else I will be charged more to host my servers). Disabling it for the NIC and upstream device is enough to make it stable, and doesn't increase power consumption by enough to matter.

The driver seems to disable ASPM L0s just fine, but L1 is not disabled on either the NIC or the upstream device. If e1000e can't do it, maybe we can do so using a PCI quirk or something?
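
For reference, here is a rough sketch of what the setpci value quoted above (CAP_EXP+10.b=40) does to the low byte of the PCIe Link Control register. It assumes the standard Link Control layout (bits 1:0 are ASPM Control, bit 6 is Common Clock Configuration). The helper names are made up for illustration and are not taken from e1000e or pciutils, and the device address in the usage note is only a placeholder.

```python
# Low byte of the PCIe Link Control register, which sits at offset 0x10
# inside the PCI Express capability (what setpci calls CAP_EXP+10).
ASPM_L0S = 0x01       # bit 0: L0s entry enabled
ASPM_L1 = 0x02        # bit 1: L1 entry enabled
ASPM_MASK = ASPM_L0S | ASPM_L1
COMMON_CLOCK = 0x40   # bit 6: Common Clock Configuration


def decode_link_control(byte):
    """Report the ASPM-related bits of a Link Control low byte."""
    return {
        "l0s_enabled": bool(byte & ASPM_L0S),
        "l1_enabled": bool(byte & ASPM_L1),
        "common_clock": bool(byte & COMMON_CLOCK),
    }


def clear_aspm(byte):
    """Return the byte with both ASPM bits cleared, other bits untouched."""
    return byte & ~ASPM_MASK & 0xFF


def setpci_disable_aspm(dev):
    """Build a setpci command that clears only the ASPM bits.

    Uses setpci's VALUE:MASK form, so bits outside the mask keep
    their current values (hypothetical helper, for illustration).
    """
    return f"setpci -s {dev} CAP_EXP+10.b=0:{ASPM_MASK:x}"


if __name__ == "__main__":
    # 0x40 is the value written by the workaround in this thread:
    # ASPM off (L0s and L1), common clock bit set.
    print(decode_link_control(0x40))
    print(setpci_disable_aspm("00:1c.0"))  # placeholder device address
```

Writing a plain 0x40 overwrites the whole byte, so it also sets the common clock bit and clears the others regardless of what was there before. The masked form only touches the two ASPM bits, which may be the safer choice on a platform where the remaining bits differ.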

Cheers,
Chris

-- 
Chris Boot
bootc@...tc.net
