Date:	Sat, 05 May 2012 17:33:45 +0100
From:	Nix <nix@...eri.org.uk>
To:	"Wyborny\, Carolyn" <carolyn.wyborny@...el.com>,
	Matthew Garrett <mjg@...hat.com>
Cc:	"Kirsher\, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
	"davem\@davemloft.net" <davem@...emloft.net>,
	Chris Boot <bootc@...tc.net>,
	"netdev\@vger.kernel.org" <netdev@...r.kernel.org>,
	"gospo\@redhat.com" <gospo@...hat.com>,
	"sassmann\@redhat.com" <sassmann@...hat.com>
Subject: Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574

On 3 May 2012, nix@...eri.org.uk outgrape:

> On 3 May 2012, Carolyn Wyborny told this:
>
>> It would be good to know why/how your system is re-enabling the
>> setting. The problem is not solvable in firmware unfortunately and is
>> somewhat platform dependent. MMIO-tracer might be used to try and see
>
> I entirely forgot about that tool! *Definitely* worth trying.
>
> I'll give it a try this weekend.

Well, mmiotrace was a total flop: massive numbers of unexpected
secondary interrupts and a hard lockup. Still, I've now diagnosed this
bug and it's right up Matthew Garrett's street!

Matthew: the problem here is a server with an 82574L (controlled by the
e1000e driver). This NIC has a hardware bug: unless PCIe ASPM is
disabled during boot, it locks up within an hour or two, in a way that
only a reboot can fix (leaving me with my home directory stuck behind a
dead NIC on a headless machine, most annoying). The driver is attempting
to disable ASPM, but failing.

>> when the re-enabling config space is written, but it might be too
>> heavyweight for a live production system.
>
> Given that the re-enabling happens at around the same time as the boot
> scripts finish running (it's done by the time I can log in), that's not
> going to be a problem. Hence my speculation that it's being re-enabled
> when the interface stabilizes (which is, of course, asynchronous) or
> something like that.

This was wrong: nothing re-enables ASPM, because the disable never
happens in the first place. The BIOS has been told to
enable PCIe ASPM. However, the kernel log says:

May  5 17:06:53 spindle info: [    0.629699]  pci0000:00: Requesting ACPI _OSC control (0x1d)
May  5 17:06:53 spindle info: [    0.629941]  pci0000:00: ACPI _OSC request failed (AE_NOT_FOUND), returned control mask: 0x1d
May  5 17:06:53 spindle info: [    0.630373] ACPI _OSC control for PCIe not granted, disabling ASPM

Unless pcie_aspm=force has been specified on the kernel command line,
this flips aspm_disabled to 1.

The e1000e driver then says (with a bit of extra debugging info I
added):

May  5 17:06:53 spindle info: [    1.248153] e1000e 0000:03:00.0: Disabling ASPM L0s L1
May  5 17:06:53 spindle info: [    1.248393] e1000e 0000:03:00.0: Disabling ASPM via pci_disable_link_state_locked()
May  5 17:06:53 spindle info: [    1.248823] e1000e 0000:03:00.0: aspm disabled, not forcing

i.e. because aspm_disabled is set, drivers/pci/pcie/aspm.c refuses to
make any changes at all to ASPM link state, not even to turn *off* ASPM
on a device on which the BIOS turned it on at boot. So ASPM remains
enabled and the NIC eventually locks up.
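
The sequence in the two log excerpts can be modelled in a few lines. This
is only a toy sketch in Python, not the kernel's ASPM code: the class
AspmModel, its methods and the bit values are hypothetical names chosen
for illustration, and treating the flag as a single check in front of the
disable path is a simplifying assumption.

```python
# Toy model of the behaviour described above; this is NOT kernel code.
# Every name here is hypothetical and exists only for illustration.

ASPM_L0S = 0x1  # illustrative bit for the L0s link state
ASPM_L1 = 0x2   # illustrative bit for the L1 link state


class AspmModel:
    def __init__(self, bios_enabled_states, force=False):
        # ASPM states the BIOS left enabled on the link at boot.
        self.link_state = bios_enabled_states
        # Stands in for pcie_aspm=force on the kernel command line.
        self.force = force
        self.aspm_disabled = False

    def osc_result(self, granted):
        # A failed _OSC request sets aspm_disabled unless force was given.
        if not granted and not self.force:
            self.aspm_disabled = True

    def disable_link_state(self, states):
        # With aspm_disabled set, no change is made at all,
        # not even one that would turn ASPM off.
        if self.aspm_disabled:
            return False
        self.link_state &= ~states
        return True


# The BIOS enabled L0s and L1, then _OSC control was not granted.
nic = AspmModel(bios_enabled_states=ASPM_L0S | ASPM_L1)
nic.osc_result(granted=False)

# The driver asks for both states to be disabled; the request is refused
# and the link keeps whatever the BIOS configured.
changed = nic.disable_link_state(ASPM_L0S | ASPM_L1)
print(changed, nic.link_state)
```

Constructing the model with force=True instead leaves aspm_disabled
clear, so the same disable request goes through. That is the odd
workaround: forcing ASPM on is what lets the driver turn it off.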

The question here is how to fix it. It appears that the motherboard or
BIOS on this machine does not grant _OSC control even (especially?) if
you have turned on PCIe ASPM in the BIOS. But perhaps even if _OSC is
not granted you should permit PCIe ASPM to be *disabled* by drivers,
just not enabled? (The BIOS appears to be buggy in this area: if you
turn off ASPM, save, and go back into setup, ASPM has turned itself back
on again!)
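
The "disable yes, enable no" policy floated above could look roughly like
this. It is a hedged sketch of the idea in Python, not a patch and not how
the kernel implements anything: apply_request and its parameters are
hypothetical, and ASPM states are represented as a plain bitmask (here
0x1 for L0s and 0x2 for L1, purely for illustration).

```python
# Sketch of a possible policy; hypothetical names, not kernel code.

def apply_request(current, requested, osc_granted):
    """Return the set of ASPM states enabled after a request.

    current     -- bitmask of states enabled now (e.g. as left by the BIOS)
    requested   -- bitmask of states the caller wants enabled
    osc_granted -- whether the firmware granted _OSC control
    """
    if osc_granted:
        # The OS owns the setting, so the request is honoured as-is.
        return requested
    # Without _OSC control, only ever clear bits: a state stays enabled
    # only if it is enabled now and is still wanted by the caller.
    return current & requested


# Firmware enabled L0s and L1 (0x3); the driver wants both off.
print(apply_request(current=0x3, requested=0x0, osc_granted=False))
```

Under this rule a driver can still switch off states the firmware turned
on, while a request to enable a state the firmware left off is silently
ignored when _OSC control was not granted.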

I'm not sure what the right thing to do is here: I don't know enough
about this area. But it does seem very strange that the only way I have
to turn off PCIe ASPM reliably on this device is to tell the kernel to
forcibly turn it *on*!