Message-Id: <87CA0F2C-5CEB-4CE7-8399-534CABE5ADD8@canonical.com>
Date:   Fri, 24 Jul 2020 22:32:05 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Robert Hancock <hancockrwd@...il.com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        "open list:PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: 5.7 regression: Lots of PCIe AER errors and suspend failure
 without pcie=noaer

Hi Robert,

> On Jul 22, 2020, at 07:55, Robert Hancock <hancockrwd@...il.com> wrote:
> 
> On Fri, Jul 10, 2020 at 6:28 PM Robert Hancock <hancockrwd@...il.com> wrote:
>> 
>> On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock <hancockrwd@...il.com> wrote:
>>> 
>>> Noticed a problem on my desktop with an Asus PRIME H270-PRO
>>> motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
>>> periodically there are PCIe AER errors getting spewed in dmesg that
>>> weren't happening before, and this also seems to cause suspend to
>>> fail - the system just wakes back up again right away, I am assuming
>>> due to some AER errors interrupting the process. 5.6 kernels didn't
>>> have this problem. Setting "pcie=noaer" on the kernel command line
>>> works around the issue, but I'm not sure what would have changed to
>>> trigger this to occur?
>> 
>> Correction: the workaround option is "pci=noaer".
> 
> As a follow-up, after some more experimentation, it appears that
> disabling PCIe ASPM with setpci on both the ASMedia PCIe-PCI bridge
> and the PCIe root port it is connected to silences the AER errors
> and allows suspend/resume to work again:
> 
> setpci -s 00:1c.0 0x50.B=0x00
> setpci -s 02:00.0 0x90.B=0x00
> 
> It appears the behavior changed as a result of this patch (which went
> into the stable tree for 5.7.6 and so affects 5.7 kernels as well):
> 
> commit 66ff14e59e8a30690755b08bc3042359703fb07a
> Author: Kai-Heng Feng <kai.heng.feng@...onical.com>
> Date:   Wed May 6 01:34:21 2020 +0800
> 
>    PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges
> 
>    7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
>    Linux to enable ASPM, but for some undocumented reason, it didn't enable
>    ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
> 
>    Remove this exclusion so we can enable ASPM on these links.
> 
>    The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001
>    PCIe-to-PCI Bridge.  Enabling ASPM on the link leading to it allows the
>    Intel SoC to enter deeper Package C-states, which is a significant power
>    savings.
> 
>    [bhelgaas: commit log]
>    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207571
>    Link: https://lore.kernel.org/r/20200505173423.26968-1-kai.heng.feng@canonical.com
>    Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
>    Signed-off-by: Bjorn Helgaas <bhelgaas@...gle.com>
>    Reviewed-by: Mika Westerberg <mika.westerberg@...ux.intel.com>
> 
> Unfortunately it appears that this ASMedia PCIe-PCI bridge:
> 
> 02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe
> to PCI Bridge [1b21:1080] (rev 04)
> 
> doesn't cope with ASPM properly and causes a bunch of PCIe link
> errors. (This is in addition to some broken-ness known as far back as
> 2012 with these ASM1083/1085 chips with regard to PCI interrupts
> getting stuck, but this ASPM problem causes issues even if no devices
> are connected to the PCI side of the bridge, as is the case on my
> system.)
> 
> Might need a quirk to disable ASPM on this device?

Yes, I think that's a great idea.

Could you please file a bug at [1] so we can continue the discussion there?

[1] https://bugzilla.kernel.org

Kai-Heng
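
A note on the two setpci commands quoted above: they write 0x00 to a single byte at offsets 0x50 and 0x90. The thread does not say which register that is. The sketch below assumes it is the low byte of the PCIe Link Control register on each device (the PCI Express capability offset plus 0x10), where bits 1:0 form the ASPM Control field. Under that assumption, writing a full zero byte disables ASPM but also clears the other bits in the byte, such as Common Clock Configuration (bit 6). The helper names and the sample value 0x42 are made up for illustration; this is not code from the kernel or from pciutils.

```python
# Illustrative sketch only: decode and clear the ASPM Control field in the
# low byte of a PCIe Link Control register value. Assumes the byte written
# by the setpci commands in this thread is that register's low byte.

ASPM_MASK = 0x03  # Link Control bits 1:0 = ASPM Control

ASPM_STATES = {
    0b00: "disabled",
    0b01: "L0s",
    0b10: "L1",
    0b11: "L0s+L1",
}


def decode_aspm(link_control_low: int) -> str:
    """Return the ASPM setting encoded in bits 1:0 of the byte."""
    return ASPM_STATES[link_control_low & ASPM_MASK]


def clear_aspm(link_control_low: int) -> int:
    """Return the byte with only the ASPM Control bits cleared."""
    return link_control_low & ~ASPM_MASK & 0xFF


if __name__ == "__main__":
    sample = 0x42  # hypothetical value: Common Clock Configuration + L1
    print(decode_aspm(sample))
    print(hex(clear_aspm(sample)))
```

Clearing only the two ASPM bits leaves the rest of the byte as the firmware or kernel set it, which is closer to what a kernel quirk that disables L0s and L1 for the device would do than a blanket write of 0x00.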
