Message-ID: <CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com>
Date:   Tue, 21 Jul 2020 17:55:35 -0600
From:   Robert Hancock <hancockrwd@...il.com>
To:     linux-kernel <linux-kernel@...r.kernel.org>,
        linux-pci@...r.kernel.org,
        Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: 5.7 regression: Lots of PCIe AER errors and suspend failure
 without pcie=noaer

On Fri, Jul 10, 2020 at 6:28 PM Robert Hancock <hancockrwd@...il.com> wrote:
>
> On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock <hancockrwd@...il.com> wrote:
> >
> > Noticed a problem on my desktop with an Asus PRIME H270-PRO
> > motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
> > periodically there are PCIe AER errors getting spewed in dmesg that
> > weren't happening before, and this also seems to cause suspend to
> > fail - the system just wakes back up again right away, I am assuming
> > due to some AER errors interrupting the process. 5.6 kernels didn't
> > have this problem. Setting "pcie=noaer" on the kernel command line
> > works around the issue, but I'm not sure what would have changed to
> > trigger this to occur?
>
> Correction: the workaround option is "pci=noaer".

As a follow-up: after some more experimentation, it appears that
disabling PCIe ASPM with setpci on both the ASMedia PCIe-PCI bridge
and the PCIe root port it is connected to silences the AER errors and
allows suspend/resume to work again:

setpci -s 00:1c.0 0x50.B=0x00
setpci -s 02:00.0 0x90.B=0x00
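
For anyone adapting the commands above to another machine, here is a minimal sketch of a narrower variant. It assumes that 0x50 and 0x90 are the Link Control registers of the two devices (PCI Express capability offset plus 0x10), which is my reading and not something verified on other hardware. Writing 0x00 to the whole byte also clears the other Link Control bits in that byte, whereas the ASPM Control field is only bits 1:0. The helper names clear_aspm_bits and disable_aspm are made up for illustration.

```shell
#!/bin/sh
# Compute a Link Control value with only the ASPM Control field
# (bits 1:0) cleared. Argument: current 16-bit value, e.g. 0x0043.
# Prints the result as four hex digits.
clear_aspm_bits() {
    printf '%04x\n' $(( $1 & ~0x3 & 0xffff ))
}

# Hypothetical helper: clear the ASPM bits on one device with setpci.
# CAP_EXP+10.w is the Link Control register addressed relative to the
# PCI Express capability, and "0:3" means "write 0 under mask 0x3",
# so the remaining bits are left as they were. Needs root.
disable_aspm() {
    setpci -s "$1" CAP_EXP+10.w=0:3
}
```

Usage would be something like "disable_aspm 00:1c.0" followed by "disable_aspm 02:00.0". Like the plain setpci writes, this does not persist across reboots.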

It appears the behavior changed as a result of this patch (which went
into the stable tree for 5.7.6 and so affects 5.7 kernels as well):

commit 66ff14e59e8a30690755b08bc3042359703fb07a
Author: Kai-Heng Feng <kai.heng.feng@...onical.com>
Date:   Wed May 6 01:34:21 2020 +0800

    PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges

    7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
    Linux to enable ASPM, but for some undocumented reason, it didn't enable
    ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.

    Remove this exclusion so we can enable ASPM on these links.

    The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001
    PCIe-to-PCI Bridge.  Enabling ASPM on the link leading to it allows the
    Intel SoC to enter deeper Package C-states, which is a significant power
    savings.

    [bhelgaas: commit log]
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207571
    Link: https://lore.kernel.org/r/20200505173423.26968-1-kai.heng.feng@canonical.com
    Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@...gle.com>
    Reviewed-by: Mika Westerberg <mika.westerberg@...ux.intel.com>

Unfortunately it appears that this ASMedia PCIe-PCI bridge:

02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)

doesn't cope with ASPM properly and causes a bunch of PCIe link
errors. (This is in addition to some brokenness known as far back as
2012 with these ASM1083/1085 chips with regard to PCI interrupts
getting stuck, but this ASPM problem causes issues even if no devices
are connected to the PCI side of the bridge, as is the case on my
system.)
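
To check whether ASPM is actually enabled on a given link, the LnkCtl line of "lspci -vv" output can be inspected. The following is a rough sketch that pulls the ASPM state out of that output; the function name aspm_state is made up, and the pattern assumes the "LnkCtl: ASPM <state>; ..." layout printed by current pciutils.

```shell
#!/bin/sh
# Read "lspci -vv" output on stdin and print the ASPM state from each
# LnkCtl line, e.g. "L0s L1 Enabled" or "Disabled".
aspm_state() {
    sed -n 's/^[[:space:]]*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}
```

For example, "lspci -vv -s 02:00.0 | aspm_state" (run as root so the capability list is readable) would show the state for the bridge.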

Might need a quirk to disable ASPM on this device?
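
Until there is such a quirk, a userspace stand-in could look up every device with this ID in sysfs and apply the setpci workaround to each at boot. This is only a sketch of the lookup part, not a replacement for a proper kernel quirk; the function name find_pci_by_id and the optional third argument (an alternate sysfs root, handy for testing) are made up for illustration.

```shell
#!/bin/sh
# Print the PCI address of every device whose sysfs vendor/device IDs
# match. $1 = vendor (e.g. 0x1b21), $2 = device (e.g. 0x1080),
# $3 = optional devices directory (default /sys/bus/pci/devices).
find_pci_by_id() {
    root=${3:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        # Skip entries without readable ID files.
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        if [ "$(cat "$dev/vendor")" = "$1" ] &&
           [ "$(cat "$dev/device")" = "$2" ]; then
            basename "$dev"
        fi
    done
}
```

On my system "find_pci_by_id 0x1b21 0x1080" should print the address of the 02:00.0 bridge, in the domain-qualified form that sysfs uses.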
