lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 7 Jan 2022 12:09:57 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     bhelgaas@...gle.com, mika.westerberg@...ux.intel.com,
        koba.ko@...onical.com, Lukas Wunner <lukas@...ner.de>,
        Stuart Hayes <stuart.w.hayes@...il.com>,
        Jan Kiszka <jan.kiszka@...mens.com>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/portdrv: Skip enabling AER on external facing ports

On Thu, Jan 6, 2022 at 4:12 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Wed, Jan 05, 2022 at 02:06:41PM +0800, Kai-Heng Feng wrote:
> > The Thunderbolt root ports may constantly spew out uncorrected errors
> > from AER service:
> > [   30.100211] pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.0
> > [   30.100251] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > [   30.100256] pcieport 0000:00:1d.0:   device [8086:7ab0] error status/mask=00100000/00004000
> > [   30.100262] pcieport 0000:00:1d.0:    [20] UnsupReq               (First)
> > [   30.100267] pcieport 0000:00:1d.0: AER:   TLP Header: 34000000 08000052 00000000 00000000
> > [   30.100372] thunderbolt 0000:0a:00.0: AER: can't recover (no error_detected callback)
> > [   30.100401] xhci_hcd 0000:3e:00.0: AER: can't recover (no error_detected callback)
> > [   30.100427] pcieport 0000:00:1d.0: AER: device recovery failed
>
> No timestamps needed here; they don't add to understanding the
> problem.

Got it. Will remove it for later iteration.

>
> > The link may not be reliable on external facing ports, so don't enable
> > AER on those ports.
>
> I'm not sure what you want to accomplish here.  If the errors are
> legitimate and the result of some hardware issue like a bad cable, why
> should we ignore them?  If they're caused by a software problem, we
> should figure that out and fix it.
>
> Does this occur on a specific instance of possibly flaky hardware?

Only from root ports of thunderbolt devices.

The error occurs as soon as the root port is runtime suspended to D3cold.

Runtime suspend the AER service can resolve the issue. I wonder if
it's the right thing to do here?
D3cold should also mean the PCI link is gone, disabling AER seems to
be a reasonable approach.

Kai-Heng

>
> You mention a spew of errors; do you think this is a single error that
> we fail to clear correctly?  Or is it really many separate errors?
>
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
> > ---
> >  drivers/pci/pcie/portdrv_core.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> > index bda630889f955..d464d00ade8f2 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -219,7 +219,8 @@ static int get_port_device_capability(struct pci_dev *dev)
> >
> >  #ifdef CONFIG_PCIEAER
> >       if (dev->aer_cap && pci_aer_available() &&
> > -         (pcie_ports_native || host->native_aer)) {
> > +         (pcie_ports_native || host->native_aer) &&
> > +         !dev->external_facing) {
> >               services |= PCIE_PORT_SERVICE_AER;
> >
> >               /*
> > --
> > 2.33.1
> >

Powered by blists - more mailing lists