Message-ID: <20220105201226.GA218998@bhelgaas>
Date: Wed, 5 Jan 2022 14:12:26 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc: bhelgaas@...gle.com, mika.westerberg@...ux.intel.com,
koba.ko@...onical.com, Lukas Wunner <lukas@...ner.de>,
Stuart Hayes <stuart.w.hayes@...il.com>,
Jan Kiszka <jan.kiszka@...mens.com>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/portdrv: Skip enabling AER on external facing ports

On Wed, Jan 05, 2022 at 02:06:41PM +0800, Kai-Heng Feng wrote:
> The Thunderbolt root ports may constantly spew out uncorrected errors
> from AER service:
> [ 30.100211] pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.0
> [ 30.100251] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [ 30.100256] pcieport 0000:00:1d.0: device [8086:7ab0] error status/mask=00100000/00004000
> [ 30.100262] pcieport 0000:00:1d.0: [20] UnsupReq (First)
> [ 30.100267] pcieport 0000:00:1d.0: AER: TLP Header: 34000000 08000052 00000000 00000000
> [ 30.100372] thunderbolt 0000:0a:00.0: AER: can't recover (no error_detected callback)
> [ 30.100401] xhci_hcd 0000:3e:00.0: AER: can't recover (no error_detected callback)
> [ 30.100427] pcieport 0000:00:1d.0: AER: device recovery failed
No timestamps needed here; they don't add to understanding the
problem.
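For reference, the status/mask pair in the quoted log can be read
against the AER Uncorrectable Error Status bit layout. The following is
only a minimal user-space sketch, not kernel code: the two bit values
are assumed to match PCI_ERR_UNC_COMP_TIME and PCI_ERR_UNC_UNSUP in
include/uapi/linux/pci_regs.h, and the macro and helper names are
hypothetical, introduced here for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror PCI_ERR_UNC_* in include/uapi/linux/pci_regs.h. */
#define UNC_COMP_TIME 0x00004000u /* bit 14: Completion Timeout */
#define UNC_UNSUP     0x00100000u /* bit 20: Unsupported Request */

/* Hypothetical helper: errors that are flagged and not masked. */
static inline uint32_t aer_unmasked(uint32_t status, uint32_t mask)
{
	return status & ~mask;
}

/* Hypothetical helper: print each set bit, return how many were set. */
static inline int aer_print_bits(const char *label, uint32_t val)
{
	int count = 0;

	for (int bit = 0; bit < 32; bit++) {
		if (val & (UINT32_C(1) << bit)) {
			printf("%s: bit %d set\n", label, bit);
			count++;
		}
	}
	return count;
}
```

Calling aer_unmasked(0x00100000, 0x00004000) with the values from the
log leaves only bit 20 set, which agrees with the "[20] UnsupReq" line:
the reported error is an Unsupported Request, while the only masked
error type is Completion Timeout.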
> The link may not be reliable on external facing ports, so don't enable
> AER on those ports.
I'm not sure what you want to accomplish here.  If the errors are
legitimate and the result of some hardware issue like a bad cable, why
should we ignore them?  If they're caused by a software problem, we
should figure that out and fix it.

Does this occur on a specific instance of possibly flaky hardware?
You mention a spew of errors; do you think this is a single error that
we fail to clear correctly?  Or is it really many separate errors?
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
> ---
> drivers/pci/pcie/portdrv_core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index bda630889f955..d464d00ade8f2 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -219,7 +219,8 @@ static int get_port_device_capability(struct pci_dev *dev)
>  
>  #ifdef CONFIG_PCIEAER
>  	if (dev->aer_cap && pci_aer_available() &&
> -	    (pcie_ports_native || host->native_aer)) {
> +	    (pcie_ports_native || host->native_aer) &&
> +	    !dev->external_facing) {
>  		services |= PCIE_PORT_SERVICE_AER;
>  
>  		/*
> --
> 2.33.1
>