Date:   Wed, 5 Jan 2022 14:12:26 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     bhelgaas@...gle.com, mika.westerberg@...ux.intel.com,
        koba.ko@...onical.com, Lukas Wunner <lukas@...ner.de>,
        Stuart Hayes <stuart.w.hayes@...il.com>,
        Jan Kiszka <jan.kiszka@...mens.com>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/portdrv: Skip enabling AER on external facing ports

On Wed, Jan 05, 2022 at 02:06:41PM +0800, Kai-Heng Feng wrote:
> The Thunderbolt root ports may constantly spew out uncorrected errors
> from AER service:
> [   30.100211] pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.0
> [   30.100251] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [   30.100256] pcieport 0000:00:1d.0:   device [8086:7ab0] error status/mask=00100000/00004000
> [   30.100262] pcieport 0000:00:1d.0:    [20] UnsupReq               (First)
> [   30.100267] pcieport 0000:00:1d.0: AER:   TLP Header: 34000000 08000052 00000000 00000000
> [   30.100372] thunderbolt 0000:0a:00.0: AER: can't recover (no error_detected callback)
> [   30.100401] xhci_hcd 0000:3e:00.0: AER: can't recover (no error_detected callback)
> [   30.100427] pcieport 0000:00:1d.0: AER: device recovery failed

The timestamps aren't needed here; they don't help in understanding
the problem.
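
For reference, the "error status/mask=00100000/00004000" pair in the
quoted log can be decoded roughly as below.  This is only a user-space
sketch, not kernel code: the two bit values are the ones defined in
include/uapi/linux/pci_regs.h, while aer_unc_reported() and
aer_unc_dump() are hypothetical helper names made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Uncorrectable error bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNC_COMP_TIME 0x00004000u /* bit 14: Completion Timeout */
#define PCI_ERR_UNC_UNSUP     0x00100000u /* bit 20: Unsupported Request */

/* Hypothetical helper: nonzero if the error bit is set and not masked. */
static int aer_unc_reported(uint32_t status, uint32_t mask, uint32_t bit)
{
	return (status & ~mask & bit) != 0;
}

/* Hypothetical helper: print each set status bit and whether it is masked. */
static void aer_unc_dump(uint32_t status, uint32_t mask)
{
	for (unsigned int i = 0; i < 32; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if (status & bit)
			printf("[%2u] %s\n", i,
			       (mask & bit) ? "masked" : "unmasked");
	}
}
```

With the values from the log, aer_unc_reported(0x00100000, 0x00004000,
PCI_ERR_UNC_UNSUP) is nonzero, which matches the "[20] UnsupReq" line:
bit 20 is set in the status and only bit 14 (Completion Timeout) is
masked.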

> The link may not be reliable on external facing ports, so don't enable
> AER on those ports.

I'm not sure what you want to accomplish here.  If the errors are
legitimate and the result of some hardware issue like a bad cable, why
should we ignore them?  If they're caused by a software problem, we
should figure that out and fix it.

Does this occur on a specific instance of possibly flaky hardware?

You mention a spew of errors; do you think this is a single error that
we fail to clear correctly?  Or is it really many separate errors?

> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
> ---
>  drivers/pci/pcie/portdrv_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index bda630889f955..d464d00ade8f2 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -219,7 +219,8 @@ static int get_port_device_capability(struct pci_dev *dev)
>  
>  #ifdef CONFIG_PCIEAER
>  	if (dev->aer_cap && pci_aer_available() &&
> -	    (pcie_ports_native || host->native_aer)) {
> +	    (pcie_ports_native || host->native_aer) &&
> +	    !dev->external_facing) {
>  		services |= PCIE_PORT_SERVICE_AER;
>  
>  		/*
> -- 
> 2.33.1
> 
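
To make the effect of the changed condition concrete, here is a small
stand-alone model of it.  It is a sketch under stated assumptions, not
the portdrv code: struct port_model and both function names are
hypothetical, and each field merely stands in for the kernel expression
named in its comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inputs of the condition in the patch. */
struct port_model {
	bool has_aer_cap;     /* dev->aer_cap != 0 */
	bool aer_available;   /* pci_aer_available() */
	bool ports_native;    /* pcie_ports_native */
	bool host_native_aer; /* host->native_aer */
	bool external_facing; /* dev->external_facing */
};

/* Condition before the patch: AER service is requested. */
static bool aer_service_before(const struct port_model *p)
{
	return p->has_aer_cap && p->aer_available &&
	       (p->ports_native || p->host_native_aer);
}

/* Condition with the patch: additionally skip external facing ports. */
static bool aer_service_after(const struct port_model *p)
{
	return aer_service_before(p) && !p->external_facing;
}
```

The two functions differ only for ports with external_facing set: there
the patched condition no longer adds PCIE_PORT_SERVICE_AER, so errors
on those ports would not be handled by the AER service at all, whether
they are legitimate or not.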
