Date:   Mon, 13 Feb 2023 18:10:05 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     bhelgaas@...gle.com, Mario Limonciello <mario.limonciello@....com>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        Keith Busch <kbusch@...nel.org>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Pali Rohár <pali@...nel.org>,
        Stefan Roese <sr@...x.de>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/portdrv: Avoid enabling AER on Thunderbolt devices

On Wed, Feb 08, 2023 at 09:33:18PM +0800, Kai-Heng Feng wrote:
> Hi Bjorn,
> 
> Sorry for the belated response.
> 
> On Wed, Jan 18, 2023 at 7:14 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
> >
> > On Mon, Dec 26, 2022 at 11:30:31PM +0800, Kai-Heng Feng wrote:
> > > > We are seeing an igc Ethernet device on a Thunderbolt dock stop working
> > > > after S3 resume because of AER errors, or even cause S3 resume to freeze:
> > > pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:00:1d.0
> > > pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
> > > pcieport 0000:00:1d.0:   device [8086:7ab0] error status/mask=00008000/00002000
> > > pcieport 0000:00:1d.0:    [15] HeaderOF
> > > pcieport 0000:00:1d.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:1d.0
> > > pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > > pcieport 0000:00:1d.0:   device [8086:7ab0] error status/mask=00100000/00004000
> > > pcieport 0000:00:1d.0:    [20] UnsupReq               (First)
> > > pcieport 0000:00:1d.0: AER:   TLP Header: 34000000 0a000052 00000000 00000000
> > > pcieport 0000:00:1d.0: AER:   Error of this Agent is reported first
> > > pcieport 0000:04:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > > pcieport 0000:04:01.0:   device [8086:1136] error status/mask=00300000/00000000
> > > pcieport 0000:04:01.0:    [20] UnsupReq               (First)
> > > pcieport 0000:04:01.0:    [21] ACSViol
> > > pcieport 0000:04:01.0: AER:   TLP Header: 34000000 04000052 00000000 00000000
> > > thunderbolt 0000:05:00.0: AER: can't recover (no error_detected callback)
> >
> > Is this a regression?  E.g., is this something that started after
> > f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native") or
> > something similar?
> 
> Reverting the commit doesn't help, because 0000:00:1d.0 is already
> native, so AER is already enabled.

OK.  Unless I missed it, we don't really have a root cause or a good
reason to disable AER on removable devices.  I don't want to disable
AER indiscriminately.  The fact that we see errors doesn't seem like a
good enough reason.

Bjorn
