Date:   Mon, 14 Feb 2022 08:23:05 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Keith Busch <kbusch@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Nirmal Patel <nirmal.patel@...ux.intel.com>,
        Jonathan Derrick <jonathan.derrick@...ux.dev>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>,
        Krzysztof Wilczyński <kw@...ux.com>,
        Linux PCI <linux-pci@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] PCI: vmd: Honor ACPI _OSC on PCIe features

Hi Bjorn,

On Thu, Feb 10, 2022 at 5:36 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Tue, Dec 07, 2021 at 02:15:04PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Dec 7, 2021 at 12:12 AM Keith Busch <kbusch@...nel.org> wrote:
> > > On Fri, Dec 03, 2021 at 11:15:41AM +0800, Kai-Heng Feng wrote:
> > > > When Samsung PCIe Gen4 NVMe is connected to Intel ADL VMD, the
> > > > combination causes AER message flood and drags the system performance
> > > > down.
> > > >
> > > > The issue doesn't happen when VMD mode is disabled in BIOS, since AER
> > > > isn't enabled by acpi_pci_root_create(). When VMD mode is enabled, AER
> > > > is enabled regardless of _OSC:
> > > > [    0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
> > > > ...
> > > > [    1.486704] pcieport 10000:e0:06.0: AER: enabled with IRQ 146
> > > >
> > > > Since VMD is an aperture to regular PCIe root ports, honor ACPI _OSC to
> > > > disable PCIe features accordingly to resolve the issue.
> > >
> > > At least for some versions of this hardware, I recall ACPI is unaware of
> > > any devices in the VMD domain; the platform cannot see past the VMD
> > > endpoint, so I thought the driver was supposed to always let the VMD
> > > domain use OS native support regardless of the parent's ACPI _OSC.
> >
> > This is orthogonal to whether or not ACPI is aware of the VMD domain
> > or the devices in it.
> >
> > If the platform firmware does not allow the OS to control specific
> > PCIe features at the physical host bridge level, that extends to the
> > VMD "bus", because it is just a way to expose a hidden part of the
> > PCIe hierarchy.
>
> I don't understand what's going on here.  Do we understand the AER
> message flood?  Are we just papering over it by disabling AER?

To be more precise, AER is disabled by the platform vendor in BIOS to
paper over the issue.
The only viable solution for us is to follow their settings. We may
never know what really happens underneath.

Unfortunately, disabling ASPM, AER, PME, etc. is common practice for ODMs.

Kai-Heng

>
> If an error occurs below a VMD, who notices and reports it?  If we
> disable native AER below VMD because of _OSC, as this patch does, I
> guess we're assuming the platform will handle AER events below VMD.
> Is that really true?  Does the platform know how to find AER log
> registers of devices below VMD?
>
> > The platform firmware does that through ACPI _OSC under the host
> > bridge device (not under the VMD device) which it is very well aware
> > of.
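
For readers following the thread, the idea under discussion (the VMD domain inherits whatever the platform granted via _OSC on the physical host bridge) can be sketched roughly as below. This is a simplified user-space model, not the patch itself: struct bridge_flags and copy_host_bridge_flags() are hypothetical names used for illustration, and the fields only mirror the native_* bits that the kernel keeps in struct pci_host_bridge.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the _OSC-derived native_* bits of a PCI
 * host bridge. A set bit means the OS may control that feature natively.
 */
struct bridge_flags {
	bool native_aer;
	bool native_pme;
	bool native_pcie_hotplug;
	bool native_ltr;
	bool native_dpc;
};

/*
 * Propagate the root host bridge's negotiated flags to the VMD bridge,
 * so a feature the platform withheld at the host bridge level (AER in
 * the reported case) is also left alone below VMD.
 */
void copy_host_bridge_flags(const struct bridge_flags *root,
			    struct bridge_flags *vmd)
{
	assert(root && vmd);

	vmd->native_aer = root->native_aer;
	vmd->native_pme = root->native_pme;
	vmd->native_pcie_hotplug = root->native_pcie_hotplug;
	vmd->native_ltr = root->native_ltr;
	vmd->native_dpc = root->native_dpc;
}
```

With the "_OSC: platform does not support [AER]" case from the log above, the root bridge would have native_aer cleared, so after the copy the VMD bridge would too, and the port driver would presumably not enable AER on the ports below VMD.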
