Message-ID: <20240501175450.GA866742@bhelgaas>
Date: Wed, 1 May 2024 12:54:50 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: PJ Waskiewicz <ppwaskie@...nel.org>
Cc: Dan Williams <dan.j.williams@...el.com>, linux-cxl@...r.kernel.org,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup
 failure

On Wed, May 01, 2024 at 08:28:22AM -0700, PJ Waskiewicz wrote:
> On Mon, 2024-04-29 at 11:35 -0700, Dan Williams wrote:
> > Bjorn Helgaas wrote:
> > > On Sun, Apr 28, 2024 at 10:57:13PM -0700, PJ Waskiewicz wrote:
> > > > On Tue, 2024-04-09 at 08:22 -0500, Bjorn Helgaas wrote:
> > > > > On Sun, Apr 07, 2024 at 02:05:26PM -0700,
> > > > > ppwaskie@...nel.org wrote:
> > > > > > From: PJ Waskiewicz <ppwaskie@...nel.org>
> > > > > > 
> > > > > > Currently, Type 3 CXL devices (CXL.mem) can train using
> > > > > > host CXL drivers on Emerald Rapids systems.  However, on
> > > > > > some production systems from some vendors, a buggy BIOS
> > > > > > exists that improperly populates the ACPI => PCI mappings.
> > > > > 
> > > > > Can you be more specific about what this ACPI => PCI mapping
> > > > > is?  If you already know what the problem is, I'm sure this
> > > > > is obvious, but otherwise it's not.
> > [..] 
> > > It's just a buggy BIOS that doesn't supply _UID for an ACPI0016
> > > object, so you can't locate the corresponding CEDT entry, right?
> > 
> > Correct, the problem is 100% contained to ACPI, and PCI is
> > innocent.  The ACPI bug leads to failures to associate ACPI
> > host-bridge objects with CEDT.CHBS entries.
> 
> Sorry for the confusion here!!  I was definitely not trying to blame
> PCI.  :)
>
> > ACPI to PCI association is then typical pci_root lookup, i.e.:
> > 
> >         pci_root = acpi_pci_find_root(hb->handle);
> >         bridge = pci_root->bus->bridge;
> 
> Yes, this here.  In my use case, I'm starting with a PCIe/CXL device.
> In my driver, I try to discover the host bridge, and then the ACPI _UID
> so I can look things up in the CEDT.
> 
> So I'm trying to do the programmatic equivalent of this:
> 
> Start here in my PCIe/CXL host driver:
> 
> /sys/devices/pci0000:37/firmware_node =>
> ../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:02
> 
> Retrieve _UID (uid) from /sys/devices/pci0000:37/firmware_node/uid
> 
> Buggy BIOS, that above value resolves to CX02.  In fact, it *should* be
> 49.  This is very much a bug in the ACPI arena.
> 
> The kernel APIs allowing me to walk this path would fail in the
> acpi_evaluate_object() when trying to pass in the bad _UID (CX02).
> 
> Again, sorry for the confusion if it looked like I was trying to
> implicate PCI in any way.  The whole intent here was to leave some
> breadcrumbs so anyone else running into this wouldn't be left
> scratching their heads wondering wtf was going on.


No worries, I didn't suspect a PCI issue here; I just wasn't clear on
what ACPI=>PCI mapping was involved.  It sounds like there *is* no
such mapping in this picture (you find the ACPI object for a PCIe/CXL
host bridge, evaluate _UID from that object, and get a bogus value).
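
As a rough userspace illustration of that flow (not the kernel code
path), the sketch below takes the text read from a firmware_node/uid
sysfs attribute, checks that it is a plain decimal integer, and looks
it up in a table of host bridge entries.  The names parse_uid(),
find_chbs() and struct chbs_entry are hypothetical, and the struct is
a simplified stand-in for a CEDT CHBS entry, not the real ACPI table
layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a CEDT CHBS entry. */
struct chbs_entry {
	uint32_t uid;	/* host bridge _UID the entry is keyed on */
	uint64_t base;	/* register base; illustrative only */
};

/*
 * Parse the contents of a sysfs "uid" attribute as a decimal integer.
 * Returns 0 and stores the value on success, -1 if the text is not a
 * plain non-negative integer (e.g. the string "CX02").
 */
int parse_uid(const char *text, uint32_t *uid)
{
	char *end;
	unsigned long long v;

	assert(text != NULL && uid != NULL);

	/* strtoull() would skip whitespace and accept a sign; be strict. */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno != 0 || v > UINT32_MAX)
		return -1;

	/* sysfs attributes normally end with a single newline. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	*uid = (uint32_t)v;
	return 0;
}

/* Return the entry whose uid matches, or NULL if there is none. */
const struct chbs_entry *find_chbs(const struct chbs_entry *table,
				   size_t count, uint32_t uid)
{
	for (size_t i = 0; i < count; i++) {
		if (table[i].uid == uid)
			return &table[i];
	}
	return NULL;
}
```

With the values from this thread, parse_uid() accepts "49" and
rejects "CX02", so on the buggy BIOS the lookup is never reached; a
numeric _UID with no matching entry would instead make find_chbs()
return NULL.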

So the commit log text:

  However, on some production systems from some vendors, a buggy BIOS
  exists that improperly populates the ACPI => PCI mappings.

apparently refers to improper implementation of the _UID, which
doesn't return anything PCI related.

It also says:

  This leads to the cxl_acpi driver to fail probe when it cannot find
  the root port's _UID, in order to look up the device's CXL
  attributes in the CEDT.

I *think* strictly speaking this should refer to the *host bridge's*
_UID, not the Root Port's, e.g., something like this:

  However, on some production systems from some vendors, a buggy BIOS
  provides a CXL host bridge _UID that doesn't match anything in the
  CEDT.

Bjorn
