Date: Mon, 8 Apr 2024 12:29:53 -0700
From: PJ Waskiewicz <ppwaskie@...nel.org>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Lukas Wunner <lukas@...ner.de>, Dan Williams <dan.j.williams@...el.com>,
	linux-cxl@...r.kernel.org, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup
 failure

On 24/04/08 09:34AM, Jonathan Cameron wrote:
> On Sun, 7 Apr 2024 19:03:23 -0700
> PJ Waskiewicz <ppwaskie@...nel.org> wrote:
> 
> > On 24/04/07 11:28PM, Lukas Wunner wrote:
> > 
> > Hi Lukas,
> > 
> > > On Sun, Apr 07, 2024 at 02:05:26PM -0700, ppwaskie@...nel.org wrote:  
> > > > --- a/drivers/cxl/acpi.c
> > > > +++ b/drivers/cxl/acpi.c
> > > > @@ -504,7 +504,7 @@ static int cxl_get_chbs(struct device *dev, struct acpi_device *hb,
> > > >  
> > > >  	rc = acpi_evaluate_integer(hb->handle, METHOD_NAME__UID, NULL, &uid);
> > > >  	if (rc != AE_OK) {
> > > > -		dev_err(dev, "unable to retrieve _UID\n");
> > > > +		dev_err(dev, "unable to retrieve _UID. Potentially buggy BIOS\n");
> > > >  		return -ENOENT;
> > > >  	}  
> > > 
> > > dev_err(dev, FW_BUG "unable to retrieve _UID\n");
> > >              ^^^^^^
> > > 
> > > There's a macro for that.  
> > 
> > Doh...it's been a while since I've come across a buggy BIOS.  Thanks for
> > the review and comment.
> > 
> > Updated patch:
> > 
> > cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure
> > 
> > From: PJ Waskiewicz <ppwaskie@...nel.org>
> > 
> > Currently, Type 3 CXL devices (CXL.mem) can train using host CXL
> > drivers on Emerald Rapids systems.  However, on some production
> > systems from some vendors, a buggy BIOS exists that improperly
> > populates the ACPI => PCI mappings.  This causes the cxl_acpi
> > driver to fail probe when it cannot find the root port's _UID, which
> > it needs to look up the device's CXL attributes in the CEDT.
> > 
> > Add a more descriptive message indicating that the lookup failure
> > could be caused by a bad BIOS, rather than just "failed."
> > 
> > v2: Updated message to use existing FW_BUG macro
> Move the change log "v2..." etc below the ---
> as we don't want it in the long term git log + better to send a fresh
> patch in a separate thread.

Thanks, it's been a while, and my normal (i.e. old) workflow isn't
available to me quite yet.  I can fix and send a new patch, but
I'll hold off and see what Dan's thoughts are after my reply to his
reply.

> Other than that, it seems reasonable to hint that it is probably a BIOS
> bug - however I wonder how many other cases we should do this for and
> whether it is worth the effort of marking them all?

I can confirm this was definitely a BIOS bug in this particular case.
The vendor spun a quick test BIOS for us to test on an EMR and SPR host,
and the _UIDs were finally correct.  I could successfully walk the CEDT
and get to the CAPS structs I was after (link speed, bus width, etc.).

I'd be fine also marking the others, but I don't have any easy way to
validate that I'd hit those cases.  My BIOS for this platform is only
mildly broken.  I suppose it could be mocked in QEMU to cause those to
fail...
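
For anyone skimming the thread, here is a rough userspace sketch of what
the FW_BUG suggestion amounts to.  FW_BUG is a plain string-literal prefix
(defined in include/linux/printk.h), so the compiler concatenates it with
the format string.  The helper name format_uid_failure() and its dev_name
parameter are hypothetical stand-ins for the dev_err() call; they are not
part of the driver or of the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Same definition as the kernel's include/linux/printk.h. */
#define FW_BUG "[Firmware Bug]: "

/*
 * Hypothetical userspace stand-in for the dev_err() call in
 * cxl_get_chbs().  dev_name only mimics the device prefix that
 * dev_err() would print; it is an assumption for illustration.
 */
static int format_uid_failure(char *buf, size_t len, const char *dev_name)
{
	/* Adjacent string literals are concatenated at compile time. */
	return snprintf(buf, len,
			"%s: " FW_BUG "unable to retrieve _UID\n", dev_name);
}
```

Calling format_uid_failure() with a buffer and a device name produces the
name, then "[Firmware Bug]: ", then the original message, which is the
shape the log line would take once the macro is used.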

-PJ
