lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zcu-cVG9deqfwdiV@rric.localdomain>
Date: Tue, 13 Feb 2024 20:09:37 +0100
From: Robert Richter <rrichter@....com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Davidlohr Bueso <dave@...olabs.net>,
	Jonathan Cameron <jonathan.cameron@...wei.com>,
	Dave Jiang <dave.jiang@...el.com>,
	Alison Schofield <alison.schofield@...el.com>,
	Vishal Verma <vishal.l.verma@...el.com>,
	Ira Weiny <ira.weiny@...el.com>, linux-cxl@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cxl/pci: Fix disabling CXL memory for zero-based
 addressing

On 13.02.24 10:40:07, Dan Williams wrote:
> Robert Richter wrote:
> > Dan,
> > 
> > On 09.02.24 12:22:01, Dan Williams wrote:
> > > Robert Richter wrote:
> > 
> > > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > > > index 569354a5536f..3a36a2f0c94f 100644
> > > > --- a/drivers/cxl/core/pci.c
> > > > +++ b/drivers/cxl/core/pci.c
> > > > @@ -466,6 +466,18 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> > > >  	for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
> > > >  		struct device *cxld_dev;
> > > >  
> > > > +		/*
> > > > +		 * Handle zero-based hardware addresses
> > > > +		 */
> > > > +		if (!info->dvsec_range[i].start &&
> > > > +		    info->dvsec_range[i].end != CXL_RESOURCE_NONE &&
> > > > +		    info->dvsec_range[i].end) {
> > > > +			dev_dbg(dev, "Zero-based hardware range found [%#llx - %#llx]\n",
> > > > +				info->dvsec_range[i].start, info->dvsec_range[i].end);
> > > > +			allowed++;
> > > > +			continue;
> > > > +		}
> > > > +
> > > 
> > > I am not comfortable with this. It should be checking a platform
> > > specific quirk, or similar for the possibility of HPA != SPA. The
> > > entirety of the Linux CXL subsystem is built on the assumption that HPA
> > > == SPA, and if a platform wants to inject an offset between those Linux
> > > needs some way to enumerate that it is running in that new world. Yes,
> > > nothing in the CXL specification precludes HPA != SPA, but Linux has
> > > long since shipped the opposite assumption.
> > 
> > this check prevents the memory from disabling an enabled decoder. So it
> > just keeps everything as it comes out of firmware.
> > 
> > Can you explain the motivation why active memory is disabled?
> 
> It is a sanity check that Linux is operating in a CXL world that it
> understands. The fundamental assumption is that the CFMWS correctly
> conveys the CXL space, and that the HW decoder resources match platform
> expectations match Linux resource management.

It would be sane to just not use CXL if assumptions on it are not
valid and not to break system to boot.

> 
> > This may take system memory offline and could lead to a kernel hang.
> 
> Yes, that is not an unreasonable result when Linux fundamental
> assumptions are violated.

BUG_ON(fw_table_broken)? If at all, it is not mandatory to have a
CFMWS. Btw, the check is more strict and also checks memory
attributes. It is very likely something can break.

> 
> > The same could happen if the CEDT is broken or just missing (it is not
> > mandatory for 1.1). Such systems just die when booting. So the check
> > to take memory offline should be changed in a way that it will be safe
> > to disable it.
> > 
> > A platform check would fix that only for certain systems.
> 
> I am not worried about platforms that accidentally break the CEDT, those
> mistakes are typically caught pre-production. Otherwise, if the platform
> is designed to break assumptions then Linux needs explicit enabling for
> the address translation sitting between the endpoint and the platform
> address map.

As said, if assumptions break, just let cxl init fail.

Decoders should just be disabled if the address mapping is invalid.
There is no point to tear down the system.

A platform check is fine to me, but in the end this looks like a
whitelist to let systems boot.

Thanks,

-Robert

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ