[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zctg4Lx_y_hbChUW@rric.localdomain>
Date: Tue, 13 Feb 2024 13:30:24 +0100
From: Robert Richter <rrichter@....com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Davidlohr Bueso <dave@...olabs.net>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
Dave Jiang <dave.jiang@...el.com>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Ira Weiny <ira.weiny@...el.com>, linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cxl/pci: Fix disabling CXL memory for zero-based
addressing
Dan,
On 09.02.24 12:22:01, Dan Williams wrote:
> Robert Richter wrote:
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > index 569354a5536f..3a36a2f0c94f 100644
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -466,6 +466,18 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> > for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
> > struct device *cxld_dev;
> >
> > + /*
> > + * Handle zero-based hardware addresses
> > + */
> > + if (!info->dvsec_range[i].start &&
> > + info->dvsec_range[i].end != CXL_RESOURCE_NONE &&
> > + info->dvsec_range[i].end) {
> > + dev_dbg(dev, "Zero-based hardware range found [%#llx - %#llx]\n",
> > + info->dvsec_range[i].start, info->dvsec_range[i].end);
> > + allowed++;
> > + continue;
> > + }
> > +
>
> I am not comfortable with this. It should be checking a platform
> specific quirk, or similar for the possibility of HPA != SPA. The
> entirety of the Linux CXL subsystem is built on the assumption that HPA
> == SPA, and if a platform wants to inject an offset between those Linux
> needs some way to enumerate that it is running in that new world. Yes,
> nothing in the CXL specification precludes HPA != SPA, but Linux has
> long since shipped the opposite assumption.
this check prevents the memory from disabling an enabled decoder. So it
just keeps everything as it comes out of firmware.
Can you explain the motivation why active memory is disabled? This may
take system memory offline and could lead to a kernel hang. The same
could happen if the CEDT is broken or just missing (it is not
mandatory for 1.1). Such systems just die when booting. So the check
to take memory offline should be changed in a way that it will be safe
to disable it.
A platform check would fix that only for certain systems.
Thanks,
-Robert
Powered by blists - more mailing lists