Message-ID: <687fef9ec0dd9_137e6b100c8@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 22 Jul 2025 13:07:58 -0700
From: <dan.j.williams@...el.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>,
<linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<nvdimm@...ts.linux.dev>, <linux-fsdevel@...r.kernel.org>,
<linux-pm@...r.kernel.org>
CC: Davidlohr Bueso <dave@...olabs.net>, Jonathan Cameron
<jonathan.cameron@...wei.com>, Dave Jiang <dave.jiang@...el.com>, "Alison
Schofield" <alison.schofield@...el.com>, Vishal Verma
<vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>, Dan Williams
<dan.j.williams@...el.com>, Matthew Wilcox <willy@...radead.org>, Jan Kara
<jack@...e.cz>, "Rafael J . Wysocki" <rafael@...nel.org>, Len Brown
<len.brown@...el.com>, Pavel Machek <pavel@...nel.org>, Li Ming
<ming.li@...omail.com>, Jeff Johnson <jeff.johnson@....qualcomm.com>, "Ying
Huang" <huang.ying.caritas@...il.com>, Yao Xingtao <yaoxt.fnst@...itsu.com>,
Peter Zijlstra <peterz@...radead.org>, Greg KH <gregkh@...uxfoundation.org>,
Nathan Fontenot <nathan.fontenot@....com>, Smita Koralahalli
<Smita.KoralahalliChannabasappa@....com>, Terry Bowman
<terry.bowman@....com>, Robert Richter <rrichter@....com>, Benjamin Cheatham
<benjamin.cheatham@....com>, PradeepVineshReddy Kodamati
<PradeepVineshReddy.Kodamati@....com>, Zhijian Li <lizhijian@...itsu.com>
Subject: Re: [PATCH v5 0/7] Add managed SOFT RESERVE resource handling
Smita Koralahalli wrote:
> This series introduces the ability to manage SOFT RESERVED iomem
> resources, enabling the CXL driver to remove any portions that
> intersect with created CXL regions.
>
> The current approach of leaving SOFT RESERVED entries as is can result
> in failures during device hotplug such as CXL because the address range
> remains reserved and unavailable for reuse even after region teardown.
I will go through the patches, but the main concern here is not hotplug,
it is region assembly failure.
We have a constant drip of surprising platform behaviors that trip up
the driver, leaving memory stranded. Specifically, device-dax defers to
CXL to assemble the region representing the soft-reserve range, CXL
fails to complete that assembly after being confused by the platform,
and the end user wonders why their platform BIOS sees memory capacity
that Linux does not see.
So the priority order of solutions needed here is:
1/ Fix all shipping platform "quirks", try to prevent new ones from
being created. I.e., ideally, in the long term, Linux does not need a
soft-reserve fallback and just always ignores Soft Reserve in
CXL Windows because the CXL subsystem will handle it.
2/ In the foreseeable near term, for all yet to be solved or yet
to be discovered platform quirks, provide a device-dax fallback to
recover baseline device-dax behavior (equivalent to putting cxl_acpi on
a modprobe deny-list).
3/ For hotplug, remove the conflicting resource.
> To address this, the CXL driver now uses a background worker that waits
> for cxl_mem driver probe to complete before scanning for intersecting
> resources. Then the driver walks through created CXL regions to trim any
> intersections with SOFT RESERVED resources in the iomem tree.
The precision of this gives me pause. I think it is fine to make this
more coarse because any mismatch between Soft Reserve and a CXL Window
resource should be cause to give up on the CXL side.
If a Soft Reserve range straddles a CXL window and "System RAM", give up
on trying to use the CXL driver on that system.
If CXL does not completely cover a soft-reserve region, give up on trying
to use the CXL driver on that system.
Effectively, any time we detect unexpected platform shenanigans, it
likely indicates missing understanding in the Linux driver.
> The following scenarios have been tested:
Nice! Appreciate you including the test case results.
[..]
> Example 3: No alignment
> |---------- "Soft Reserved" ----------|
> |---- "Region #" ----|
Per above, CXL subsystem should completely give up in this scenario. The
BIOS said that all of the range is Conventional memory and CXL is only
creating a region for part of it. Somebody is wrong. Given that
non-CXL-aware OSes would try to use the entirety of the Soft Reserved
region, this scenario is "disable CXL, it clearly does not
understand this platform".