lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20260204232754.9531-1-tomasz.wolski@fujitsu.com>
Date: Thu,  5 Feb 2026 00:27:54 +0100
From: Tomasz Wolski <tomasz.wolski@...itsu.com>
To: dan.j.williams@...el.com
Cc: Smita.KoralahalliChannabasappa@....com,
	alison.schofield@...el.com,
	ardb@...nel.org,
	benjamin.cheatham@....com,
	bp@...en8.de,
	dave.jiang@...el.com,
	dave@...olabs.net,
	gregkh@...uxfoundation.org,
	huang.ying.caritas@...il.com,
	ira.weiny@...el.com,
	jack@...e.cz,
	jeff.johnson@....qualcomm.com,
	jonathan.cameron@...wei.com,
	len.brown@...el.com,
	linux-cxl@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-pm@...r.kernel.org,
	lizhijian@...itsu.com,
	ming.li@...omail.com,
	nathan.fontenot@....com,
	nvdimm@...ts.linux.dev,
	pavel@...nel.org,
	peterz@...radead.org,
	rafael@...nel.org,
	rrichter@....com,
	skoralah@....com,
	terry.bowman@....com,
	tomasz.wolski@...itsu.com,
	vishal.l.verma@...el.com,
	willy@...radead.org,
	yaoxt.fnst@...itsu.com,
	yazen.ghannam@....com
Subject: Re: [PATCH v5 6/7] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges

>> > I was thinking through what Alison asked about what to do later in boot
>> > when other regions are being dynamically created. It made me wonder if
>> > this safety can be achieved more easily by just making sure that the
>> > alloc_dax_region() call fails.
>> 
>> Agreed with all the points above, including making alloc_dax_region() 
>> fail as the safety mechanism. This also cleanly avoids the no Soft 
>> Reserved case Alison pointed out, where dax_cxl_mode can remain stuck in 
>> DEFER and return -EPROBE_DEFER.
>> 
>> What I’m still trying to understand is the case of “other regions being 
>> dynamically created.” Once HMEM has claimed the relevant HPA range, any 
>> later userspace attempts to create regions (via cxl create-region) 
>> should naturally fail due to the existing HPA allocation. This already 
>> shows up as an HPA allocation failure currently.
>> 
>> #cxl create-region -d decoder0.0 -m mem2 -w 1 -g256
>> cxl region: create_region: region0: set_size failed: Numerical result 
>> out of range
>> cxl region: cmd_create_region: created 0 regions
>> 
>> And in the dmesg:
>> [  466.819353] alloc_hpa: cxl region0: HPA allocation error (-34) for 
>> size:0x0000002000000000 in CXL Window 0 [mem 0x850000000-0x284fffffff 
>> flags 0x200]
>> 
>> Also, at this point, with the probe-ordering fixes and the use of 
>> wait_for_device_probe(), region probing should have fully completed.
>> 
>> Am I missing any other scenario where regions could still be created 
>> dynamically beyond this?
>
>The concern is what to do about regions and memory devices that are
>completely innocent. So, for example imagine deviceA is handled by BIOS
>and deviceB is ignored by BIOS. If deviceB was ignored by BIOS then it
>would be rude to tear down any regions that might be established for
>deviceB. So if alloc_dax_region() exclusion and HPA space reservation
>prevent future collisions while not disturbing innocent devices, then I
>think userspace can pick up the pieces from there.

I'm trying to follow the idea of "deviceB being ignored by BIOS" 
Do you consider hot-plug devices and user creating reqions manually? 
Could you please describe such scenario?

>> > Something like (untested / incomplete, needs cleanup handling!)
>> > 
>> > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
>> > index fde29e0ad68b..fd18343e0538 100644
>> > --- a/drivers/dax/bus.c
>> > +++ b/drivers/dax/bus.c
>> > @@ -10,6 +10,7 @@
>> >   #include "dax-private.h"
>> >   #include "bus.h"
>> >   
>> > +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
>> >   static DEFINE_MUTEX(dax_bus_lock);
>> >   
>> >   /*
>> > @@ -661,11 +662,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
>> >          dax_region->dev = parent;
>> >          dax_region->target_node = target_node;
>> >          ida_init(&dax_region->ida);
>> > -       dax_region->res = (struct resource) {
>> > -               .start = range->start,
>> > -               .end = range->end,
>> > -               .flags = IORESOURCE_MEM | flags,
>> > -       };
>> > +       dax_region->res = __request_region(&dax_regions, range->start, range->end, flags);
>> >   
>> >          if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
>> >                  kfree(dax_region);
>> > 
>> > ...which will result in enforcing only one of dax_hmem or dax_cxl being
>> > able to register a dax_region.
>> > 
>> > Yes, this would leave a mess of disabled cxl_dax_region devices lying
>> > around, but it would leave more breadcrumbs for debug, and reduce the
>> > number of races you need to worry about.
>> > 
>> > In other words, I thought total teardown would be simpler, but as the
>> > feedback keeps coming in, I think that brings a different set of
>> > complexity. So just inject failures for dax_cxl to trip over and then we
>> > can go further later to effect total teardown if that proves to not be
>> > enough.
>> 
>> One concern with the approach of not tearing down CXL regions is the 
>> state it leaves behind in /proc/iomem. Soft Reserved ranges are 
>> REGISTERed to HMEM while CXL regions remain present. The resulting 
>> nesting (dax under region, region under window and window under SR) 
>> visually suggests a coherent CXL hierarchy, even though ownership has 
>> effectively moved to HMEM. When users, then attempt to tear regions down 
>> and recreate them from userspace, they hit the same HPA allocation 
>> failures described above.
>
>So this gets back to a question of do we really need "Soft Reserved" to
>show up in /proc/iomem? It is an ABI change to stop publishing it
>altogether, so at a minimum we need to be prepared to keep publishing it
>if it causes someone's working setup to regress.
>
>The current state of the for-7.0/cxl-init branch drops publishing "Soft
>Reserved". I am cautiously optimistic no one notices as long as DAX
>devices keep appearing, but at the first sign of regression we need a
>plan B.
>
>> If we decide not to tear down regions in the REGISTER case, should we 
>> gate decoder resets during user initiated region teardown? Today, 
>> decoders are reset when regions are torn down dynamically, and 
>> subsequent attempts to recreate regions can trigger a large amount of 
>> mailbox traffic. Much of what shows up as repeated “Reading event logs/ 
>> Clearing …” messages which ends up interleaved with the HPA allocation 
>> failure, which can be confusing.
>
>One of the nice side effects of installing the "Soft Reserved" entries
>late, when HMEM takes over, is that they are easier to remove.
>
>So the flow would be, if you know what you are doing, is to disable the
>HMEM device which uninstalls the "Soft Reserved" entries, before trying
>to decommit the region and reclaim the HPA space.
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ