Message-ID: <10e044b9-f126-47d5-86a3-b5b0fcc0bc14@fujitsu.com>
Date: Thu, 10 Jul 2025 08:18:05 +0000
From: "Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>
To: "Koralahalli Channabasappa, Smita"
<Smita.KoralahalliChannabasappa@....com>, "linux-cxl@...r.kernel.org"
<linux-cxl@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "nvdimm@...ts.linux.dev"
<nvdimm@...ts.linux.dev>, "linux-fsdevel@...r.kernel.org"
<linux-fsdevel@...r.kernel.org>, "linux-pm@...r.kernel.org"
<linux-pm@...r.kernel.org>
CC: Davidlohr Bueso <dave@...olabs.net>, Jonathan Cameron
<jonathan.cameron@...wei.com>, Dave Jiang <dave.jiang@...el.com>, Alison
Schofield <alison.schofield@...el.com>, Vishal Verma
<vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>, Dan Williams
<dan.j.williams@...el.com>, Matthew Wilcox <willy@...radead.org>, Jan Kara
<jack@...e.cz>, "Rafael J . Wysocki" <rafael@...nel.org>, Len Brown
<len.brown@...el.com>, Pavel Machek <pavel@...nel.org>, Li Ming
<ming.li@...omail.com>, Jeff Johnson <jeff.johnson@....qualcomm.com>, Ying
Huang <huang.ying.caritas@...il.com>, "Xingtao Yao (Fujitsu)"
<yaoxt.fnst@...itsu.com>, Peter Zijlstra <peterz@...radead.org>, Greg KH
<gregkh@...uxfoundation.org>, Nathan Fontenot <nathan.fontenot@....com>,
Terry Bowman <terry.bowman@....com>, Robert Richter <rrichter@....com>,
Benjamin Cheatham <benjamin.cheatham@....com>, PradeepVineshReddy Kodamati
<PradeepVineshReddy.Kodamati@....com>
Subject: Re: [PATCH v4 7/7] cxl/dax: Defer DAX consumption of SOFT RESERVED
resources until after CXL region creation
On 10/07/2025 12:22, Koralahalli Channabasappa, Smita wrote:
>>
>> what is the impact if one consumes all SOFT RESERVED resources?
>>
>> Since `hmem_register_device()` only creates HMEM devices for ranges
>> *without* `IORES_DESC_CXL` which could be marked in cxl_acpi , cxl_core/cxl_dax
>> should still create regions and DAX devices without conflicts.
>
> You're correct that hmem_register_device() includes a check to skip
> regions overlapping with IORES_DESC_CXL. However, this check only works if the CXL region driver has already inserted those regions into iomem_resource.
IIUC, this relies on the root decoder resource (CFMW) having already been inserted into iomem_resource, which is currently done in cxl_acpi.
This can also be resolved through the module loading dependency chain,
with something like this:
#if IS_MODULE(CONFIG_CXL_ACPI)
MODULE_SOFTDEP("pre: cxl_acpi");
#endif
> If dax_hmem_platform_probe() runs too early (before CXL region probing), that check fails to detect overlaps — leading to erroneous registration.
>
> This is what I think. I may be wrong. Also, Alison/Dan comment here: "New approach is to not have the CXL intersecting soft reserved
> resources in iomem_resource tree."..
I think his point was the latter sentence:
"Only move them there if CXL region assembly fails and we want to make them available to DAX directly."
which, as I understand it, means removing the 'soft reserved' resource at the following region creation.
>
> https://lore.kernel.org/linux-cxl/ZPdoduf5IckVWQVD@aschofie-mobl2/
>
>>
>>> To resolve this, defer the DAX driver's resource consumption if the
>>> cxl_acpi driver is enabled. The DAX HMEM initialization skips walking the
>>> iomem resource tree in this case. After CXL region creation completes,
>>> any remaining SOFT RESERVED resources are explicitly registered with the
>>> DAX driver by the CXL driver.
>>
>> Conversely, with this patch applied, `cxl_region_softreserv_update()` attempts
>> to register new HMEM devices. This may cause duplicate registrations for the
>> same range (e.g., 0x180000000-0x1ffffffff), triggering warnings like:
>>
>> [ 14.984108] kmem dax4.0: mapping0: 0x180000000-0x1ffffffff could not reserve region
>> [ 14.987204] kmem dax4.0: probe with driver kmem failed with error -16
>>
>> Because the HMAT initialization already registered these sub-ranges:
>> 180000000-1bfffffff
>> 1c0000000-1ffffffff
>>
>>
>> If I'm missing something, please correct me.
>
> Yeah, this bug is due to a double invocation of hmem_register_device() once from cxl_softreserv_mem_register() and once from dax_hmem_platform_probe().
>
> When CONFIG_CXL_ACPI=y, walk_iomem_res_desc() is skipped in hmem_init(),
> so I expected hmem_active to remain empty. However, I missed the detail that the ACPI HMAT parser (drivers/acpi/numa/hmat.c) calls hmem_register_resource(), which populates hmem_active via __hmem_register_resource().
>
> Case 1 (No bug): If dax_hmem_platform_probe() runs when hmem_active is still empty.
>
> walk_hmem_resources() walks nothing — it's effectively a no-op.
>
> Later, cxl_softreserv_mem_register() is invoked to register leftover soft-reserved regions via hmem_register_device().
>
> Only one registration occurs, no conflict.
>
> Case 2: If dax_hmem_platform_probe() runs after hmem_active is populated by hmat_register_target_devices() (via hmem_register_resource()):
>
> walk_hmem_resources() iterates those regions. It invokes hmem_register_device(). Later, cxl_region driver does the same again.
>
> This results in duplicate instances for the same physical range and second call fails like below:
>
> [ 14.984108] kmem dax4.0: mapping0: 0x180000000-0x1ffffffff could not reserve region
> [ 14.987204] kmem dax4.0: probe with driver kmem failed with error -16
>
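To make the failure mode concrete, here is a small user-space sketch (not kernel code). `struct range`, `reserve_range()` and the fixed-size table are hypothetical stand-ins for the resource bookkeeping; the only point is that a second reservation overlapping an already-reserved range is rejected with -EBUSY, which is the -16 in the log above.

```c
#include <errno.h>

/* Hypothetical stand-in for a reserved physical range; not a kernel type. */
struct range {
	unsigned long long start;
	unsigned long long end;	/* inclusive */
};

#define MAX_RANGES 8

static struct range reserved[MAX_RANGES];
static int nr_reserved;

/*
 * Reserve [start, end]. Fail with -EBUSY if the request overlaps any
 * range that an earlier caller already reserved.
 */
static int reserve_range(unsigned long long start, unsigned long long end)
{
	for (int i = 0; i < nr_reserved; i++) {
		if (start <= reserved[i].end && reserved[i].start <= end)
			return -EBUSY;
	}

	if (nr_reserved == MAX_RANGES)
		return -ENOMEM;

	reserved[nr_reserved++] = (struct range){ start, end };
	return 0;
}
```

If the first path reserves 0x180000000-0x1bfffffff and 0x1c0000000-0x1ffffffff, both calls return 0; a later attempt by the second path to reserve the whole 0x180000000-0x1ffffffff range then returns -EBUSY. When only one path runs (Case 1), no call fails.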
> The change below fixed the above bug for me, and I have incorporated it in v5.
Actually, I don't think it's a real problem; the current code can tolerate such duplicate registration. So I'm fine with just downgrading the message from dev_warn() to dev_dbg().
>
> static int dax_hmem_platform_probe(struct platform_device *pdev)
> {
> +	if (IS_ENABLED(CONFIG_CXL_ACPI))
> +		return 0;
>
> 	dax_hmem_pdev = pdev;
> 	return walk_hmem_resources(hmem_register_device);
> }
>
> Let me know if my thought process is right. I would appreciate any additional feedback or suggestions.
>
> Meanwhile, I should also mention that my approach fails if cxl_acpi finishes probing before dax_hmem is even loaded: it attempts to call into unresolved dax_hmem symbols, causing probe failures. This happens particularly when CXL_BUS=y and DEV_DAX_HMEM=m.
>
> ld: vmlinux.o: in function `cxl_softreserv_mem_register':
> region.c:(.text+0xc15160): undefined reference to `hmem_register_device'
> make[2]: *** [scripts/Makefile.vmlinux:77: vmlinux] Error 1
>
> I spent some time exploring possible fixes for this symbol dependency issue, which delayed my v5 submission. I would welcome any ideas..
>
> In the meantime, I noticed your new patchset that introduces a different approach for resolving resource conflicts between CFMW and Soft Reserved regions. I will take a closer look at that.
Thanks in advance.
(I am uncertain whether such an approach has ever been proposed previously.)
>
> Thanks
> Smita
>