[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <01956e38-5dc7-45f3-8c56-e98c9b8a3b5c@fujitsu.com>
Date: Tue, 5 Aug 2025 03:58:41 +0000
From: "Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>
To: "dan.j.williams@...el.com" <dan.j.williams@...el.com>, Smita Koralahalli
<Smita.KoralahalliChannabasappa@....com>, "linux-cxl@...r.kernel.org"
<linux-cxl@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "nvdimm@...ts.linux.dev"
<nvdimm@...ts.linux.dev>, "linux-fsdevel@...r.kernel.org"
<linux-fsdevel@...r.kernel.org>, "linux-pm@...r.kernel.org"
<linux-pm@...r.kernel.org>
CC: Davidlohr Bueso <dave@...olabs.net>, Jonathan Cameron
<jonathan.cameron@...wei.com>, Dave Jiang <dave.jiang@...el.com>, Alison
Schofield <alison.schofield@...el.com>, Vishal Verma
<vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>, Matthew Wilcox
<willy@...radead.org>, Jan Kara <jack@...e.cz>, "Rafael J . Wysocki"
<rafael@...nel.org>, Len Brown <len.brown@...el.com>, Pavel Machek
<pavel@...nel.org>, Li Ming <ming.li@...omail.com>, Jeff Johnson
<jeff.johnson@....qualcomm.com>, Ying Huang <huang.ying.caritas@...il.com>,
"Xingtao Yao (Fujitsu)" <yaoxt.fnst@...itsu.com>, Peter Zijlstra
<peterz@...radead.org>, Greg KH <gregkh@...uxfoundation.org>, Nathan Fontenot
<nathan.fontenot@....com>, Terry Bowman <terry.bowman@....com>, Robert
Richter <rrichter@....com>, Benjamin Cheatham <benjamin.cheatham@....com>,
PradeepVineshReddy Kodamati <PradeepVineshReddy.Kodamati@....com>
Subject: Re: [PATCH v5 3/7] cxl/acpi: Add background worker to coordinate with
cxl_mem probe completion
Hi Dan and Smita,
On 24/07/2025 00:13, dan.j.williams@...el.com wrote:
> dan.j.williams@ wrote:
> [..]
>> If the goal is: "I want to give device-dax a point at which it can make
>> a go / no-go decision about whether the CXL subsystem has properly
>> assembled all CXL regions implied by Soft Reserved instersecting with
>> CXL Windows." Then that is something like the below, only lightly tested
>> and likely regresses the non-CXL case.
>>
>> -- 8< --
>> From 48b25461eca050504cf5678afd7837307b2dd14f Mon Sep 17 00:00:00 2001
>> From: Dan Williams <dan.j.williams@...el.com>
>> Date: Tue, 22 Jul 2025 16:11:08 -0700
>> Subject: [RFC PATCH] dax/cxl: Defer Soft Reserved registration
>
> Likely needs this incremental change to prevent DEV_DAX_HMEM from being
> built-in when CXL is not. This still leaves the awkward scenario of CXL
> enabled, DEV_DAX_CXL disabled, and DEV_DAX_HMEM built-in. I believe that
> safely fails in devdax only / fallback mode, but something to
> investigate when respinning on top of this.
>
Thank you for your RFC; I find your proposal remarkably compelling, as it adeptly addresses the issues I am currently facing.
To begin with, I still encountered several issues with your patch (considering the patch at the RFC stage, I think it is already quite commendable):
1. Some resources described by SRAT are wrongly identified as System RAM (kmem), such as the following: 200000000-5bffffff.
```
200000000-5bffffff : dax6.0
200000000-5bffffff : System RAM (kmem)
5c0001128-5c00011b7 : port1
5d0000000-64ffffff : CXL Window 0
5d0000000-64ffffff : region0
5d0000000-64ffffff : dax0.0
5d0000000-64ffffff : System RAM (kmem)
680000000-e7ffffff : PCI Bus 0000:00
[root@...a-server ~]# dmesg | grep -i -e soft -e hotplug
[ 0.000000] Command line: BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-6.16.0-rc4-lizhijian-Dan+ root=UUID=386769a3-cfa5-47c8-8797-d5ec58c9cb6c ro earlyprintk=ttyS0 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 softlockup_panic=1 printk.devkmsg=on oops=panic sysrq_always_enabled panic_on_warn ignore_loglevel kasan.fault=panic
[ 0.000000] BIOS-e820: [mem 0x0000000180000000-0x00000001ffffffff] soft reserved
[ 0.000000] BIOS-e820: [mem 0x00000005d0000000-0x000000064ffffff] soft reserved
[ 0.072114] ACPI: SRAT: Node 3 PXM 3 [mem 0x200000000-0x5bffffff] hotplug
```
2. Triggers dev_warn and dev_err:
```
[root@...a-server ~]# journalctl -p err -p warning --dmesg
...snip...
Jul 29 13:17:36 rdma-server kernel: cxl root0: Extended linear cache calculation failed rc:-2
Jul 29 13:17:36 rdma-server kernel: hmem hmem.1: probe with driver hmem failed with error -12
Jul 29 13:17:36 rdma-server kernel: hmem hmem.2: probe with driver hmem failed with error -12
Jul 29 13:17:36 rdma-server kernel: kmem dax3.0: mapping0: 0x100000000-0x17ffffff could not reserve region
Jul 29 13:17:36 rdma-server kernel: kmem dax3.0: probe with driver kmem failed with error -16
```
3. When CXL_REGION is disabled, there is a failure to fallback to dax_hmem, in which case only CXL Window X is visible.
On failure:
```
100000000-27ffffff : System RAM
5c0001128-5c00011b7 : port1
5c0011128-5c00111b7 : port2
5d0000000-6cffffff : CXL Window 0
6d0000000-7cffffff : CXL Window 1
7000000000-700000ffff : PCI Bus 0000:0c
7000000000-700000ffff : 0000:0c:00.0
7000001080-70000010d7 : mem1
```
On success:
```
5d0000000-7cffffff : dax0.0
5d0000000-7cffffff : System RAM (kmem)
5d0000000-6cffffff : CXL Window 0
6d0000000-7cffffff : CXL Window 1
```
In term of issues 1 and 2, this arises because hmem_register_device() attempts to register resources of all "HMEM devices," whereas we only need to register the IORES_DESC_SOFT_RESERVED resources. I believe resolving the current TODO will address this.
```
- rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
- IORES_DESC_SOFT_RESERVED);
- if (rc != REGION_INTERSECTS)
- return 0;
+ /* TODO: insert "Soft Reserved" into iomem here */
```
Regarding issue 3 (which exists in the current situation), this could be because it cannot ensure that dax_hmem_probe() executes prior to cxl_acpi_probe() when CXL_REGION is disabled.
I am pleased that you have pushed the patch to the cxl/for-6.18/cxl-probe-order branch, and I'm looking forward to its integration into the upstream during the v6.18 merge window.
Besides the current TODO, you also mentioned that this RFC PATCH must be further subdivided into several patches, so there remains significant work to be done.
If my understanding is correct, you would be personally continuing to push forward this patch, right?
Smita,
Do you have any additional thoughts on this proposal from your side?
Thanks
Zhijian
> -- 8< --
> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index d656e4c0eb84..3683bb3f2311 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -48,6 +48,8 @@ config DEV_DAX_CXL
> tristate "CXL DAX: direct access to CXL RAM regions"
> depends on CXL_BUS && CXL_REGION && DEV_DAX
> default CXL_REGION && DEV_DAX
> + depends on CXL_ACPI >= DEV_DAX_HMEM
> + depends on CXL_PCI >= DEV_DAX_HMEM
> help
> CXL RAM regions are either mapped by platform-firmware
> and published in the initial system-memory map as "System RAM", mapped
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 0916478e3817..8bcd104111a8 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -103,7 +103,7 @@ static int hmem_register_device(struct device *host, int target_nid,
> long id;
> int rc;
>
> - if (IS_ENABLED(CONFIG_CXL_REGION) &&
> + if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> IORES_DESC_CXL) != REGION_DISJOINT) {
> switch (dax_cxl_mode) {
> @@ -209,7 +209,7 @@ static __init int dax_hmem_init(void)
> * CXL topology discovery at least once before scanning the
> * iomem resource tree for IORES_DESC_CXL resources.
> */
> - if (IS_ENABLED(CONFIG_CXL_REGION)) {
> + if (IS_ENABLED(CONFIG_DEV_DAX_CXL)) {
> request_module("cxl_acpi");
> request_module("cxl_pci");
> }
Powered by blists - more mailing lists