[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aKZW5exydL4G37gk@aschofie-mobl2.lan>
Date: Wed, 20 Aug 2025 16:14:45 -0700
From: Alison Schofield <alison.schofield@...el.com>
To: "Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>
CC: "dan.j.williams@...el.com" <dan.j.williams@...el.com>, Smita Koralahalli
<Smita.KoralahalliChannabasappa@....com>, "linux-cxl@...r.kernel.org"
<linux-cxl@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "nvdimm@...ts.linux.dev"
<nvdimm@...ts.linux.dev>, "linux-fsdevel@...r.kernel.org"
<linux-fsdevel@...r.kernel.org>, "linux-pm@...r.kernel.org"
<linux-pm@...r.kernel.org>, Davidlohr Bueso <dave@...olabs.net>, "Jonathan
Cameron" <jonathan.cameron@...wei.com>, Dave Jiang <dave.jiang@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>, "Rafael J .
Wysocki" <rafael@...nel.org>, Len Brown <len.brown@...el.com>, Pavel Machek
<pavel@...nel.org>, Li Ming <ming.li@...omail.com>, Jeff Johnson
<jeff.johnson@....qualcomm.com>, Ying Huang <huang.ying.caritas@...il.com>,
"Xingtao Yao (Fujitsu)" <yaoxt.fnst@...itsu.com>, Peter Zijlstra
<peterz@...radead.org>, Greg KH <gregkh@...uxfoundation.org>, Nathan Fontenot
<nathan.fontenot@....com>, Terry Bowman <terry.bowman@....com>, "Robert
Richter" <rrichter@....com>, Benjamin Cheatham <benjamin.cheatham@....com>,
PradeepVineshReddy Kodamati <PradeepVineshReddy.Kodamati@....com>
Subject: Re: [PATCH v5 3/7] cxl/acpi: Add background worker to coordinate
with cxl_mem probe completion
On Tue, Aug 05, 2025 at 03:58:41AM +0000, Zhijian Li (Fujitsu) wrote:
> Hi Dan and Smita,
>
>
> On 24/07/2025 00:13, dan.j.williams@...el.com wrote:
> > dan.j.williams@ wrote:
> > [..]
> >> If the goal is: "I want to give device-dax a point at which it can make
> >> a go / no-go decision about whether the CXL subsystem has properly
> >> assembled all CXL regions implied by Soft Reserved instersecting with
> >> CXL Windows." Then that is something like the below, only lightly tested
> >> and likely regresses the non-CXL case.
> >>
> >> -- 8< --
> >> From 48b25461eca050504cf5678afd7837307b2dd14f Mon Sep 17 00:00:00 2001
> >> From: Dan Williams <dan.j.williams@...el.com>
> >> Date: Tue, 22 Jul 2025 16:11:08 -0700
> >> Subject: [RFC PATCH] dax/cxl: Defer Soft Reserved registration
> >
> > Likely needs this incremental change to prevent DEV_DAX_HMEM from being
> > built-in when CXL is not. This still leaves the awkward scenario of CXL
> > enabled, DEV_DAX_CXL disabled, and DEV_DAX_HMEM built-in. I believe that
> > safely fails in devdax only / fallback mode, but something to
> > investigate when respinning on top of this.
> >
>
> Thank you for your RFC; I find your proposal remarkably compelling, as it adeptly addresses the issues I am currently facing.
>
>
> To begin with, I still encountered several issues with your patch (considering the patch at the RFC stage, I think it is already quite commendable):
Hi Zhijian,
Like you, I tried this RFC out. It resolved the issue of soft reserved
resources preventing teardown and replacement of a region in place.
I looked at the issues you found, and have some questions comments
included below.
>
> 1. Some resources described by SRAT are wrongly identified as System RAM (kmem), such as the following: 200000000-5bffffff.
>
> ```
> 200000000-5bffffff : dax6.0
> 200000000-5bffffff : System RAM (kmem)
> 5c0001128-5c00011b7 : port1
> 5d0000000-64ffffff : CXL Window 0
> 5d0000000-64ffffff : region0
> 5d0000000-64ffffff : dax0.0
> 5d0000000-64ffffff : System RAM (kmem)
> 680000000-e7ffffff : PCI Bus 0000:00
>
> [root@...a-server ~]# dmesg | grep -i -e soft -e hotplug
> [ 0.000000] Command line: BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-6.16.0-rc4-lizhijian-Dan+ root=UUID=386769a3-cfa5-47c8-8797-d5ec58c9cb6c ro earlyprintk=ttyS0 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 softlockup_panic=1 printk.devkmsg=on oops=panic sysrq_always_enabled panic_on_warn ignore_loglevel kasan.fault=panic
> [ 0.000000] BIOS-e820: [mem 0x0000000180000000-0x00000001ffffffff] soft reserved
> [ 0.000000] BIOS-e820: [mem 0x00000005d0000000-0x000000064ffffff] soft reserved
> [ 0.072114] ACPI: SRAT: Node 3 PXM 3 [mem 0x200000000-0x5bffffff] hotplug
> ```
Is that range also labelled as soft reserved?
I ask, because I'm trying to draw a parallel between our test platforms.
I see -
[] BIOS-e820: [mem 0x0000024080000000-0x000004407fffffff] soft reserved
.
.
[] reserve setup_data: [mem 0x0000024080000000-0x000004407fffffff] soft reserved
.
.
[] ACPI: SRAT: Node 6 PXM 14 [mem 0x24080000000-0x4407fffffff] hotplug
/proc/iomem - as expected
24080000000-5f77fffffff : CXL Window 0
24080000000-4407fffffff : region0
24080000000-4407fffffff : dax0.0
24080000000-4407fffffff : System RAM (kmem)
I'm also seeing this message:
[] resource: Unaddressable device [mem 0x24080000000-0x4407fffffff] conflicts with [mem 0x24080000000-0x4407fffffff]
>
> 2. Triggers dev_warn and dev_err:
>
> ```
> [root@...a-server ~]# journalctl -p err -p warning --dmesg
> ...snip...
> Jul 29 13:17:36 rdma-server kernel: cxl root0: Extended linear cache calculation failed rc:-2
> Jul 29 13:17:36 rdma-server kernel: hmem hmem.1: probe with driver hmem failed with error -12
> Jul 29 13:17:36 rdma-server kernel: hmem hmem.2: probe with driver hmem failed with error -12
> Jul 29 13:17:36 rdma-server kernel: kmem dax3.0: mapping0: 0x100000000-0x17ffffff could not reserve region
> Jul 29 13:17:36 rdma-server kernel: kmem dax3.0: probe with driver kmem failed with error -16
I see the kmem dax messages also. It seems the kmem probe is going after
every range (except hotplug) in the SRAT, and failing.
> ```
>
> 3. When CXL_REGION is disabled, there is a failure to fallback to dax_hmem, in which case only CXL Window X is visible.
Haven't tested !CXL_REGION yet.
>
> On failure:
>
> ```
> 100000000-27ffffff : System RAM
> 5c0001128-5c00011b7 : port1
> 5c0011128-5c00111b7 : port2
> 5d0000000-6cffffff : CXL Window 0
> 6d0000000-7cffffff : CXL Window 1
> 7000000000-700000ffff : PCI Bus 0000:0c
> 7000000000-700000ffff : 0000:0c:00.0
> 7000001080-70000010d7 : mem1
> ```
>
> On success:
>
> ```
> 5d0000000-7cffffff : dax0.0
> 5d0000000-7cffffff : System RAM (kmem)
> 5d0000000-6cffffff : CXL Window 0
> 6d0000000-7cffffff : CXL Window 1
> ```
>
> In term of issues 1 and 2, this arises because hmem_register_device() attempts to register resources of all "HMEM devices," whereas we only need to register the IORES_DESC_SOFT_RESERVED resources. I believe resolving the current TODO will address this.
>
> ```
> - rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> - IORES_DESC_SOFT_RESERVED);
> - if (rc != REGION_INTERSECTS)
> - return 0;
> + /* TODO: insert "Soft Reserved" into iomem here */
> ```
Above makes sense.
I'll probably wait for an update from Smita to test again, but if you
or Smita have anything you want me to try out on my hardwware in the
meantime, let me know.
-- Alison
>
> Regarding issue 3 (which exists in the current situation), this could be because it cannot ensure that dax_hmem_probe() executes prior to cxl_acpi_probe() when CXL_REGION is disabled.
>
> I am pleased that you have pushed the patch to the cxl/for-6.18/cxl-probe-order branch, and I'm looking forward to its integration into the upstream during the v6.18 merge window.
> Besides the current TODO, you also mentioned that this RFC PATCH must be further subdivided into several patches, so there remains significant work to be done.
> If my understanding is correct, you would be personally continuing to push forward this patch, right?
>
>
> Smita,
>
> Do you have any additional thoughts on this proposal from your side?
>
>
> Thanks
> Zhijian
>
snip
Powered by blists - more mailing lists