Message-ID: <20251203133552.15468-1-tomasz.wolski@fujitsu.com>
Date: Wed, 3 Dec 2025 14:35:38 +0100
From: Tomasz Wolski <tomasz.wolski@...itsu.com>
To: alison.schofield@...el.com
Cc: Smita.KoralahalliChannabasappa@....com,
ardb@...nel.org,
benjamin.cheatham@....com,
bp@...en8.de,
dan.j.williams@...el.com,
dave.jiang@...el.com,
dave@...olabs.net,
gregkh@...uxfoundation.org,
huang.ying.caritas@...il.com,
ira.weiny@...el.com,
jack@...e.cz,
jeff.johnson@....qualcomm.com,
jonathan.cameron@...wei.com,
len.brown@...el.com,
linux-cxl@...r.kernel.org,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org,
lizhijian@...itsu.com,
ming.li@...omail.com,
nathan.fontenot@....com,
nvdimm@...ts.linux.dev,
pavel@...nel.org,
peterz@...radead.org,
rafael@...nel.org,
rrichter@....com,
terry.bowman@....com,
vishal.l.verma@...el.com,
willy@...radead.org,
yaoxt.fnst@...itsu.com,
yazen.ghannam@....com
Subject: Re: [PATCH v4 0/9] dax/hmem, cxl: Coordinate Soft Reserved handling with CXL and HMEM
>> This series aims to address long-standing conflicts between HMEM and
>> CXL when handling Soft Reserved memory ranges.
>>
>> Reworked from Dan's patch:
>> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/patch/?id=ab70c6227ee6165a562c215d9dcb4a1c55620d5d
>>
>> Previous work:
>> https://lore.kernel.org/all/20250715180407.47426-1-Smita.KoralahalliChannabasappa@amd.com/
>>
>> Link to v3:
>> https://lore.kernel.org/all/20250930044757.214798-1-Smita.KoralahalliChannabasappa@amd.com
>>
>> This series should be applied on top of:
>> "214291cbaace: acpi/hmat: Fix lockdep warning for hmem_register_resource()"
>> and is based on:
>> base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
>>
>> I initially tried picking up the three probe ordering patches from v20/v21
>> of Type 2 support, but I hit a NULL pointer dereference in
>> devm_cxl_add_memdev() and cycle dependency with all patches so I left
>> them out for now. With my current series rebased on 6.18-rc2 plus
>> 214291cbaace, probe ordering behaves correctly on AMD systems and I have
>> verified the scenarios mentioned below. I can pull those three patches
>> back in for a future revision once the failures are sorted out.
>
>Hi Smita,
>
>This is a regression from the v3 version for my hotplug test case.
>I believe at least partially due to the omitted probe order patches.
>I'm not clear why that 'dax18.0' still exists after region teardown.
>
>Upon booting:
>- Do not expect to see that Soft Reserved resource
>
>68e80000000-8d37fffffff : CXL Window 9
>  68e80000000-70e7fffffff : region9
>    68e80000000-70e7fffffff : Soft Reserved
>      68e80000000-70e7fffffff : dax18.0
>        68e80000000-70e7fffffff : System RAM (kmem)
>
>After region teardown:
>- Do not expect to see that Soft Reserved resource
>- Do not expect to see that DAX or kmem
>
>68e80000000-8d37fffffff : CXL Window 9
>  68e80000000-70e7fffffff : Soft Reserved
>    68e80000000-70e7fffffff : dax18.0
>      68e80000000-70e7fffffff : System RAM (kmem)
>
>Create the region anew:
>- Here we see a new region and dax devices created in the
>available space after the Soft Reserved. We don't want
>that. We want to be able to recreate in that original
>space of 68e80000000-70e7fffffff.
>
>68e80000000-8d37fffffff : CXL Window 9
>  68e80000000-70e7fffffff : Soft Reserved
>    68e80000000-70e7fffffff : dax18.0
>      68e80000000-70e7fffffff : System RAM (kmem)
>  70e80000000-78e7fffffff : region9
>    70e80000000-78e7fffffff : dax9.0
>      70e80000000-78e7fffffff : System RAM (kmem)
>
>
>-- Alison
Hello Smita, Alison,

I did some testing and came across issues with probe order, so I applied the
three patches mentioned by Smita plus a fix for the NULL dereference.
I noticed issues in scenarios 3.1 and 4 below, but they may be related to
the test setup:
[1] QEMU: 1 CFMWS + Host-bridge + 1 CXL device
Soft Reserved is not seen in iomem:
a90000000-b8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
kernel: [ 0.000000][ T0] BIOS-e820: [mem 0x0000000a90000000-0x0000000b8fffffff] soft reserved
== region teardown
a90000000-b8fffffff : CXL Window 0
// no dax devices
== region recreate
a90000000-b8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
== booted with no PCI attached
a90000000-b8fffffff : Soft Reserved
  a90000000-b8fffffff : CXL Window 0
    a90000000-b8fffffff : dax1.0
      a90000000-b8fffffff : System RAM (kmem)
== ..and hot plug via QEMU terminal => is the following iomem tree expected?
a90000000-b8fffffff : Soft Reserved
  a90000000-b8fffffff : CXL Window 0
    a90000000-b8fffffff : region0
      a90000000-b8fffffff : dax1.0
        a90000000-b8fffffff : System RAM (kmem)
kernel: [ 129.820136][ T65] cxl_acpi ACPI0017:00: decoder0.0: created region0
..
kernel: [ 129.827126][ T65] cxl_region region0: [mem 0xa90000000-0xb8fffffff flags 0x200] has System RAM: [mem 0xa90000000-0xb8fffffff flags 0x83000200]
[1.1] QEMU: 1 CFMWS + Host-bridge + 1 CXL device
Region is smaller than SR - hmem claims the space
a90000000-bcfffffff : Soft Reserved
  a90000000-bcfffffff : CXL Window 0
    a90000000-bcfffffff : dax1.0
      a90000000-bcfffffff : System RAM (kmem)
[2] QEMU: 1 CFMWS + Host-bridge + 2 CXL devices
kernel: [ 0.000000][ T0] BIOS-e820: [mem 0x0000000a90000000-0x0000000c8fffffff] soft reserved
a90000000-c8fffffff : CXL Window 0
  a90000000-b8fffffff : region1
    a90000000-b8fffffff : dax1.0
      a90000000-b8fffffff : System RAM (kmem)
  b90000000-c8fffffff : region0
    b90000000-c8fffffff : dax0.0
      b90000000-c8fffffff : System RAM (kmem)
== region1 teardown
a90000000-c8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
== recreate region1 - created in correct address range
a90000000-c8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
  b90000000-c8fffffff : region1
    b90000000-c8fffffff : dax1.0
      b90000000-c8fffffff : System RAM (kmem)
[2.1] QEMU: 1 CFMWS + Host-bridge + 2 CXL devices
Region is smaller than SR - hmem claims the whole space
kernel: [ 0.000000][ T0] BIOS-e820: [mem 0x0000000a90000000-0x0000000ccfffffff] soft reserved
a90000000-ccfffffff : Soft Reserved
  a90000000-ccfffffff : CXL Window 0
    a90000000-ccfffffff : dax1.0
      a90000000-ccfffffff : System RAM (kmem)
[3] QEMU: 2 CFMWS + Host-bridge + 2 CXL devices
a90000000-b8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
b90000000-c8fffffff : CXL Window 1
  b90000000-c8fffffff : region1
    b90000000-c8fffffff : dax1.0
      b90000000-c8fffffff : System RAM (kmem)
== Tearing down region 1
a90000000-b8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
b90000000-c8fffffff : CXL Window 1
== Recreate region 1
a90000000-b8fffffff : CXL Window 0
  a90000000-b8fffffff : region0
    a90000000-b8fffffff : dax0.0
      a90000000-b8fffffff : System RAM (kmem)
b90000000-c8fffffff : CXL Window 1
  b90000000-c8fffffff : region1
    b90000000-c8fffffff : dax1.0
      b90000000-c8fffffff : System RAM (kmem)
[3.1] QEMU: 2 CFMWS + Host-bridge + 2 CXL devices
Region does not span whole CXL Window - hmem should claim the whole space, but kmem failed with EBUSY
a90000000-ccfffffff : Soft Reserved
  a90000000-bcfffffff : CXL Window 0
  bd0000000-ccfffffff : CXL Window 1
kernel: [ 24.598310][ T543] cxl_acpi ACPI0017:00: decoder0.0 added to root0
kernel: [ 24.598645][ T543] cxl_acpi ACPI0017:00: decode range: node: 1 range [0xa90000000 - 0xbcfffffff]
kernel: [ 24.599673][ T543] cxl_acpi ACPI0017:00: decoder0.1 added to root0
kernel: [ 24.599939][ T543] cxl_acpi ACPI0017:00: decode range: node: 2 range [0xbd0000000 - 0xccfffffff]
kernel: [ 24.630549][ T543] cxl_acpi ACPI0017:00: root0: add: nvdimm-bridge0
kernel: [ 24.692068][ T70] cxl_pci 0000:0e:00.0: mem0:decoder2.0 no CXL window for range 0xb90000000:0xc8fffffff
kernel: [ 24.722976][ T69] cxl_region region0: config state: 0
kernel: [ 24.724446][ T69] cxl_acpi ACPI0017:00: decoder0.0: created region0
kernel: [ 24.725023][ T69] cxl_pci 0000:0d:00.0: mem1:decoder3.0: __construct_region region0 res: [mem 0xa90000000-0xb8fffffff flags 0x200] iw: 1 ig: 256
kernel: [ 24.727230][ T69] cxl_mem mem1: decoder:decoder3.0 parent:0000:0d:00.0 port:endpoint3 range:0xa90000000-0xb8fffffff pos:0
kernel: [ 24.728660][ T69] cxl region0: region sort successful
kernel: [ 24.729627][ T69] cxl region0: mem1:endpoint3 decoder3.0 add: mem1:decoder3.0 @ 0 next: none nr_eps:1 nr_targets: 1
kernel: [ 24.730566][ T69] cxl region0: pci0000:0c:port1 decoder1.0 add: mem1:decoder3.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
kernel: [ 24.731445][ T69] cxl region0: pci0000:0c:port1 iw: 1 ig: 256
kernel: [ 24.731791][ T69] cxl region0: pci0000:0c:port1 target[0] = 0000:0c:00.0 for mem1:decoder3.0 @ 0
kernel: [ 24.807234][ T519] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0xa90000000-0xccfffffff flags 0x80000200]
kernel: [ 24.903542][ T99] hmem_platform hmem_platform.0: registering CXL range: [mem 0xa90000000-0xccfffffff flags 0x80000200]
kernel: [ 25.043776][ T530] kmem dax2.0: mapping0: 0xa90000000-0xccfffffff could not reserve region
kernel: [ 25.044553][ T530] kmem dax2.0: probe with driver kmem failed with error -16
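
One possible reading of the "no CXL window for range" message above, as a
minimal Python sketch and not kernel code. It assumes the lookup needs a
single CXL window that fully contains the endpoint decoder range; that
containment rule is my assumption. The ranges are copied from the log, and
the helper names (Range, classify) are made up for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Range(NamedTuple):
    start: int
    end: int  # inclusive, as printed in /proc/iomem and the kernel log

    def contains(self, other: "Range") -> bool:
        return self.start <= other.start and other.end <= self.end

    def overlaps(self, other: "Range") -> bool:
        return self.start <= other.end and other.start <= self.end


# Root decoder (CFMWS) ranges from the "decode range" log lines.
WINDOWS: Dict[str, Range] = {
    "CXL Window 0": Range(0xA90000000, 0xBCFFFFFFF),
    "CXL Window 1": Range(0xBD0000000, 0xCCFFFFFFF),
}

# Endpoint decoder ranges from the cxl_pci / cxl_mem log lines.
DECODERS: Dict[str, Range] = {
    "decoder3.0": Range(0xA90000000, 0xB8FFFFFFF),
    "decoder2.0": Range(0xB90000000, 0xC8FFFFFFF),
}


def classify(rng: Range) -> Tuple[List[str], List[str]]:
    """Return (windows containing rng, windows overlapping rng)."""
    containing = [name for name, win in WINDOWS.items() if win.contains(rng)]
    overlapping = [name for name, win in WINDOWS.items() if win.overlaps(rng)]
    return containing, overlapping


for name, rng in DECODERS.items():
    containing, overlapping = classify(rng)
    print(f"{name}: contained by {containing}, overlaps {overlapping}")
```

With these numbers decoder3.0 fits inside CXL Window 0, while decoder2.0 is
contained by neither window and overlaps both, which would match the
setup-related explanation: the device range straddles the boundary between
the two CFMWS entries.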
[4] Physical machine: 2 CFMWS + Host-bridge + 2 CXL devices
kernel: BIOS-e820: [mem 0x0000002070000000-0x000000a06fffffff] soft reserved
2070000000-606fffffff : CXL Window 0
  2070000000-606fffffff : region0
    2070000000-606fffffff : dax0.0
      2070000000-606fffffff : System RAM (kmem)
6070000000-a06fffffff : CXL Window 1
  6070000000-a06fffffff : region1
    6070000000-a06fffffff : dax1.0
      6070000000-a06fffffff : System RAM (kmem)
== region 1 teardown and unplug (the unplug was done via unbind/remove in /sys/bus/pci/devices)
2070000000-606fffffff : CXL Window 0
  2070000000-606fffffff : region0
    2070000000-606fffffff : dax0.0
      2070000000-606fffffff : System RAM (kmem)
6070000000-a06fffffff : CXL Window 1
== plug - after PCI rescan cannot create hmem
6070000000-a06fffffff : CXL Window 1
  6070000000-a06fffffff : region1
kernel: cxl_region region1: config state: 0
kernel: cxl_acpi ACPI0017:00: decoder0.1: created region1
kernel: cxl_pci 0000:04:00.0: mem1:decoder10.0: __construct_region region1 res: [mem 0x6070000000-0xa06fffffff flags 0x200] iw: 1 ig: 4096
kernel: cxl_mem mem1: decoder:decoder10.0 parent:0000:04:00.0 port:endpoint10 range:0x6070000000-0xa06fffffff pos:0
kernel: cxl region1: region sort successful
kernel: cxl region1: mem1:endpoint10 decoder10.0 add: mem1:decoder10.0 @ 0 next: none nr_eps: 1 nr_targets: 1
kernel: cxl region1: pci0000:00:port2 decoder2.1 add: mem1:decoder10.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
kernel: cxl region1: pci0000:00:port2 cxl_port_setup_targets expected iw: 1 ig: 4096 [mem 0x6070000000-0xa06fffffff flags 0x200]
kernel: cxl region1: pci0000:00:port2 cxl_port_setup_targets got iw: 1 ig: 256 state: disabled 0x6070000000:0xa06fffffff
kernel: cxl_port endpoint10: failed to attach decoder10.0 to region1: -6
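
To compare iomem listings like the ones in this thread against the
expectation that nothing should remain under a CXL window after region
teardown, a small checker along these lines could help. It is only a sketch:
it infers nesting from range containment in listing order rather than from
indentation, and the function names (parse_iomem, leftovers) are
hypothetical, not an existing tool.

```python
import re
from typing import List, Optional, Tuple

# Matches "<start>-<end> : <name>", ignoring any leading indentation.
LINE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+?)\s*$")

Entry = Tuple[int, int, int, str]  # (depth, start, end, name)


def parse_iomem(text: str) -> List[Entry]:
    """Parse iomem-style lines, inferring depth from range containment."""
    entries: List[Entry] = []
    stack: List[Tuple[int, int]] = []  # ranges of the currently open ancestors
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip log lines and comments
        start, end = int(match.group(1), 16), int(match.group(2), 16)
        # Close ancestors that do not contain this range.
        while stack and not (stack[-1][0] <= start and end <= stack[-1][1]):
            stack.pop()
        entries.append((len(stack), start, end, match.group(3)))
        stack.append((start, end))
    return entries


def leftovers(entries: List[Entry]) -> List[str]:
    """Return names of all entries nested below any 'CXL Window' entry."""
    found: List[str] = []
    window_depth: Optional[int] = None
    for depth, _start, _end, name in entries:
        if window_depth is not None and depth <= window_depth:
            window_depth = None  # left the window's subtree
        if name.startswith("CXL Window"):
            window_depth = depth
        elif window_depth is not None:
            found.append(name)
    return found


after_teardown = """\
68e80000000-8d37fffffff : CXL Window 9
68e80000000-70e7fffffff : Soft Reserved
68e80000000-70e7fffffff : dax18.0
68e80000000-70e7fffffff : System RAM (kmem)
"""
print(leftovers(parse_iomem(after_teardown)))
```

For the "after region teardown" listing quoted from Alison this reports
Soft Reserved, dax18.0 and System RAM (kmem) as leftovers, while the
teardown listing of scenario [1] (a bare CXL Window 0) yields an empty list.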
Thanks,
Tomasz
>>
>> Probe order patches of interest:
>> cxl/mem: refactor memdev allocation
>> cxl/mem: Arrange for always-synchronous memdev attach
>> cxl/port: Arrange for always synchronous endpoint attach
>>
>> [1] Hotplug looks okay. After offlining the memory I can tear down the
>> regions and recreate it back if CXL owns entire SR range as Soft Reserved
>> is gone. dax_cxl creates dax devices and onlines memory.
>> 850000000-284fffffff : CXL Window 0
>>   850000000-284fffffff : region0
>>     850000000-284fffffff : dax0.0
>>       850000000-284fffffff : System RAM (kmem)
>>
>> [2] With CONFIG_CXL_REGION disabled, all the resources are handled by
>> HMEM. Soft Reserved range shows up in /proc/iomem, no regions come up
>> and dax devices are created from HMEM.
>> 850000000-284fffffff : CXL Window 0
>>   850000000-284fffffff : Soft Reserved
>>     850000000-284fffffff : dax0.0
>>       850000000-284fffffff : System RAM (kmem)
>>
>> [3] Region assembly failures also behave okay and work same as [2].
>>
>> Before:
>> 2850000000-484fffffff : Soft Reserved
>>   2850000000-484fffffff : CXL Window 1
>>     2850000000-484fffffff : dax4.0
>>       2850000000-484fffffff : System RAM (kmem)
>>
>> After tearing down dax4.0 and creating it back:
>>
>> Logs:
>> [ 547.847764] unregister_dax_mapping: mapping0: unregister_dax_mapping
>> [ 547.855000] trim_dev_dax_range: dax dax4.0: delete range[0]: 0x2850000000:0x484fffffff
>> [ 622.474580] alloc_dev_dax_range: dax dax4.1: alloc range[0]: 0x0000002850000000:0x000000484fffffff
>> [ 752.766194] Fallback order for Node 0: 0 1
>> [ 752.766199] Fallback order for Node 1: 1 0
>> [ 752.766200] Built 2 zonelists, mobility grouping on. Total pages: 8096220
>> [ 752.783234] Policy zone: Normal
>> [ 752.808604] Demotion targets for Node 0: preferred: 1, fallback: 1
>> [ 752.815509] Demotion targets for Node 1: null
>>
>> After:
>> 2850000000-484fffffff : Soft Reserved
>>   2850000000-484fffffff : CXL Window 1
>>     2850000000-484fffffff : dax4.1
>>       2850000000-484fffffff : System RAM (kmem)
>>
>> [4] A small hack to tear down the fully assembled and probed region
>> (i.e region in committed state) for range 850000000-284fffffff.
>> This is to test the region teardown path for regions which don't
>> fully cover the Soft Reserved range.
>>
>> 850000000-284fffffff : Soft Reserved
>>   850000000-284fffffff : CXL Window 0
>>     850000000-284fffffff : dax5.0
>>       850000000-284fffffff : System RAM (kmem)
>> 2850000000-484fffffff : CXL Window 1
>>   2850000000-484fffffff : region1
>>     2850000000-484fffffff : dax1.0
>>       2850000000-484fffffff : System RAM (kmem)
>> 4850000000-684fffffff : CXL Window 2
>>   4850000000-684fffffff : region2
>>     4850000000-684fffffff : dax2.0
>>       4850000000-684fffffff : System RAM (kmem)
>>
>> daxctl list -R -u
>> [
>>   {
>>     "path":"\/platform\/ACPI0017:00\/root0\/decoder0.1\/region1\/dax_region1",
>>     "id":1,
>>     "size":"128.00 GiB (137.44 GB)",
>>     "align":2097152
>>   },
>>   {
>>     "path":"\/platform\/hmem.5",
>>     "id":5,
>>     "size":"128.00 GiB (137.44 GB)",
>>     "align":2097152
>>   },
>>   {
>>     "path":"\/platform\/ACPI0017:00\/root0\/decoder0.2\/region2\/dax_region2",
>>     "id":2,
>>     "size":"128.00 GiB (137.44 GB)",
>>     "align":2097152
>>   }
>> ]
>>
>> I couldn't test multiple regions under same Soft Reserved range
>> with/without contiguous mapping due to limiting BIOS support. Hopefully
>> that works.
>>
>> v4 updates:
>> - No changes patches 1-3.
>> - New patches 4-7.
>> - handle_deferred_cxl() has been enhanced to handle case where CXL
>> regions do not contiguously and fully cover Soft Reserved ranges.
>> - Support added to defer cxl_dax registration.
>> - Support added to teardown cxl regions.
>>
>> v3 updates:
>> - Fixed two "From".
>>
>> v2 updates:
>> - Removed conditional check on CONFIG_EFI_SOFT_RESERVE as dax_hmem
>> depends on CONFIG_EFI_SOFT_RESERVE. (Zhijian)
>> - Added TODO note. (Zhijian)
>> - Included region_intersects_soft_reserve() inside CONFIG_EFI_SOFT_RESERVE
>> conditional check. (Zhijian)
>> - insert_resource_late() -> insert_resource_expand_to_fit() and
>> __insert_resource_expand_to_fit() replacement. (Boris)
>> - Fixed Co-developed and Signed-off by. (Dan)
>> - Combined 2/6 and 3/6 into a single patch. (Zhijian).
>> - Skip local variable in remove_soft_reserved. (Jonathan)
>> - Drop kfree with __free(). (Jonathan)
>> - return 0 -> return dev_add_action_or_reset(host...) (Jonathan)
>> - Dropped 6/6.
>> - Reviewed-by tags (Dave, Jonathan)
>>
>> Dan Williams (4):
>> dax/hmem, e820, resource: Defer Soft Reserved insertion until hmem is
>> ready
>> dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved
>> ranges
>> dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
>> dax/hmem: Defer handling of Soft Reserved ranges that overlap CXL
>> windows
>>
>> Smita Koralahalli (5):
>> cxl/region, dax/hmem: Arbitrate Soft Reserved ownership with
>> cxl_regions_fully_map()
>> cxl/region: Add register_dax flag to control probe-time devdax setup
>> cxl/region, dax/hmem: Register devdax only when CXL owns Soft Reserved
>> span
>> cxl/region, dax/hmem: Tear down CXL regions when HMEM reclaims Soft
>> Reserved
>> dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
>>
>> arch/x86/kernel/e820.c | 2 +-
>> drivers/cxl/acpi.c | 2 +-
>> drivers/cxl/core/region.c | 181 ++++++++++++++++++++++++++++++++++++--
>> drivers/cxl/cxl.h | 17 ++++
>> drivers/dax/Kconfig | 2 +
>> drivers/dax/hmem/device.c | 4 +-
>> drivers/dax/hmem/hmem.c | 137 ++++++++++++++++++++++++++---
>> include/linux/ioport.h | 13 ++-
>> kernel/resource.c | 92 ++++++++++++++++---
>> 9 files changed, 415 insertions(+), 35 deletions(-)
>>
>> base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
>> --
>> 2.17.1
>>