lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c7878afc-237f-4bfc-bfc9-aaded2e2420e@amd.com>
Date: Mon, 8 Dec 2025 17:04:15 +0000
From: Alejandro Lucero Palau <alucerop@....com>
To: Dan Williams <dan.j.williams@...el.com>, dave.jiang@...el.com
Cc: linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
 Smita.KoralahalliChannabasappa@....com, alison.schofield@...el.com,
 terry.bowman@....com, alejandro.lucero-palau@....com,
 linux-pci@...r.kernel.org, Jonathan.Cameron@...wei.com,
 Shiju Jose <shiju.jose@...wei.com>
Subject: Re: [PATCH 0/6] cxl: Initialization reworks in support Soft Reserve
 Recovery and Accelerator Memory


On 12/4/25 02:21, Dan Williams wrote:
> The CXL subsystem is modular. That modularity is a benefit for
> separation of concerns and testing. It is generally appropriate for this
> class of devices that support hotplug and can dynamically add a CXL
> personality alongside their PCI personality. However, a cost of modules
> is ambiguity about when devices (cxl_memdevs, cxl_ports, cxl_regions)
> have had a chance to attach to their corresponding drivers on
> @cxl_bus_type.
>
> This problem of not being able to reliably determine when a device has
> had a chance to attach to its driver vs still waiting for the module to
> load, is a common problem for the "Soft Reserve Recovery" [1], and
> "Accelerator Memory" [2] enabling efforts.
>
> For "Soft Reserve Recovery" it wants to use wait_for_device_probe() as a
> sync point for when CXL devices present at boot have had a chance to
> attach to the cxl_pci driver (generic CXL memory expansion class
> driver). That breaks down if wait_for_device_probe() only flushes PCI
> device probe, but not the cxl_mem_probe() of the cxl_memdev that
> cxl_pci_probe() creates.
>
> For "Accelerator Memory", the driver is not cxl_pci, but any potential
> PCI driver that wants to use the devm_cxl_add_memdev() ABI to attach to
> the CXL memory domain. Those drivers want to know if the CXL link is
> live end-to-end (from endpoint, through switches, to the host bridge)
> and CXL memory operations are enabled. If not, a CXL accelerator may be
> able to fall back to PCI-only operation. Similar to the "Soft Reserve
> Memory" it needs to know that the CXL subsystem had a chance to probe
> the ancestor topology of the device and let that driver make a
> synchronous decision about CXL operation.
>
> In support of those efforts:
>
> * Clean up some resource lifetime issues in the current code
> * Move some object creation symbols (devm_cxl_add_memdev() and
>    devm_cxl_add_endpoint()) into the cxl_mem.ko and cxl_port.ko objects.
>    Implicitly guarantee that cxl_mem_driver and cxl_port_driver have been
>    registered prior to any device objects being registered. This is
>    preferred over explicit open-coded request_module().
> * Use scoped-based-cleanup before adding more resource management in
>    devm_cxl_add_memdev()
> * Give an accelerator the opportunity to run setup operations in
>    cxl_mem_probe() so it can further probe if the CXL configuration matches
>    its needs.
>
> Some of these previously appeared on a branch as an RFC [3] and left
> "Soft Reserve Recovery" and "Accelerator Memory" to jockey for ordering.
> Instead, create a shared topic branch for both of those efforts to
> import. The main changes since that RFC are fixing a bug and reducing
> the amount of refactoring (which contributed to hiding the bug).
>
> [1]: http://lore.kernel.org/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com
> [2]: http://lore.kernel.org/20251119192236.2527305-1-alejandro.lucero-palau@amd.com
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order
>
> Dan Williams (6):
>    cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
>    cxl/mem: Arrange for always-synchronous memdev attach
>    cxl/port: Arrange for always synchronous endpoint attach
>    cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
>    cxl/mem: Drop @host argument to devm_cxl_add_memdev()
>    cxl/mem: Introduce a memdev creation ->probe() operation
>
>   drivers/cxl/Kconfig          |   2 +-
>   drivers/cxl/cxl.h            |   2 +
>   drivers/cxl/cxlmem.h         |  17 ++++--
>   drivers/cxl/core/edac.c      |  64 ++++++++++++---------
>   drivers/cxl/core/memdev.c    | 104 ++++++++++++++++++++++++-----------
>   drivers/cxl/mem.c            |  69 +++++++++--------------
>   drivers/cxl/pci.c            |   2 +-
>   drivers/cxl/port.c           |  40 ++++++++++++++
>   tools/testing/cxl/test/mem.c |   2 +-
>   9 files changed, 192 insertions(+), 110 deletions(-)
>
>
> base-commit: ea5514e300568cbe8f19431c3e424d4791db8291


I have tested this series with Type2 and it works as expected:


1) type2 fails to be installed if cxl_mem not loaded.

2) cxl_mem, cxl_port and cxl_core not removable as type2 driver 
installed depends on them.

3) cxl_acpi is still possible to remove ...


FWIW:


Tested-by: Alejandro Lucero <alucerop@....com>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ