Message-ID: <20251204022136.2573521-1-dan.j.williams@intel.com>
Date: Wed,  3 Dec 2025 18:21:30 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: dave.jiang@...el.com
Cc: linux-cxl@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Smita.KoralahalliChannabasappa@....com,
	alison.schofield@...el.com,
	terry.bowman@....com,
	alejandro.lucero-palau@....com,
	linux-pci@...r.kernel.org,
	Jonathan.Cameron@...wei.com,
	Alejandro Lucero <alucerop@....com>,
	Shiju Jose <shiju.jose@...wei.com>
Subject: [PATCH 0/6] cxl: Initialization reworks in support Soft Reserve Recovery and Accelerator Memory

The CXL subsystem is modular. That modularity is a benefit for
separation of concerns and testing. It is generally appropriate for this
class of devices that support hotplug and can dynamically add a CXL
personality alongside their PCI personality. However, a cost of modules
is ambiguity about when devices (cxl_memdevs, cxl_ports, cxl_regions)
have had a chance to attach to their corresponding drivers on
@cxl_bus_type.

Being unable to reliably determine whether a device has had a chance to
attach to its driver, or is still waiting for the module to load, is a
problem common to the "Soft Reserve Recovery" [1] and "Accelerator
Memory" [2] enabling efforts.

"Soft Reserve Recovery" wants to use wait_for_device_probe() as a
sync point for when CXL devices present at boot have had a chance to
attach to the cxl_pci driver (generic CXL memory expansion class
driver). That breaks down if wait_for_device_probe() only flushes PCI
device probe, but not the cxl_mem_probe() of the cxl_memdev that
cxl_pci_probe() creates.
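
As a rough illustration of that gap, here is a userspace toy model (not
kernel code). Every name in it (toy_bus, toy_memdev, toy_pci_probe(),
and so on) is hypothetical and only stands in for the two-level probe
described above: the parent probe completes, but the child device is
bound only if its driver was already registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a two-level probe; not kernel code. */
struct toy_bus {
	bool memdev_driver_registered;	/* has the memdev driver loaded? */
};

struct toy_memdev {
	bool bound;	/* true once the modeled memdev probe has run */
};

/* Registering a device binds it only if a driver is already present. */
static void toy_memdev_register(struct toy_bus *bus, struct toy_memdev *md)
{
	md->bound = bus->memdev_driver_registered;
}

/* Modeled parent (PCI) probe: creates and registers the child memdev. */
static void toy_pci_probe(struct toy_bus *bus, struct toy_memdev *md)
{
	toy_memdev_register(bus, md);
}

/* A late driver registration binds a device that was left waiting. */
static void toy_memdev_driver_register(struct toy_bus *bus,
				       struct toy_memdev *md)
{
	bus->memdev_driver_registered = true;
	md->bound = true;
}
```

In this model, a sync point that only waits for toy_pci_probe() to
return says nothing about md->bound unless the driver is guaranteed to
be registered first.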

For "Accelerator Memory", the driver is not cxl_pci, but any potential
PCI driver that wants to use the devm_cxl_add_memdev() ABI to attach to
the CXL memory domain. Those drivers want to know if the CXL link is
live end-to-end (from endpoint, through switches, to the host bridge)
and CXL memory operations are enabled. If not, a CXL accelerator may be
able to fall back to PCI-only operation. Similar to "Soft Reserve
Recovery", it needs to know that the CXL subsystem had a chance to probe
the ancestor topology of the device, so that the driver can make a
synchronous decision about CXL operation.
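
The decision flow for an accelerator driver might be sketched as
follows. This is a userspace model under stated assumptions, not the
interface added by this series: the toy_* types, fields, and functions
are all hypothetical. It only shows the shape of a synchronous attach
whose result the caller can act on immediately, including an optional
driver-supplied probe callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct toy_memdev;

struct toy_attach_ops {
	/* Runs inside the modeled memdev probe, after topology checks. */
	int (*probe)(struct toy_memdev *md);
};

struct toy_memdev {
	bool link_up_to_host_bridge;	/* endpoint -> switches -> host bridge */
	bool mem_enabled;		/* CXL memory operations enabled */
	const struct toy_attach_ops *ops;
};

enum toy_mode { TOY_MODE_CXL, TOY_MODE_PCI_ONLY };

/* Model of a synchronous attach: the result is known on return. */
static int toy_memdev_attach(struct toy_memdev *md)
{
	if (!md->link_up_to_host_bridge || !md->mem_enabled)
		return -ENXIO;
	if (md->ops && md->ops->probe)
		return md->ops->probe(md);
	return 0;
}

/* Accelerator decision: use CXL if attach succeeded, else fall back. */
static enum toy_mode toy_accel_init(struct toy_memdev *md)
{
	return toy_memdev_attach(md) == 0 ? TOY_MODE_CXL : TOY_MODE_PCI_ONLY;
}

/* Example accelerator-specific checks run from the modeled probe. */
static int toy_accel_probe_ok(struct toy_memdev *md)
{
	(void)md;
	return 0;
}

static int toy_accel_probe_reject(struct toy_memdev *md)
{
	(void)md;
	return -EOPNOTSUPP;
}
```

Because toy_memdev_attach() returns only after the modeled topology
checks and the callback have run, toy_accel_init() can choose PCI-only
operation without polling or waiting for a later event.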

In support of those efforts:

* Clean up some resource lifetime issues in the current code
* Move some object creation symbols (devm_cxl_add_memdev() and
  devm_cxl_add_endpoint()) into the cxl_mem.ko and cxl_port.ko objects.
  Implicitly guarantee that cxl_mem_driver and cxl_port_driver have been
  registered prior to any device objects being registered. This is
  preferred over explicit open-coded request_module().
* Use scope-based cleanup before adding more resource management in
  devm_cxl_add_memdev()
* Give an accelerator the opportunity to run setup operations in
  cxl_mem_probe() so it can further probe if the CXL configuration matches
  its needs.
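
For readers unfamiliar with scope-based cleanup, the pattern can be
sketched in userspace with the GCC/Clang cleanup attribute, which is
what the kernel's __free() helper in <linux/cleanup.h> builds on. The
sketch below is not the devm_cxl_add_memdev() conversion itself;
toy_obj, toy_obj_create(), and toy_auto_free are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object standing in for a memdev; not the kernel struct. */
struct toy_obj {
	int id;
};

static int toy_free_count;	/* counts automatic frees, for illustration */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void toy_obj_cleanup(struct toy_obj **p)
{
	if (*p) {
		free(*p);
		toy_free_count++;
	}
}

#define toy_auto_free __attribute__((cleanup(toy_obj_cleanup)))

/* Returns a new object, or NULL if allocation or the modeled setup fails. */
static struct toy_obj *toy_obj_create(int id, int fail_setup)
{
	struct toy_obj *obj toy_auto_free = malloc(sizeof(*obj));
	struct toy_obj *ret;

	if (!obj)
		return NULL;
	obj->id = id;

	if (fail_setup)
		return NULL;	/* obj is freed automatically on this path */

	/* Success: hand ownership to the caller and disarm the cleanup. */
	ret = obj;
	obj = NULL;
	return ret;
}
```

The error path needs no explicit free or goto label, so adding further
setup steps later does not add new unwind code; only the success path
has to disarm the cleanup explicitly.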

Some of these changes previously appeared on a branch as an RFC [3],
which left "Soft Reserve Recovery" and "Accelerator Memory" to jockey
for ordering. Instead, create a shared topic branch for both of those
efforts to import. The main changes since that RFC are a bug fix and a
reduction in the amount of refactoring (which contributed to hiding the
bug).

[1]: http://lore.kernel.org/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com
[2]: http://lore.kernel.org/20251119192236.2527305-1-alejandro.lucero-palau@amd.com
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order

Dan Williams (6):
  cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
  cxl/mem: Arrange for always-synchronous memdev attach
  cxl/port: Arrange for always synchronous endpoint attach
  cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
  cxl/mem: Drop @host argument to devm_cxl_add_memdev()
  cxl/mem: Introduce a memdev creation ->probe() operation

 drivers/cxl/Kconfig          |   2 +-
 drivers/cxl/cxl.h            |   2 +
 drivers/cxl/cxlmem.h         |  17 ++++--
 drivers/cxl/core/edac.c      |  64 ++++++++++++---------
 drivers/cxl/core/memdev.c    | 104 ++++++++++++++++++++++++-----------
 drivers/cxl/mem.c            |  69 +++++++++--------------
 drivers/cxl/pci.c            |   2 +-
 drivers/cxl/port.c           |  40 ++++++++++++++
 tools/testing/cxl/test/mem.c |   2 +-
 9 files changed, 192 insertions(+), 110 deletions(-)


base-commit: ea5514e300568cbe8f19431c3e424d4791db8291
-- 
2.51.1

