Message-ID: <20260201093002.1281858-1-ming.li@zohomail.com>
Date: Sun,  1 Feb 2026 17:30:00 +0800
From: Li Ming <ming.li@...omail.com>
To: dave@...olabs.net,
	jonathan.cameron@...wei.com,
	dave.jiang@...el.com,
	alison.schofield@...el.com,
	vishal.l.verma@...el.com,
	ira.weiny@...el.com,
	dan.j.williams@...el.com
Cc: linux-cxl@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Li Ming <ming.li@...omail.com>
Subject: [PATCH 0/2] Fix port enumeration failure and NULL endpoint issue

When running CXL mock testing on the cxl/next branch, I frequently hit
the following call trace.

 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497]
 CPU: 3 UID: 0 PID: 42 Comm: kworker/u16:1 Tainted: G           O      J 6.19.0-rc5-cxl+ #4 PREEMPT(voluntary) 
 Tainted: [O]=OOT_MODULE, [J]=FWCTL
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
 Workqueue: async async_run_entry_fn
 RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core]
 Call Trace:
  <TASK>
  cxl_event_trace_record+0xd1/0xa70 [cxl_core]
  __cxl_event_trace_record+0x12f/0x1e0 [cxl_core]
  cxl_mem_get_records_log+0x261/0x500 [cxl_core]
  cxl_mem_get_event_records+0x7c/0xc0 [cxl_core]
  cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem]
  platform_probe+0x9d/0x130
  really_probe+0x1c8/0x960
  driver_probe_device+0x45/0x120
  __device_attach_driver+0x15d/0x280
  bus_for_each_drv+0x100/0x180
  __device_attach_async_helper+0x199/0x250
  async_run_entry_fn+0x95/0x430
  process_one_work+0x7db/0x1940

After detailed debugging, I identified two independent issues that
together lead to the problem.

Issue 1:
cxlmd->endpoint is initialized to ERR_PTR(-ENXIO) when the cxlmd is
created, but the CXL subsystem usually checks endpoint availability by
testing whether the pointer is NULL. As a result, if endpoint port
creation fails, some code paths may incorrectly treat the endpoint as
available. In the call trace above, endpoint port creation failed, yet
cxl_dpa_to_region() still considered the endpoint available.
Patch #1 fixes this by initializing cxlmd->endpoint to NULL by default.
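To illustrate Issue 1, here is a minimal userspace sketch. The
err_ptr()/is_err() helpers are simplified stand-ins for the kernel's
ERR_PTR()/IS_ERR(), and struct fake_memdev and endpoint_available() are
hypothetical; only the endpoint pointer is modeled. It shows that a
NULL-only check reports an ERR_PTR(-ENXIO) endpoint as available.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR()/IS_ERR().
 * The kernel reserves the top MAX_ERRNO (4095) pointer values for errors. */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, stripped-down memdev: only the endpoint pointer is modeled. */
struct fake_memdev {
	void *endpoint;
};

/* Availability test in the NULL-only style described above. */
static bool endpoint_available(const struct fake_memdev *md)
{
	assert(md != NULL);
	return md->endpoint != NULL;
}
```

With the old ERR_PTR(-ENXIO) default, endpoint_available() returns true
even though no endpoint port exists; with a NULL default it returns
false, which is what the NULL checks in the subsystem expect.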

Issue 2:
The second issue is why CXL port enumeration can fail. I observed that
when two memdevs try to enumerate the same port, the first memdev is
responsible for creating and attaching the port. However, there is a
small window between the point where the new port becomes visible
(after being added to the device list of the CXL bus) and the point
where it is bound to the port driver. During this window, the second
memdev may discover the port and acquire its lock while attempting to
add its dport, which blocks the first memdev's bus_probe_device() call
inside device_add(). As a result, the second memdev observes the port
as unbound and fails to add its dport.
Patch #2 fixes this race by holding the grandparent port lock during
dport addition, preventing access to the port before driver binding
has completed.
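The locking idea behind Patch #2 can be sketched as a simplified
userspace model using POSIX threads. This is not the CXL code: struct
port_model, model_init(), enumerate_memdev() and run_two_memdevs() are
hypothetical, and a single mutex stands in for the grandparent port
lock. Because port creation, driver binding and dport addition all
happen under that lock, the second enumerator can only see the shared
port after it has been bound.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model, not the CXL code: one shared port that two memdevs
 * enumerate, guarded by a mutex standing in for the grandparent port lock. */
struct port_model {
	pthread_mutex_t grandparent_lock;
	bool port_visible;   /* port added to the bus device list */
	bool port_bound;     /* port driver bound to the port */
	int dports_added;
	int dport_failures;
};

static void model_init(struct port_model *m)
{
	int rc = pthread_mutex_init(&m->grandparent_lock, NULL);

	assert(rc == 0);
	(void)rc;
	m->port_visible = false;
	m->port_bound = false;
	m->dports_added = 0;
	m->dport_failures = 0;
}

/* One memdev: find or create the shared port, then add its dport.
 * Creation, binding and dport addition all run under the grandparent
 * lock, so no thread can observe a visible but unbound port. */
static void *enumerate_memdev(void *arg)
{
	struct port_model *m = arg;

	pthread_mutex_lock(&m->grandparent_lock);
	if (!m->port_visible) {
		m->port_visible = true; /* like device_add() publishing the port */
		m->port_bound = true;   /* like bus_probe_device() binding it */
	}
	if (m->port_bound)
		m->dports_added++;
	else
		m->dport_failures++;
	pthread_mutex_unlock(&m->grandparent_lock);
	return NULL;
}

/* Run two concurrent memdev enumerations against the same port. */
static void run_two_memdevs(struct port_model *m)
{
	pthread_t a, b;
	int rc;

	rc = pthread_create(&a, NULL, enumerate_memdev, m);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, enumerate_memdev, m);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
}
```

In the unfixed flow, the gap between publishing the port and binding
it is where the second memdev can find the port still unbound; holding
one lock across the whole sequence removes that observable state.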

base-commit: 63050be0bfe0b280cce5d701b31940fd84858609 cxl/next

Li Ming (2):
  cxl/core: Set cxlmd->endpoint to NULL by default
  cxl/core: Hold grandparent port lock for dport adding.

 drivers/cxl/core/memdev.c | 2 +-
 drivers/cxl/core/port.c   | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.43.0