Message-ID: <20260201093002.1281858-1-ming.li@zohomail.com>
Date: Sun, 1 Feb 2026 17:30:00 +0800
From: Li Ming <ming.li@...omail.com>
To: dave@...olabs.net,
jonathan.cameron@...wei.com,
dave.jiang@...el.com,
alison.schofield@...el.com,
vishal.l.verma@...el.com,
ira.weiny@...el.com,
dan.j.williams@...el.com
Cc: linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org,
Li Ming <ming.li@...omail.com>
Subject: [PATCH 0/2] Fix port enumeration failure and NULL endpoint issue
When running CXL mock testing on the cxl/next branch, I usually hit the
following call trace:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497]
CPU: 3 UID: 0 PID: 42 Comm: kworker/u16:1 Tainted: G O J 6.19.0-rc5-cxl+ #4 PREEMPT(voluntary)
Tainted: [O]=OOT_MODULE, [J]=FWCTL
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Workqueue: async async_run_entry_fn
RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core]
Call Trace:
<TASK>
cxl_event_trace_record+0xd1/0xa70 [cxl_core]
__cxl_event_trace_record+0x12f/0x1e0 [cxl_core]
cxl_mem_get_records_log+0x261/0x500 [cxl_core]
cxl_mem_get_event_records+0x7c/0xc0 [cxl_core]
cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem]
platform_probe+0x9d/0x130
really_probe+0x1c8/0x960
driver_probe_device+0x45/0x120
__device_attach_driver+0x15d/0x280
bus_for_each_drv+0x100/0x180
__device_attach_async_helper+0x199/0x250
async_run_entry_fn+0x95/0x430
process_one_work+0x7db/0x1940
After detailed debugging, I identified two independent issues that
together lead to the problem.
Issue 1:
cxlmd->endpoint is initialized to ERR_PTR(-ENXIO) during cxlmd creation,
but the CXL subsystem usually checks endpoint availability by checking
whether it is NULL. As a result, if endpoint port creation fails, some
code paths may incorrectly treat the endpoint as available. In the
call trace above, endpoint port creation fails but cxl_dpa_to_region()
still considers the endpoint available.
Patch #1 fixes this by initializing cxlmd->endpoint to NULL by default.
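The mismatch described in Issue 1 can be reproduced outside the kernel.
The following is a minimal user-space sketch, not the actual cxl code:
ERR_PTR() and IS_ERR() are re-implemented here to mirror the kernel
helpers, and struct memdev, struct port and endpoint_available() are
hypothetical stand-ins for struct cxl_memdev, struct cxl_port and the
NULL-style checks mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for struct cxl_port and struct cxl_memdev. */
struct port {
	int id;
};

struct memdev {
	struct port *endpoint;
};

/* Availability check that only tests for NULL (assumed shape). */
static bool endpoint_available(const struct memdev *md)
{
	return md->endpoint != NULL;
}

/* Compare the two possible default values of the endpoint field. */
void endpoint_self_check(void)
{
	struct memdev err_default = { .endpoint = ERR_PTR(-ENXIO) };
	struct memdev null_default = { .endpoint = NULL };

	/* An error pointer is non-NULL, so the check wrongly passes. */
	assert(IS_ERR(err_default.endpoint));
	assert(endpoint_available(&err_default));

	/* With a NULL default the same check correctly fails. */
	assert(!endpoint_available(&null_default));
}
```

With an ERR_PTR() default, a caller that passes the NULL check would go
on to dereference the error pointer. A NULL default keeps the existing
checks correct without having to convert each of them to IS_ERR().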
Issue 2:
The second issue is why CXL port enumeration can fail. What I observed
is that when two memdevs try to enumerate the same port, the first
memdev is responsible for creating and attaching the port. However,
there is a small window between the point where the new port becomes
visible (after being added to the device list of the cxl bus) and the
point where it is bound to the port driver. During this window, the
second memdev may discover the port and acquire its lock while
attempting to add its dport, which blocks bus_probe_device() inside
device_add(). As a result, the second memdev observes the port as
unbound and fails to add its dport.
Patch #2 fixes this race by holding the grandparent port lock during
dport addition, which prevents access to the new port before driver
binding has completed.
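The locking idea can be illustrated with a small user-space model. This
is only a sketch under stated assumptions, not the code from patch #2:
struct model_port, its fields and all model_*() functions are
hypothetical, and a single pthread mutex stands in for the ancestor
port lock. The point it shows is that if both "port becomes visible"
and "port is bound" happen under the same lock that every memdev takes
before adding its dport, no memdev can observe a visible but unbound
port.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port; this is not a kernel structure. */
struct model_port {
	pthread_mutex_t parent_lock; /* stands in for the ancestor port lock */
	bool visible;                /* port was added to the bus device list */
	bool bound;                  /* port driver finished binding */
	int dports;                  /* number of dports added so far */
};

/* A memdev either creates and binds the port, or finds it already bound. */
static int model_add_dport(struct model_port *port)
{
	int rc = 0;

	pthread_mutex_lock(&port->parent_lock);
	if (!port->visible) {
		port->visible = true; /* window opens: port can be found */
		port->bound = true;   /* window closes: driver is bound */
	}
	if (port->bound)
		port->dports++;
	else
		rc = -ENXIO; /* not reachable here: both flags change under one lock */
	pthread_mutex_unlock(&port->parent_lock);

	return rc;
}

static void *model_memdev_thread(void *arg)
{
	/* Encode the return code in the thread result; 0 becomes NULL. */
	return (void *)(long)model_add_dport(arg);
}

/* Run two memdevs concurrently; returns 0 only if both added a dport. */
int model_enumerate_two_memdevs(struct model_port *port)
{
	pthread_t threads[2];
	int created = 0;
	int rc = 0;

	for (; created < 2; created++) {
		if (pthread_create(&threads[created], NULL,
				   model_memdev_thread, port)) {
			rc = -EAGAIN;
			break;
		}
	}

	for (int i = 0; i < created; i++) {
		void *res;

		pthread_join(threads[i], &res);
		if (res)
			rc = -ENXIO;
	}

	return rc;
}

/* Both memdevs should succeed regardless of which one runs first. */
void model_self_check(void)
{
	struct model_port port = { .parent_lock = PTHREAD_MUTEX_INITIALIZER };

	assert(model_enumerate_two_memdevs(&port) == 0);
	assert(port.dports == 2);
}
```

In the model, whichever thread takes the lock first plays the role of
the first memdev. The second thread cannot look at the port until the
lock is dropped, by which time the port is already bound.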
base-commit: 63050be0bfe0b280cce5d701b31940fd84858609 cxl/next
Li Ming (2):
cxl/core: Set cxlmd->endpoint to NULL by default
cxl/core: Hold grandparent port lock for dport adding.
drivers/cxl/core/memdev.c | 2 +-
drivers/cxl/core/port.c | 6 +++++-
2 files changed, 6 insertions(+), 2 deletions(-)
--
2.43.0