Message-ID: <64c4562fb7531_a88b29481@dwillia2-xfh.jf.intel.com.notmuch>
Date: Fri, 28 Jul 2023 16:58:39 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Ira Weiny <ira.weiny@...el.com>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
"Dan Williams" <dan.j.williams@...el.com>,
Dave Jiang <dave.jiang@...el.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
Ira Weiny <ira.weiny@...el.com>
Subject: RE: [PATCH] cxl/memdev: Avoid mailbox functionality on device memory CXL devices
Ira Weiny wrote:
> Using the proposed type-2 cxl-test device[1] the following
> splat was observed:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000278
> [...]
> RIP: 0010:devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]
It would be useful to decode this to a line number; the rest of this
call trace is not adding much.
> [...]
> Call Trace:
> <TASK>
> ? __die+0x1f/0x70
> ? page_fault_oops+0x149/0x420
> ? fixup_exception+0x22/0x310
> ? kernelmode_fixup_or_oops+0x84/0x110
> ? exc_page_fault+0x6d/0x150
> ? asm_exc_page_fault+0x22/0x30
> ? devm_cxl_add_memdev+0x1de/0x2c0 [cxl_core]
> cxl_mock_mem_probe+0x632/0x870 [cxl_mock_mem]
> platform_probe+0x40/0x90
> really_probe+0x19e/0x3e0
> ? __pfx___driver_attach+0x10/0x10
> __driver_probe_device+0x78/0x160
> driver_probe_device+0x1f/0x90
> __driver_attach+0xce/0x1c0
> bus_for_each_dev+0x63/0xa0
> bus_add_driver+0x112/0x210
> driver_register+0x55/0x100
> ? __pfx_cxl_mock_mem_driver_init+0x10/0x10 [cxl_mock_mem]
> [...]
>
> Commit f6b8ab32e3ec made the mailbox functionality optional. However,
> some mailbox functionality was merged after that patch. Therefore some
> mailbox functionality can be accessed on a device which did not set up
> the mailbox.
cxl_memdev_security_init() definitely needs to move out of
devm_cxl_add_memdev(), and after that I do not think @mds NULL checks
need to be sprinkled everywhere. In other words, something is wrong at a
higher level if we get into some of these helper functions without the
memory device state.
So definitely this uncovered a problem where cxl_memdev_security_init()
needs to move, but the rest of the mds NULL checks need clear
reproduction scenarios, and I expect most of them are precluded higher in
the call stack.
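
To illustrate the layering argument, here is a rough, self-contained
userspace sketch. It is not the cxl_core code: every struct and function
name in it is a hypothetical stand-in, and the real signatures differ.
The idea is that generic registration never touches mailbox state, and
only a probe path that actually owns the mailbox state calls the
mailbox-dependent setup, so the helpers need no NULL checks of their own.

```c
#include <stddef.h>

/* Hypothetical stand-in for the mailbox-backed memory device state (@mds). */
struct mbox_state {
	int security_ready;
};

/* Hypothetical stand-in for the generic per-device state. */
struct dev_state {
	struct mbox_state *mds;	/* NULL when the device never set up a mailbox */
};

/* Hypothetical stand-in for the registered memdev object. */
struct memdev {
	struct dev_state *ds;
	int registered;
};

/* Generic registration: deliberately does nothing that needs the mailbox. */
int add_memdev(struct memdev *md, struct dev_state *ds)
{
	md->ds = ds;
	md->registered = 1;
	return 0;
}

/*
 * Mailbox-dependent setup. No NULL check here: the contract is that only
 * callers holding valid mailbox state are allowed to call it.
 */
int security_init(struct mbox_state *mds)
{
	mds->security_ready = 1;
	return 0;
}

/* Probe path for a device with a mailbox: register, then do mailbox setup. */
int probe_with_mailbox(struct memdev *md, struct dev_state *ds,
		       struct mbox_state *mds)
{
	int rc;

	ds->mds = mds;
	rc = add_memdev(md, ds);
	if (rc)
		return rc;
	return security_init(mds);
}

/* Probe path for a device-memory device: no mailbox, so no mailbox setup. */
int probe_without_mailbox(struct memdev *md, struct dev_state *ds)
{
	ds->mds = NULL;
	return add_memdev(md, ds);
}
```

With this split, the mailbox-less probe path has no way to reach
security_init(), which is the property that would make per-helper NULL
checks unnecessary.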