lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <653b3299c1a33_244c8f29449@dwillia2-xfh.jf.intel.com.notmuch>
Date:   Thu, 26 Oct 2023 20:46:33 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Robert Richter <rrichter@....com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        Dave Jiang <dave.jiang@...el.com>,
        Alison Schofield <alison.schofield@...el.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Ben Widawsky <bwidawsk@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>
CC:     <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Terry Bowman <terry.bowman@....com>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>
Subject: RE: [PATCH v12 01/20] cxl/port: Fix release of RCD endpoints

Robert Richter wrote:
> Binding and unbindind RCD endpoints (e.g. mem0 device) caused the
> corresponding endpoint not being released. Reason for that is the
> wrong port discovered for RCD endpoints. See cxl_mem_probe() for
> proper endpoint parent detection. Fix delete_endpoint() respectively.
> 
> Fixes: 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration")
> Signed-off-by: Robert Richter <rrichter@....com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>

This patch causes cxl-topology.sh to crash with use-after-free
signatures. I notice that this path should just be using
endpoint->dev.parent and is missing reference counting on the parent.

I will replace this path with the following:

-- >8 --
Subject: cxl/port: Fix delete_endpoint() vs parent unregistration race

From: Dan Williams <dan.j.williams@...el.com>

The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of
ports (struct cxl_port objects) between an endpoint and the root of a
CXL topology. Each port including the endpoint port is attached to the
cxl_port driver.

Given that setup it follows that when either any port in that lineage
goes through a cxl_port ->remove() event, or the memdev goes through a
cxl_mem ->remove() event. The hierarchy below the removed port, or the
entire hierarchy if the memdev is removed needs to come down.

The delete_endpoint() callback is careful to check whether it is being
called to tear down the hierarchy, or if it is only being called to
teardown the memdev because an ancestor port is going through
->remove().

That care needs to take the device_lock() of the endpoint's parent.
Which requires 2 bugs to be fixed:

1/ A reference on the parent is needed to prevent use-after-free
   scenarios like this signature:

    BUG: spinlock bad magic on CPU#0, kworker/u56:0/11
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Workqueue: cxl_port detach_memdev [cxl_core]
    RIP: 0010:spin_bug+0x65/0xa0
    Call Trace:
      do_raw_spin_lock+0x69/0xa0
     __mutex_lock+0x695/0xb80
     delete_endpoint+0xad/0x150 [cxl_core]
     devres_release_all+0xb8/0x110
     device_unbind_cleanup+0xe/0x70
     device_release_driver_internal+0x1d2/0x210
     detach_memdev+0x15/0x20 [cxl_core]
     process_one_work+0x1e3/0x4c0
     worker_thread+0x1dd/0x3d0

2/ In the case of RCH topologies, the parent device that needs to be
   locked is not always @port->dev as returned by cxl_mem_find_port(), use
   endpoint->dev.parent instead.

Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: <stable@...r.kernel.org>
Reported-by: Robert Richter <rrichter@....com>
Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@...el.com>
---
 drivers/cxl/core/port.c |   13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 5ba606c6e03f..6bbaca5ac60d 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1227,13 +1227,7 @@ static void delete_endpoint(void *data)
 {
 	struct cxl_memdev *cxlmd = data;
 	struct cxl_port *endpoint = cxlmd->endpoint;
-	struct cxl_port *parent_port;
-	struct device *parent;
-
-	parent_port = cxl_mem_find_port(cxlmd, NULL);
-	if (!parent_port)
-		goto out;
-	parent = &parent_port->dev;
+	struct device *parent = endpoint->dev.parent;
 
 	device_lock(parent);
 	if (parent->driver && !endpoint->dead) {
@@ -1243,15 +1237,16 @@ static void delete_endpoint(void *data)
 	}
 	cxlmd->endpoint = NULL;
 	device_unlock(parent);
-	put_device(parent);
-out:
 	put_device(&endpoint->dev);
+	put_device(parent);
 }
 
 int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
 {
+	struct device *parent = endpoint->dev.parent;
 	struct device *dev = &cxlmd->dev;
 
+	get_device(parent);
 	get_device(&endpoint->dev);
 	cxlmd->endpoint = endpoint;
 	cxlmd->depth = endpoint->depth;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ