lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6705b4398f0d0_964f22949e@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 8 Oct 2024 15:37:45 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Gregory Price <gourry@...rry.net>, Dan Williams <dan.j.williams@...el.com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<Jonathan.Cameron@...wei.com>, <dave.jiang@...el.com>,
	<alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
	<ira.weiny@...el.com>, <rrichter@....com>, <terry.bowman@....com>,
	<dave@...olabs.net>
Subject: Re: [PATCH v2] cxl/core/port: defer endpoint probes when ACPI likely
 hasn't finished

Gregory Price wrote:
> On Fri, Oct 04, 2024 at 05:08:03PM -0700, Dan Williams wrote:
> > Gregory Price wrote:
> > > In cxl_acpi_probe, we add dports and uports to host bridges iteratively:
> > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport);
> > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport);
> > > 
> > > Simultaneously, as ports are probed, memdev endpoints can also be
> > > probed. This creates a race condition, where an endpoint can perceive
> > > its path to the root being broken in devm_cxl_enumerate_ports.
> > > 
> > > The memdev/endpoint probe will see a heirarchy that may look like:
> > >     mem1
> > >       parent => 0000:c1:00.0
> > >         parent => 0000:c0:01.1
> > > 	  parent->parent => NULL
> > > 
> > > This results in find_cxl_port() returning NULL (since the port hasn't
> > > been associated with the host bridge yet), and add_port_attach_ep
> > > fails because the grandparent's grandparent is NULL.
> > > 
> > > When the latter condition is detected, the comments suggest:
> > >     /*
> > >      * The iteration reached the topology root without finding the
> > >      * CXL-root 'cxl_port' on a previous iteration, fail for now to
> > >      * be re-probed after platform driver attaches.
> > >      */
> > > 
> > > This case results in an -ENXIO; however, a re-probe never occurs. Change
> > > this return condition to -EPROBE_DEFER to explicitly cause a reprobe.
> > 
> > Ok, thanks for the additional debug. Like we chatted on the CXL Discord
> > I think this is potentially pointing to a bug in bus_rescan_devices()
> > where it checks dev->driver without holding the lock.
> > 
> > Can you give this fix a try to see if it also resolves the issue?
> > Effectively, cxl_bus_rescan() is always needed in case the cxl_acpi
> > driver loads waaaay after deferred probing has given up, and if this
> > works then EPROBE_DEFER can remain limited to cases where it is
> > absolutely known that no other device_attach() kick is coming to save
> > the day.
> > 
> 
> Funny enough, not only did it not work, but now i get neither endpoint lol
> 
> $ ls /sys/bus/cxl/devices/
> decoder0.0  decoder1.0  decoder2.0  decoder3.1  mem0  port1  port3  root0
> decoder0.1  decoder1.1  decoder3.0  decoder4.0  mem1  port2  port4
> 
> w/ reprobe patch
> 
> # ls /sys/bus/cxl/devices
> decoder0.0  decoder1.0  decoder2.0  decoder3.1  decoder5.0  decoder6.0  endpoint5  mem0  port1  port3  root0
> decoder0.1  decoder1.1  decoder3.0  decoder4.0  decoder5.1  decoder6.1  endpoint6  mem1  port2  port4

Such a violent result is interesting! While I would have preferred an
"all fixed!" version of "interesting", making something fail reliably and
completely is at least indication that the starting point was more
fragile than expected.

Now, I tried to get cxl_test to fail by probing memdevs asynchronously
alongside the ACPI root driver. That did reveal a use-after-free bug
when out-of-order shutdown causes some cleanup to be skipped, will send
a separate fixup for that, but it failed to reproduce the bug you are
seeing.

The incremental fix here, that applies on top of the device_attach()
fixup, is to make sure that all cxl_port instances registered by
cxl_acpi_probe() are active before cxl_bus_rescan() runs. That can only
be guaranteed when the cxl_port driver is pre-loaded.

If you are running from an initial ram disk make sure cxl_port.ko is
included there...

-- 8< --
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 82b78e331d8e..432b7cfd12a8 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -924,6 +924,13 @@ static void __exit cxl_acpi_exit(void)
 
 /* load before dax_hmem sees 'Soft Reserved' CXL ranges */
 subsys_initcall(cxl_acpi_init);
+
+/*
+ * Arrange for host-bridge ports to be active synchronous with
+ * cxl_acpi_probe() exit.
+ */
+MODULE_SOFTDEP("pre: cxl_port");
+
 module_exit(cxl_acpi_exit);
 MODULE_DESCRIPTION("CXL ACPI: Platform Support");
 MODULE_LICENSE("GPL v2");

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ