linux-kernel - Re: [PATCH v2] cxl/core/port: defer endpoint probes when ACPI likely hasn't finished

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZwXV1gPqO6RkrAW5@PC2K9PVX.TheFacebook.com>
Date: Tue, 8 Oct 2024 21:01:10 -0400
From: Gregory Price <gourry@...rry.net>
To: Dan Williams <dan.j.williams@...el.com>
Cc: linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
	Jonathan.Cameron@...wei.com, dave.jiang@...el.com,
	alison.schofield@...el.com, vishal.l.verma@...el.com,
	ira.weiny@...el.com, rrichter@....com, terry.bowman@....com,
	dave@...olabs.net
Subject: Re: [PATCH v2] cxl/core/port: defer endpoint probes when ACPI likely
 hasn't finished

On Tue, Oct 08, 2024 at 03:37:45PM -0700, Dan Williams wrote:
> Gregory Price wrote:
> > On Fri, Oct 04, 2024 at 05:08:03PM -0700, Dan Williams wrote:
> > > Gregory Price wrote:
> > > > In cxl_acpi_probe, we add dports and uports to host bridges iteratively:
> > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport);
> > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport);
> > > > 
> > > > Simultaneously, as ports are probed, memdev endpoints can also be
> > > > probed. This creates a race condition, where an endpoint can perceive
> > > > its path to the root being broken in devm_cxl_enumerate_ports.
> > > > 
> > > > The memdev/endpoint probe will see a heirarchy that may look like:
> > > >     mem1
> > > >       parent => 0000:c1:00.0
> > > >         parent => 0000:c0:01.1
> > > > 	  parent->parent => NULL
> > > > 
> > > > This results in find_cxl_port() returning NULL (since the port hasn't
> > > > been associated with the host bridge yet), and add_port_attach_ep
> > > > fails because the grandparent's grandparent is NULL.
> > > > 
> > > > When the latter condition is detected, the comments suggest:
> > > >     /*
> > > >      * The iteration reached the topology root without finding the
> > > >      * CXL-root 'cxl_port' on a previous iteration, fail for now to
> > > >      * be re-probed after platform driver attaches.
> > > >      */
> > > > 
> > > > This case results in an -ENXIO; however, a re-probe never occurs. Change
> > > > this return condition to -EPROBE_DEFER to explicitly cause a reprobe.
> > > 
> > > Ok, thanks for the additional debug. Like we chatted on the CXL Discord
> > > I think this is potentially pointing to a bug in bus_rescan_devices()
> > > where it checks dev->driver without holding the lock.
> > > 
> > > Can you give this fix a try to see if it also resolves the issue?
> > > Effectively, cxl_bus_rescan() is always needed in case the cxl_acpi
> > > driver loads waaaay after deferred probing has given up, and if this
> > > works then EPROBE_DEFER can remain limited to cases where it is
> > > absolutely known that no other device_attach() kick is coming to save
> > > the day.
> > > 
> > 
> > Funny enough, not only did it not work, but now i get neither endpoint lol
> > 
> > $ ls /sys/bus/cxl/devices/
> > decoder0.0  decoder1.0  decoder2.0  decoder3.1  mem0  port1  port3  root0
> > decoder0.1  decoder1.1  decoder3.0  decoder4.0  mem1  port2  port4
> > 
> > w/ reprobe patch
> > 
> > # ls /sys/bus/cxl/devices
> > decoder0.0  decoder1.0  decoder2.0  decoder3.1  decoder5.0  decoder6.0  endpoint5  mem0  port1  port3  root0
> > decoder0.1  decoder1.1  decoder3.0  decoder4.0  decoder5.1  decoder6.1  endpoint6  mem1  port2  port4
> 
> Such a violent result is interesting! While I would have preferred an
> "all fixed!" version of "interesting", making something fail reliably and
> completely is at least indication that the starting point was more
> fragile than expected.
> 
> Now, I tried to get cxl_test to fail by probing memdevs asynchronously
> alongside the ACPI root driver. That did reveal a use-after-free bug
> when out-of-order shutdown causes some cleanup to be skipped, will send
> a separate fixup for that, but it failed to reproduce the bug you are
> seeing.
> 
> The incremental fix here, that applies on top of the device_attach()
> fixup, is to make sure that all cxl_port instances registered by
> cxl_acpi_probe() are active before cxl_bus_rescan() runs. That can only
> be guaranteed when the cxl_port driver is pre-loaded.
> 

:< unfortunately the result is... the same but also different? 

the worst kind of result

# ls /sys/bus/cxl/devices/
decoder0.0  decoder0.2  decoder2.0  decoder3.1  mem0  port1  port3  root0
decoder0.1  decoder1.0  decoder3.0  decoder4.0  mem1  port2  port4

apparently sometimes we get decoder0.2 and sometimes we get decoder1.1

after a reboot we get it the other way

# ls /sys/bus/cxl/devices/
decoder0.0  decoder0.2  decoder2.0  decoder3.1  mem0  port1  port3  root0
decoder0.1  decoder1.0  decoder3.0  decoder4.0  mem1  port2  port4

in my experimental kernel where everything "works" I get something like

# ls /sys/bus/cxl/devices/
dax_region0  decoder0.0  decoder1.0  decoder3.1  decoder5.1  endpoint5  mem1   port3    region1
dax_region1  decoder0.1  decoder2.0  decoder4.0  decoder6.0  endpoint6  port1  port4    region2
dax_region2  decoder0.2  decoder3.0  decoder5.0  decoder6.1  mem0       port2  region0  root0

which does not have decoder1.1 - but i haven't confirmed whether this is
consistent or not.  I don't know what causes 0.N vs 1.M, maybe you do?

but in the first result (past email) you can see the reprobe patch also generated
decoder1.1 instead of decoder0.2

probably not related, more fun side quests!

Regardless, the additional patch did not resolve the problem :<

> If you are running from an initial ram disk make sure cxl_port.ko is
> included there...
> 
> -- 8< --
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 82b78e331d8e..432b7cfd12a8 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -924,6 +924,13 @@ static void __exit cxl_acpi_exit(void)
>  
>  /* load before dax_hmem sees 'Soft Reserved' CXL ranges */
>  subsys_initcall(cxl_acpi_init);
> +
> +/*
> + * Arrange for host-bridge ports to be active synchronous with
> + * cxl_acpi_probe() exit.
> + */
> +MODULE_SOFTDEP("pre: cxl_port");
> +
>  module_exit(cxl_acpi_exit);
>  MODULE_DESCRIPTION("CXL ACPI: Platform Support");
>  MODULE_LICENSE("GPL v2");