Message-ID: <aAa2Zx86yUfayPSG@google.com>
Date: Mon, 21 Apr 2025 14:19:35 -0700
From: William McVicker <willmcvicker@...gle.com>
To: Robin Murphy <robin.murphy@....com>
Cc: Lorenzo Pieralisi <lpieralisi@...nel.org>,
Hanjun Guo <guohanjun@...wei.com>,
Sudeep Holla <sudeep.holla@....com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Len Brown <lenb@...nel.org>, Russell King <linux@...linux.org.uk>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Danilo Krummrich <dakr@...nel.org>,
Stuart Yoder <stuyoder@...il.com>,
Laurentiu Tudor <laurentiu.tudor@....com>,
Nipun Gupta <nipun.gupta@....com>,
Nikhil Agarwal <nikhil.agarwal@....com>,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Rob Herring <robh@...nel.org>,
Saravana Kannan <saravanak@...gle.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, linux-acpi@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
iommu@...ts.linux.dev, devicetree@...r.kernel.org,
linux-pci@...r.kernel.org,
Charan Teja Kalla <quic_charante@...cinc.com>
Subject: Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe
path

Hi Robin,

On 02/28/2025, Robin Murphy wrote:
> In hindsight, there were some crucial subtleties overlooked when moving
> {of,acpi}_dma_configure() to driver probe time to allow waiting for
> IOMMU drivers with -EPROBE_DEFER, and these have become an
> ever-increasing source of problems. The IOMMU API has some fundamental
> assumptions that iommu_probe_device() is called for every device added
> to the system, in the order in which they are added. Calling it in a
> random order or not at all dependent on driver binding leads to
> malformed groups, a potential lack of isolation for devices with no
> driver, and all manner of unexpected concurrency and race conditions.
> We've attempted to mitigate the latter with point-fix bodges like
> iommu_probe_device_lock, but it's a losing battle and the time has come
> to bite the bullet and address the true source of the problem instead.
>
> The crux of the matter is that the firmware parsing actually serves two
> distinct purposes; one is identifying the IOMMU instance associated with
> a device so we can check its availability, the second is actually
> telling that instance about the relevant firmware-provided data for the
> device. However the latter also depends on the former, and at the time
> there was no good place to defer and retry that separately from the
> availability check we also wanted for client driver probe.
>
> Nowadays, though, we have a proper notion of multiple IOMMU instances in
> the core API itself, and each one gets a chance to probe its own devices
> upon registration, so we can finally make that work as intended for
> DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
> be able to run the iommu_fwspec machinery currently buried deep in the
> wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
> surprisingly straightforward to bootstrap this transformation by pretty
> much just calling the same path twice. At client driver probe time,
> dev->driver is obviously set; conversely at device_add(), or a
> subsequent bus_iommu_probe(), any device waiting for an IOMMU really
> should *not* have a driver already, so we can use that as a condition to
> disambiguate the two cases, and avoid recursing back into the IOMMU core
> at the wrong times.
>
> Obviously this isn't the nicest thing, but for now it gives us a
> functional baseline to then unpick the layers in between without many
> more awkward cross-subsystem patches. There are some minor side-effects
> like dma_range_map potentially being created earlier, and some debug
> prints being repeated, but these aren't significantly detrimental. Let's
> make things work first, then deal with making them nice.
>
> With the basic flow finally in the right order again, the next step is
> probably turning the bus->dma_configure paths inside-out, since all we
> really need from bus code is its notion of which device and input ID(s)
> to parse the common firmware properties with...
>
> Acked-by: Bjorn Helgaas <bhelgaas@...gle.com> # pci-driver.c
> Acked-by: Rob Herring (Arm) <robh@...nel.org> # of/device.c
> Signed-off-by: Robin Murphy <robin.murphy@....com>
> ---
>
> v2:
> - Comment bus driver changes for clarity
> - Use dev->iommu as the now-robust replay condition
> - Drop the device_iommu_mapped() checks in the firmware paths as they
> weren't doing much - we can't replace probe_device_lock just yet...
>
> drivers/acpi/arm64/dma.c | 5 +++++
> drivers/acpi/scan.c | 7 -------
> drivers/amba/bus.c | 3 ++-
> drivers/base/platform.c | 3 ++-
> drivers/bus/fsl-mc/fsl-mc-bus.c | 3 ++-
> drivers/cdx/cdx.c | 3 ++-
> drivers/iommu/iommu.c | 24 +++++++++++++++++++++---
> drivers/iommu/of_iommu.c | 7 ++++++-
> drivers/of/device.c | 7 ++++++-
> drivers/pci/pci-driver.c | 3 ++-
> 10 files changed, 48 insertions(+), 17 deletions(-)
>
[...]
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6f2a33722c52..1813cfd0c4bd 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1451,7 +1451,8 @@ static int platform_dma_configure(struct device *dev)
> attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
> ret = acpi_dma_configure(dev, attr);
> }
> - if (ret || drv->driver_managed_dma)
> + /* @drv may not be valid when we're called from the IOMMU layer */
> + if (ret || !dev->driver || drv->driver_managed_dma)
> return ret;
>
> ret = iommu_device_use_default_domain(dev);

I wanted to report a regression that is exposed by the new probing
behavior. On Pixel 6, we load our kernel modules in parallel, which means
devices are also probed in parallel. This results in a race condition between
the IOMMU thread and the device probing thread. At the top of
`platform_dma_configure()`, when we assign `drv = to_platform_driver(dev->driver);`,
`dev->driver` is still NULL, which results in `drv = 0xf...ffd8`. If the driver
gets bound to the device in parallel before we reach the if-statement above,
then `dev->driver != NULL` and we dereference the bogus `drv`, resulting in a
kernel panic.

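To make the bogus value concrete, here is a rough userspace model of the conversion (not kernel code). `to_platform_driver()` is a container_of() on `dev->driver`, so when `dev->driver` is NULL it simply yields NULL minus the offset of the embedded `driver` member, i.e. an address just below zero like the one above. The structure layout, the `mock_*` macros and the helper functions are hypothetical simplifications; in particular, the five function pointers ahead of `driver` are only an assumption about where that member sits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins for the kernel structures. The five
 * function pointers ahead of "driver" are an assumption meant to mimic
 * the shape of struct platform_driver; the real layout depends on the
 * kernel version and config.
 */
struct device_driver {
	const char *name;
};

struct platform_driver {
	int (*probe)(void *pdev);
	void (*remove)(void *pdev);
	void (*shutdown)(void *pdev);
	int (*suspend)(void *pdev, int state);
	int (*resume)(void *pdev);
	struct device_driver driver;
	bool driver_managed_dma;
};

struct device {
	struct device_driver *driver;
};

/*
 * Simplified container_of(): subtract the member offset from the member
 * address. Integer arithmetic keeps the NULL case observable in
 * userspace; like the kernel macro, there is no NULL check.
 */
#define mock_container_of(ptr, type, member) \
	((type *)((uintptr_t)(ptr) - offsetof(type, member)))

#define mock_to_platform_driver(d) \
	mock_container_of((d), struct platform_driver, driver)

/* Correct use: the device is bound, so we get the enclosing driver back. */
static struct platform_driver *drv_from_bound_device(struct device *dev)
{
	assert(dev->driver != NULL);
	return mock_to_platform_driver(dev->driver);
}

/* Buggy use: the conversion runs while dev->driver is still NULL. */
static uintptr_t drv_address_when_unbound(void)
{
	struct device dev = { .driver = NULL };

	return (uintptr_t)mock_to_platform_driver(dev.driver);
}
```

With these mock definitions on a typical LP64 target, `driver` sits at offset 0x28, so the unbound case comes out as 0xffffffffffffffd8; with a different layout the offset, and therefore the low bits, would change.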
To address this race condition and the resulting kernel panic, we need to
defer assigning `drv` until after we check whether the driver is bound. Here
is what works for me:
----->8-----
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1813cfd0c4bd..6d124447545c 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1440,8 +1440,8 @@ static void platform_shutdown(struct device *_dev)
static int platform_dma_configure(struct device *dev)
{
- struct platform_driver *drv = to_platform_driver(dev->driver);
struct fwnode_handle *fwnode = dev_fwnode(dev);
+ struct platform_driver *drv;
enum dev_dma_attr attr;
int ret = 0;
@@ -1451,8 +1451,12 @@ static int platform_dma_configure(struct device *dev)
attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
ret = acpi_dma_configure(dev, attr);
}
- /* @drv may not be valid when we're called from the IOMMU layer */
- if (ret || !dev->driver || drv->driver_managed_dma)
+ /* @dev->driver may not be valid when we're called from the IOMMU layer */
+ if (ret || !dev->driver)
+ return ret;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->driver_managed_dma)
return ret;
ret = iommu_device_use_default_domain(dev);
--
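For reference, a minimal userspace sketch of why the reordering closes the window (again a model, not kernel code). The hypothetical `bind_demo_driver()` hook stands in for the other thread binding the driver between function entry and the `!dev->driver` check; each function returns the pointer that `drv->driver_managed_dma` would be read through, without actually dereferencing it. All types and helpers here are simplified assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct device_driver {
	const char *name;
};

struct platform_driver {
	int (*probe)(void *pdev);
	struct device_driver driver;
	bool driver_managed_dma;
};

struct device {
	struct device_driver *driver;
};

/* Simplified container_of()-style conversion, no NULL check. */
#define mock_to_platform_driver(d) \
	((struct platform_driver *)((uintptr_t)(d) - \
		offsetof(struct platform_driver, driver)))

/* Models the other thread binding the driver at the worst moment. */
typedef void (*bind_hook_t)(struct device *dev);

static struct platform_driver demo_drv = { .driver = { .name = "demo" } };

static void bind_demo_driver(struct device *dev)
{
	assert(dev->driver == NULL);	/* binding moves NULL -> bound once */
	dev->driver = &demo_drv.driver;
}

/* Old ordering: drv is derived at entry, before the driver is bound. */
static struct platform_driver *old_order(struct device *dev, bind_hook_t bind)
{
	struct platform_driver *drv = mock_to_platform_driver(dev->driver);

	bind(dev);
	if (!dev->driver)
		return NULL;	/* early return, nothing dereferenced */
	return drv;		/* may have been derived from NULL */
}

/* New ordering: drv is derived only after dev->driver was seen non-NULL. */
static struct platform_driver *new_order(struct device *dev, bind_hook_t bind)
{
	struct platform_driver *drv;

	bind(dev);
	if (!dev->driver)
		return NULL;
	drv = mock_to_platform_driver(dev->driver);
	return drv;
}
```

The model only covers the NULL-to-bound transition described above; it does not try to capture other interleavings or the driver core's own locking.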

Please let me know what you think.

Thanks,
Will
[...]