[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aA9xCLF_wNsCUuJ8@hovoldconsulting.com>
Date: Mon, 28 Apr 2025 14:14:00 +0200
From: Johan Hovold <johan@...nel.org>
To: Robin Murphy <robin.murphy@....com>
Cc: joro@...tes.org, will@...nel.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iommu: Handle yet another race around registration
On Thu, Apr 24, 2025 at 06:41:28PM +0100, Robin Murphy wrote:
> Next up on our list of race windows to close is another one during
> iommu_device_register() - it's now OK again for multiple instances to
> run their bus_iommu_probe() in parallel, but an iommu_probe_device() can
> still also race against a running bus_iommu_probe(). As Johan has
> managed to prove, this has now become a lot more visible on DT platforms
> wth driver_async_probe where a client driver is attempting to probe in
> parallel with its IOMMU driver
To be clear, I hit this with just normal probe deferral (not explicit
async probe).
> - although commit b46064a18810 ("iommu:
> Handle race with default domain setup") resolves this from the client
> driver's point of view, this isn't before of_iommu_configure() has had
> the chance to attempt to "replay" a probe that the bus walk hasn't even
> tried yet, and so still cause the out-of-order group allocation
> behaviour that we're trying to clean up (and now warning about).
>
> The most reliable thing to do here is to explicitly keep track of the
> "iommu_device_register() is still running" state, so we can then
> special-case the ops lookup for the replay path (based on dev->iommu
> again) to let that think it's still waiting for the IOMMU driver to
> appear at all. This still leaves the longstanding theoretical case of
> iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's
> not so simple to defer a notifier, and nobody's ever reported that being
> a visible issue, so let's quietly kick that can down the road for now...
>
> Reported-by: Johan Hovold <johan@...nel.org>
Perhaps add a reference to my report:
Link: https://lore.kernel.org/lkml/Z_jMiC1uj_MJpKVj@hovoldconsulting.com/
> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
> Signed-off-by: Robin Murphy <robin.murphy@....com>
This looks like it should work and nothing blows up even if I haven't
tried to instrument a reliable reproducer to test it against:
Reviewed-by: Johan Hovold <johan+linaro@...nel.org>
Tested-by: Johan Hovold <johan+linaro@...nel.org>
That said, don't you have a similar issue with:
@@ -414,9 +414,21 @@ static int iommu_init_device(struct device *dev)
if (!dev_iommu_get(dev))
return -ENOMEM;
/*
- * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
- * instances with non-NULL fwnodes, and client devices should have been
- * identified with a fwspec by this point. Otherwise, we can currently
+ * For FDT-based systems and ACPI IORT/VIOT, the common firmware parsing
+ * is buried in the bus dma_configure path. Properly unpicking that is
+ * still a big job, so for now just invoke the whole thing. The device
+ * already having a driver bound means dma_configure has already run and
+ * either found no IOMMU to wait for, or we're in its replay call right
+ * now, so either way there's no point calling it again.
+ */
+ if (!dev->driver && dev->bus->dma_configure) {
What prevents a racing client driver probe from having set dev->driver
here so that dma_configure() isn't called?
+ mutex_unlock(&iommu_probe_device_lock);
+ dev->bus->dma_configure(dev);
+ mutex_lock(&iommu_probe_device_lock);
+ }
Johan
Powered by blists - more mailing lists