lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aAkjHMCQ82v5Ea0n@google.com>
Date: Wed, 23 Apr 2025 10:27:56 -0700
From: William McVicker <willmcvicker@...gle.com>
To: Robin Murphy <robin.murphy@....com>
Cc: Bjorn Helgaas <helgaas@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Danilo Krummrich <dakr@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>,
	"Rob Herring (Arm)" <robh@...nel.org>,
	Lorenzo Pieralisi <lpieralisi@...nel.org>,
	Joerg Roedel <jroedel@...e.de>, Bjorn Helgaas <bhelgaas@...gle.com>,
	iommu@...ts.linux.dev, Saravana Kannan <saravanak@...gle.com>,
	kernel-team@...roid.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] platform: Fix race condition during DMA configure at
 IOMMU probe time

On 04/23/2025, Robin Murphy wrote:
> On 2025-04-23 4:08 pm, Bjorn Helgaas wrote:
> > On Tue, Apr 22, 2025 at 04:26:49PM -0700, Will McVicker wrote:
> > > If devices are probed asynchronously, then there is a chance that during
> > > the IOMMU probe the driver is bound to the device in parallel. If this
> > > happens after getting the platform_driver pointer while in the function
> > > `platform_dma_configure()`, then the invalid `drv` pointer
> > > (drv==0xf...ffd8) will be de-referenced since `dev->driver != NULL`.
> > 
> > I need a little more hand-holding to make sense out of this.

Sorry for not making it super clear :/ I was trying to find the right balance
between being too verbose and not clear enough. I guess a threaded call flow
might help visually demonstrate the race condition.

> > 
> > After digging out
> > https://lore.kernel.org/all/aAa2Zx86yUfayPSG@google.com/, I see that
> > drv==0xf...ffd8 must be a result of applying to_platform_driver() to a
> > NULL pointer.  This patch still applies to_platform_driver(NULL), but
> > avoids using the result by testing drv for NULL later, which seems
> > prone to error.
> > 
> > I think this would all be clearer if we tested for the NULL pointer
> > explicitly before applying to_platform_driver().  I don't like setting
> > a pointer to an invalid value.  I think it's better if the pointer is
> > either valid or uninitialized because the compiler can help find uses
> > of uninitialized pointers.
> 
> Yeah, I was also in the middle of looking at this after managing to hit it
> playing with driver_async_probe at the end of last week. I guess when I
> originally wrote this pattern I was maybe thinking the compiler would defer
> the to_x_driver() computation to the point it's eventually dereferenced, but
> I suppose it can't since dev is passed to an external function in program
> order in between.

Glad to hear I'm not the only one hitting this. I agree we should test for the
NULL pointer first before trying to get the platform driver. I'll send a v2 for
this.

> 
> Indeed in my half-written version of this patch I was leaning towards
> removing the drv variable altogether (just doing
> to_x_driver(dev->driver)->driver_managed_dma inline), or at least doing the
> same as Will's previous diff. I figure the one-liner replacing
> "!dev->driver" with "!&drv->driver" would be too disgustingly non-obvious
> for anyone else's tastes...
> 
> For consistency we should really fix all the buses the same way - sorry for
> the bother (I can write up the other patches if you'd like). FWIW this part

Yes please. Thanks!

> really was the most temporary stopgap, as my planned next step is to propose
> moving driver_managed_dma and the use_default_domain() call up into the
> driver core and so removing all this bus-level code anyway, hence trying to
> minimise the effort spent churning it. Oh well.
> 
> > > To avoid a kernel panic and eliminate the race condition, we should
> > > guard the usage of `dev->driver` by only reading it once at the
> > > beginning of the function.
> > > 
> > > Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
> > > Signed-off-by: Will McVicker <willmcvicker@...gle.com>
> > > ---
> > >   drivers/base/platform.c | 7 ++++---
> > >   1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > > index 1813cfd0c4bd..b948c6e8e939 100644
> > > --- a/drivers/base/platform.c
> > > +++ b/drivers/base/platform.c
> > > @@ -1440,7 +1440,8 @@ static void platform_shutdown(struct device *_dev)
> > >   static int platform_dma_configure(struct device *dev)
> > >   {
> > > -	struct platform_driver *drv = to_platform_driver(dev->driver);
> > > +	struct device_driver *drv = READ_ONCE(dev->driver);
> 
> Beware this might annoy a different set of people as it's not paired with a
> WRITE_ONCE(), but for now I guess using it is still arguably better than
> not. Really we should be under device_lock at this point and so have no race
> at all, but we can't do that without keeping track of which devices are
> IOMMUs themselves to avoid deadlock, and that's not something I fancy
> throwing out as an -rc fix in a hurry...
> 
> Thanks,
> Robin.

Thanks again!
--Will

> 
> > > +	struct platform_driver *pdrv = to_platform_driver(drv);
> > >   	struct fwnode_handle *fwnode = dev_fwnode(dev);
> > >   	enum dev_dma_attr attr;
> > >   	int ret = 0;
> > > @@ -1451,8 +1452,8 @@ static int platform_dma_configure(struct device *dev)
> > >   		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
> > >   		ret = acpi_dma_configure(dev, attr);
> > >   	}
> > > -	/* @drv may not be valid when we're called from the IOMMU layer */
> > > -	if (ret || !dev->driver || drv->driver_managed_dma)
> > > +	/* @dev->driver may not be valid when we're called from the IOMMU layer */
> > > +	if (ret || !drv || pdrv->driver_managed_dma)
> > >   		return ret;
> > >   	ret = iommu_device_use_default_domain(dev);
> > > -- 
> > > 2.49.0.805.g082f7c87e0-goog
> > > 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ