lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4129dca9-07fa-4b9d-a7d8-de7561d509e7@arm.com>
Date: Wed, 23 Apr 2025 16:50:29 +0100
From: Robin Murphy <robin.murphy@....com>
To: Bjorn Helgaas <helgaas@...nel.org>,
 Will McVicker <willmcvicker@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 "Rafael J. Wysocki" <rafael@...nel.org>, Danilo Krummrich <dakr@...nel.org>,
 Jason Gunthorpe <jgg@...pe.ca>, "Rob Herring (Arm)" <robh@...nel.org>,
 Lorenzo Pieralisi <lpieralisi@...nel.org>, Joerg Roedel <jroedel@...e.de>,
 Bjorn Helgaas <bhelgaas@...gle.com>, iommu@...ts.linux.dev,
 Saravana Kannan <saravanak@...gle.com>, kernel-team@...roid.com,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] platform: Fix race condition during DMA configure at
 IOMMU probe time

On 2025-04-23 4:08 pm, Bjorn Helgaas wrote:
> On Tue, Apr 22, 2025 at 04:26:49PM -0700, Will McVicker wrote:
>> If devices are probed asynchronously, then there is a chance that during
>> the IOMMU probe the driver is bound to the device in parallel. If this
>> happens after getting the platform_driver pointer while in the function
>> `platform_dma_configure()`, then the invalid `drv` pointer
>> (drv==0xf...ffd8) will be de-referenced since `dev->driver != NULL`.
> 
> I need a little more hand-holding to make sense out of this.
> 
> After digging out
> https://lore.kernel.org/all/aAa2Zx86yUfayPSG@google.com/, I see that
> drv==0xf...ffd8 must be a result of applying to_platform_driver() to a
> NULL pointer.  This patch still applies to_platform_driver(NULL), but
> avoids using the result by testing drv for NULL later, which seems
> prone to error.
> 
> I think this would all be clearer if we tested for the NULL pointer
> explicitly before applying to_platform_driver().  I don't like setting
> a pointer to an invalid value.  I think it's better if the pointer is
> either valid or uninitialized because the compiler can help find uses
> of uninitialized pointers.

Yeah, I was also in the middle of looking at this after managing to hit 
it playing with driver_async_probe at the end of last week. I guess when 
I originally wrote this pattern I was maybe thinking the compiler would 
defer the to_x_driver() computation to the point it's eventually 
dereferenced, but I suppose it can't since dev is passed to an external 
function in program order in between.

Indeed in my half-written version of this patch I was leaning towards 
removing the drv variable altogether (just doing 
to_x_driver(dev->driver)->driver_managed_dma inline), or at least doing 
the same as Will's previous diff. I figure the one-liner replacing 
"!dev->driver" with "!&drv->driver" would be too disgustingly 
non-obvious for anyone else's tastes...

For consistency we should really fix all the buses the same way - sorry 
for the bother (I can write up the other patches if you'd like). FWIW 
this part really was the most temporary stopgap, as my planned next step 
is to propose moving driver_managed_dma and the use_default_domain() 
call up into the driver core and so removing all this bus-level code 
anyway, hence trying to minimise the effort spent churning it. Oh well.

>> To avoid a kernel panic and eliminate the race condition, we should
>> guard the usage of `dev->driver` by only reading it once at the
>> beginning of the function.
>>
>> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
>> Signed-off-by: Will McVicker <willmcvicker@...gle.com>
>> ---
>>   drivers/base/platform.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
>> index 1813cfd0c4bd..b948c6e8e939 100644
>> --- a/drivers/base/platform.c
>> +++ b/drivers/base/platform.c
>> @@ -1440,7 +1440,8 @@ static void platform_shutdown(struct device *_dev)
>>   
>>   static int platform_dma_configure(struct device *dev)
>>   {
>> -	struct platform_driver *drv = to_platform_driver(dev->driver);
>> +	struct device_driver *drv = READ_ONCE(dev->driver);

Beware this might annoy a different set of people as it's not paired 
with a WRITE_ONCE(), but for now I guess using it is still arguably 
better than not. Really we should be under device_lock at this point and 
so have no race at all, but we can't do that without keeping track of 
which devices are IOMMUs themselves to avoid deadlock, and that's not 
something I fancy throwing out as an -rc fix in a hurry...

Thanks,
Robin.

>> +	struct platform_driver *pdrv = to_platform_driver(drv);
>>   	struct fwnode_handle *fwnode = dev_fwnode(dev);
>>   	enum dev_dma_attr attr;
>>   	int ret = 0;
>> @@ -1451,8 +1452,8 @@ static int platform_dma_configure(struct device *dev)
>>   		attr = acpi_get_dma_attr(to_acpi_device_node(fwnode));
>>   		ret = acpi_dma_configure(dev, attr);
>>   	}
>> -	/* @drv may not be valid when we're called from the IOMMU layer */
>> -	if (ret || !dev->driver || drv->driver_managed_dma)
>> +	/* @dev->driver may not be valid when we're called from the IOMMU layer */
>> +	if (ret || !drv || pdrv->driver_managed_dma)
>>   		return ret;
>>   
>>   	ret = iommu_device_use_default_domain(dev);
>> -- 
>> 2.49.0.805.g082f7c87e0-goog
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ