[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0a0a576e-dd4d-402a-a0ea-43eede4e7cd8@arm.com>
Date: Mon, 13 Jan 2025 17:25:18 +0000
From: Robin Murphy <robin.murphy@....com>
To: Charan Teja Kalla <quic_charante@...cinc.com>,
Will Deacon <will@...nel.org>
Cc: joro@...tes.org, jgg@...pe.ca, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
On 10/01/2025 1:29 pm, Charan Teja Kalla wrote:
[...]
>>> We need to fix 6.13 before we fix 6.6 unless you can show that 6.13 is
>>> unaffected (in which case, we can backport the fix(es)).
>>
>> Certainly the reasoning can't apply to mainline as given, since
>> arch_setup_dma_ops() stopped touching IOMMU stuff at all back in 6.10
>> (and indeed iommu_dma_ops itself no longer exists since 6.12).
>>
>
> @Robin
>
> Agree that we don't have iommu_dma_ops but I can say that same race
> still exists. Although dma_ops is not used, but the decision to call
> into the iommu api's is determined by the 'dev->dma_iommu' flag, which
> again, is set after domain is allocated for a device.
>
> In the same race mentioned above,
> 1) S: Domain is not allocated but the dev->iommu_group.
> 2) C: Just returns as dev->iommu_group is filled.
> 3) C: Continues probing and succeeds.
> 4) C: Calls dma_alloc/map/.... But, it won't enter into iommu_ api's
> because the 'dev->dma_iommu' is still 'false'.
> 5) S: Domain is allocated and sets the 'dev->dma_iommu' to 'true'.
>
> 4) above is the problematic step. Although issue exists but seems to me
> that very narrow to get triggered. Please CMIW.
Hmm, yes, I guess there is a fundamental race where async client driver
probe can observe a partially initialised group while the IOMMU driver
itself is still running iommu_device_register()->bus_iommu_probe()...
And in that case, this patch is wrong for two reasons: firstly, bodging
the iommu_probe_device() call (which really should not exist at all)
does not cover all cases; and secondly, forcibly creating the default
domain in that path before we know bus_iommu_probe() has seen all other
existing devices means iommu_get_default_domain_type() may miss their
requirements and thus defeat the whole point of deferred allocation in
the first place.
Having looked a bit closer, I think a more robust solution for now is
probably as below.
Thanks,
Robin.
----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d1af0547f553..8d90d196e38d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3120,6 +3120,11 @@ int iommu_device_use_default_domain(struct device *dev)
return 0;
mutex_lock(&group->mutex);
+ /* We may race against bus_iommu_probe() finalising groups here */
+ if (!group->default_domain) {
+ ret = -EPROBE_DEFER;
+ goto unlock_out;
+ }
if (group->owner_cnt) {
if (group->domain != group->default_domain || group->owner ||
!xa_empty(&group->pasid_array)) {
Powered by blists - more mailing lists