[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <49b6c324-eece-4755-b3a2-2005b6246384@quicinc.com>
Date: Fri, 24 Jan 2025 06:02:45 +0530
From: Charan Teja Kalla <quic_charante@...cinc.com>
To: Robin Murphy <robin.murphy@....com>, Will Deacon <will@...nel.org>
CC: <joro@...tes.org>, <jgg@...pe.ca>, <iommu@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
On 1/22/2025 8:13 PM, Robin Murphy wrote:
>>> Please lmk If I can submit this as V2 patch, with your name as
>>> Suggested-by?
>> I missed to mention that this patch is fixing the problem we are
>> seeing..Thanks.
>
> Ah, that's good. I did spend most of an afternoon digging through git
> history to write up this patch the other day, and was planning to send
> it properly soon (probably after the merge window now) - as noted, it
> also turns out not to be quite right for 6.6LTS as-is, so I figured I'd
> save worrying about backports until it lands.
Thanks a lot Robin for sending this patch with much more details!!
>
> Thanks,
> Robin.
>
> ----->8-----
> Subject: [PATCH] iommu: Handle race with default domain setup
>
> It turns out that deferred default domain creation leaves a subtle
> race window during iommu_device_register() wherein a client driver may
> asynchronously probe in parallel and get as far as performing DMA API
> operations with dma-direct, only to be switched to iommu-dma underfoot
> once the default domain attachment finally happens, with obviously
> disastrous consequences. Even the wonky of_iommu_configure() path is at
> risk, since iommu_fwspec_init() will no longer defer client probe as the
> instance ops are (necessarily) already registered, and the "replay"
> iommu_probe_device() call can see dev->iommu_group already set and so
> think there's nothing to do either.
>
> Fortunately we already have the right tool in the right place in the
> form of iommu_device_use_default_domain(), which just needs to ensure
> that said default domain is actually ready to *be* used. Deferring the
> client probe shouldn't have too much impact, given that this only
> happens while the IOMMU driver is probing, and thus due to kick the
> deferred probe list again once it finishes.
>
> Reported-by: Charan Teja Kalla <quic_charante@...cinc.com>
> Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu
> drivers")
> Signed-off-by: Robin Murphy <robin.murphy@....com>
Do you think If the below tag can be added?
Closes:
https://lore.kernel.org/all/20241213150415.653821-1-quic_charante@quicinc.com/
>
> ---
>
> Note this fixes tag is rather nuanced - historically there was a more
> general issue before deac0b3bed26 ("iommu: Split off default domain
> allocation from group assignment") set the basis for the current
> conditions; 1ea2a07a532b ("iommu: Add DMA ownership management
> interfaces") is then the point at which it becomes logical to fix the
> current race this way; however only from 98ac73f99bc4 can we rely on all
> drivers supporting default domains and so avoid false negatives, thus
> even though this might apply to older kernels without conflict it would
> not be functionally correct. Furthermore given the other locking and API
> flow changes which also happened over that time, it's not clear how long
> this race has actually been exposed. LTS fixes are going to be fiddly...
Thanks again for sharing such a detailed history.
My understanding is that the commit f188056352bc ("iommu: Avoid
locking/unlocking for iommu_probe_device()"), also sets the basis for
the current race conditions. Without this, I think, iommu_probe_device()
would always returns with default domain and later iommu_setup_dma_ops()
fills the 'dev->dma_iommu' flag. Please CMIW..
And, as the commit f188056352bc ("iommu: Avoid locking/unlocking for
iommu_probe_device()") is present from 6.6 LTS, I am not sure if we
really need to look before 6.6 LTS kernels to fix this race.
> ---
> drivers/iommu/iommu.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 851fd5aeccf5..9969531ad4ed 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3097,6 +3097,11 @@ int iommu_device_use_default_domain(struct device
> *dev)
> return 0;
>
> mutex_lock(&group->mutex);
> + /* We may race against bus_iommu_probe() finalising groups here */
> + if (!group->default_domain) {
> + ret = -EPROBE_DEFER;
> + goto unlock_out;
> + }
> if (group->owner_cnt) {
> if (group->domain != group->default_domain || group->owner ||
> !xa_empty(&group->pasid_array)) {
Powered by blists - more mailing lists