Message-ID: <aIAJfYMKYKyZZRqx@Asurada-Nvidia>
Date: Tue, 22 Jul 2025 14:58:21 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: <joro@...tes.org>, <will@...nel.org>, <robin.murphy@....com>,
<rafael@...nel.org>, <lenb@...nel.org>, <bhelgaas@...gle.com>,
<iommu@...ts.linux.dev>, <linux-kernel@...r.kernel.org>,
<linux-acpi@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<patches@...ts.linux.dev>, <pjaroszynski@...dia.com>, <vsethi@...dia.com>,
<helgaas@...nel.org>, <baolu.lu@...ux.intel.com>
Subject: Re: [PATCH RFC v2 3/4] iommu: Introduce iommu_dev_reset_prepare()
and iommu_dev_reset_done()
Sorry for the long delay. I've addressed all of your remarks.
Some feedback inline.
On Fri, Jul 04, 2025 at 12:43:42PM -0300, Jason Gunthorpe wrote:
> On Sat, Jun 28, 2025 at 12:42:41AM -0700, Nicolin Chen wrote:
>
> > - This only works for IOMMU drivers that implemented ops->blocked_domain
> > correctly with pci_disable_ats().
>
> As was in the thread, it works for everyone. Even if we install an
> empty paging domain for blocking that still will stop the ATS
> invalidations from being issued. ATS remains on but this is not a
> problem.
OK. And I am dropping this validation in the PCI patch:
	/* Something wrong with the iommu driver that failed to disable ATS */
	if (dev->ats_enabled)
		pci_err(dev, "failed to stop ATS. ATS invalidation may time out\n");
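The following is a toy model of Jason's point that an empty paging domain is enough. ATS invalidations are driven by unmaps on the domain a device is attached to. Once the device sits in a domain with no mappings, none are sent to it, even with ATS left enabled. This is a hypothetical, heavily simplified model (one invalidation per unmap, fixed-size device list). All names are made up and it does not reflect how any particular IOMMU driver is structured.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model (not kernel code): a domain sends one ATS
 * invalidation to every ATS-enabled attached device per unmap.
 */
#define MAX_DEVS 4

struct model_dev {
	bool ats_enabled;
	unsigned int ats_invalidations; /* invalidations sent to this device */
};

struct model_domain {
	struct model_dev *devs[MAX_DEVS];
	size_t nr_devs;
	size_t nr_mappings;
};

static void model_detach(struct model_domain *dom, struct model_dev *dev)
{
	for (size_t i = 0; i < dom->nr_devs; i++) {
		if (dom->devs[i] == dev) {
			dom->devs[i] = dom->devs[--dom->nr_devs];
			return;
		}
	}
}

/* Moves @dev from @old (may be NULL) to @new; ATS is deliberately left on. */
static void model_attach(struct model_domain *old, struct model_domain *new,
			 struct model_dev *dev)
{
	if (old)
		model_detach(old, dev);
	new->devs[new->nr_devs++] = dev;
}

/* Unmapping one IOVA invalidates the ATC of every attached ATS device. */
static void model_unmap(struct model_domain *dom)
{
	if (dom->nr_mappings == 0)
		return; /* nothing mapped, so nothing to invalidate */
	dom->nr_mappings--;
	for (size_t i = 0; i < dom->nr_devs; i++)
		if (dom->devs[i]->ats_enabled)
			dom->devs[i]->ats_invalidations++;
}
```

In this model, after moving a device from a populated domain to an empty one, unmaps on the old domain no longer reach it. The empty domain never has anything to unmap, so the device's invalidation count stops growing while ats_enabled stays true.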
> > @@ -2155,8 +2172,17 @@ int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
> > int ret = 0;
> >
> > mutex_lock(&group->mutex);
> > +
> > + /*
> > + * There is a racy attach while the device is resetting. Defer it until
> > + * the iommu_dev_reset_done() that attaches the device to group->domain.
> > + */
> > + if (device_to_group_device(dev)->pending_reset)
> > + goto unlock;
> > +
> > if (dev->iommu && dev->iommu->attach_deferred)
> > ret = __iommu_attach_device(domain, dev);
> > +unlock:
> > mutex_unlock(&group->mutex);
>
> Actually looking at this some more maybe write it like:
>
> /*
> * This is called on the dma mapping fast path so avoid locking. This
> * is racy, but we have an expectation that the driver will setup its
> * DMAs inside probe while still single threaded to avoid racing.
> */
> if (dev->iommu && !READ_ONCE(dev->iommu->attach_deferred))
This triggers a build error, as READ_ONCE() cannot be applied to a
bit-field such as attach_deferred. So I am changing it from
"u32 attach_deferred:1" to "bool".
And, to keep the original logic, I think it should be:
if (!dev->iommu || !READ_ONCE(dev->iommu->attach_deferred))
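The userspace sketch below shows why the bit-field breaks READ_ONCE() and what the corrected fast-path condition tests. MODEL_READ_ONCE() is a hypothetical stand-in that mimics the core of the kernel macro: a volatile load through the object's address, written with the GCC/Clang __typeof__ extension. C forbids taking the address of a bit-field, which is why that member type fails. The structs are trimmed-down, hypothetical versions of struct dev_iommu and struct device.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for READ_ONCE(): a volatile load through the
 * object's address. "&" on a bit-field is a compile error in C, so a
 * "u32 attach_deferred:1" member could not be passed here.
 */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, trimmed-down stand-ins for struct dev_iommu / device. */
struct model_dev_iommu {
	bool attach_deferred; /* was "u32 attach_deferred:1" */
};

struct model_device {
	struct model_dev_iommu *iommu;
};

/*
 * Lockless fast-path test: true means "nothing to do, return 0".
 * This is !(dev->iommu && attach_deferred), i.e. the original logic negated.
 */
static bool model_skip_deferred_attach(struct model_device *dev)
{
	return !dev->iommu || !MODEL_READ_ONCE(dev->iommu->attach_deferred);
}
```

By De Morgan's law, !dev->iommu || !attach_deferred is exactly the negation of the old "dev->iommu && attach_deferred" attach condition. The early return therefore preserves the original behavior. The second form proposed above would instead dereference a NULL dev->iommu when it fails the first test.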
> return 0;
>
> guard(mutex)(&group->mutex);
I recall Baolu mentioned that Joerg might not like the guard() style,
so I am keeping mutex_lock()/mutex_unlock().
> if (device_to_group_device(dev)->pending_reset)
> return 0;
>
> if (!dev->iommu->attach_deferred)
> return 0;
I think this is redundant, since the fast path already checked it.
> return __iommu_attach_device(domain, dev);
>
> And of course it is already quite crazy to be doing FLR during a
> device probe so this is not a realistic scenario.
Hmm, I am not sure about that, as I see iommu_deferred_attach() mostly
getting invoked from a dma_alloc() or even a dma_map(). So this might
not be confined to device probe?
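Putting these pieces together gives a single-threaded userspace model of iommu_deferred_attach() as reworked in this reply. It has a lockless fast path, followed by mutex_lock()/mutex_unlock() around the pending_reset check, and it omits the under-lock re-check of attach_deferred, as proposed above. Every name is a hypothetical stand-in. The "mutex" only records whether it is held. Clearing attach_deferred on a successful attach, and re-attaching group->domain in the reset-done path, are assumptions of the model based on the patch comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model: the "mutex" only records whether it is held. */
struct model_mutex {
	bool held;
};

static void model_mutex_lock(struct model_mutex *m)   { m->held = true; }
static void model_mutex_unlock(struct model_mutex *m) { m->held = false; }

struct model_domain {
	int id;
};

struct model_group {
	struct model_mutex mutex;
	struct model_domain *domain;   /* domain the group should be using */
};

struct model_device {
	struct model_group *group;
	bool has_iommu;                /* stands in for dev->iommu != NULL */
	bool attach_deferred;          /* plain bool, so READ_ONCE() is legal */
	bool pending_reset;            /* stands in for gdev->pending_reset */
	struct model_domain *attached; /* domain actually attached, if any */
};

/* Stand-in for __iommu_attach_device(); always succeeds in this model. */
static int model_attach_device(struct model_domain *domain,
			       struct model_device *dev)
{
	dev->attached = domain;
	dev->attach_deferred = false;
	return 0;
}

static int model_deferred_attach(struct model_device *dev,
				 struct model_domain *domain)
{
	int ret = 0;

	/* Lockless fast path (the kernel would use READ_ONCE() here). */
	if (!dev->has_iommu || !dev->attach_deferred)
		return 0;

	model_mutex_lock(&dev->group->mutex);
	/* A reset is in flight: the reset-done path attaches group->domain. */
	if (!dev->pending_reset)
		ret = model_attach_device(domain, dev);
	model_mutex_unlock(&dev->group->mutex);
	return ret;
}

/* Models the tail of iommu_dev_reset_done(): clear the flag, re-attach. */
static int model_reset_done(struct model_device *dev)
{
	int ret;

	model_mutex_lock(&dev->group->mutex);
	dev->pending_reset = false;
	ret = model_attach_device(dev->group->domain, dev);
	model_mutex_unlock(&dev->group->mutex);
	return ret;
}
```

In this model, a call that arrives while pending_reset is set returns 0 without attaching. model_reset_done() later attaches group->domain, and further calls then take the fast path. Jason's version additionally re-checks attach_deferred after taking the lock, which would guard against two callers both passing the lockless check.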
> > + if (dev->iommu->require_direct) {
> > + dev_warn(
> > + dev,
> > + "Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
> > + return -EINVAL;
> > + }
>
> I don't think we can do this. eg on ARM all devices have RMRs inside
> VMs so this will completely break FLR inside a vm???
>
> Either ignore this condition with the rationale that we are about to
> reset it so it doesn't matter, or we need to establish a new paging
> domain for isolation purposes that has the RMR setup.
Ah, you are right. ARM MSI in a VM uses RMR and sets this.
But doesn't that also raise the question of whether a VM with RMR
can use the blocked_domain at all, since __iommu_device_set_domain()
has exactly the same check rejecting blocked_domain? Not sure if
there would be some unintended consequence, though...
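Here is a small model of the require_direct check under discussion, together with the first option Jason lists: ignore the condition because the device is about to be reset. The types, the MODEL_DOMAIN_* values, and the for_reset parameter are hypothetical. The check is only paraphrased from the hunk quoted above, and as noted, __iommu_device_set_domain() carries the same kind of check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down types; not the kernel's definitions. */
enum model_domain_type { MODEL_DOMAIN_BLOCKED, MODEL_DOMAIN_DMA };

struct model_device {
	const char *name;
	bool require_direct; /* firmware asked for a 1:1 mapping (e.g. RMR) */
};

/*
 * A device whose firmware requested a 1:1 mapping refuses to be left
 * without one. @for_reset is a hypothetical knob modeling the "we are about
 * to reset it, so it doesn't matter" option.
 */
static int model_set_domain(const struct model_device *dev,
			    enum model_domain_type type, bool for_reset)
{
	if (dev->require_direct && type == MODEL_DOMAIN_BLOCKED && !for_reset) {
		fprintf(stderr, "%s: requires a 1:1 mapping, rejecting\n",
			dev->name);
		return -EINVAL;
	}
	return 0;
}
```

Without an exception of this kind, every device in a VM whose firmware sets require_direct (the ARM RMR case) is refused the blocking domain. The reset path then fails before FLR, which is the breakage Jason points out.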
> > + if (ret)
> > + goto unlock;
> > +
> > + /* Dock PASID domains to blocked_domain while retaining pasid_array */
> > + xa_lock(&group->pasid_array);
>
> Not sure we need this lock? The group mutex already prevents mutation
> of the xa list and I don't think it is allowed to call
> iommu_remove_dev_pasid() in an atomic context.
As far as I can see, only iommu_attach_handle_get() doesn't take
group->mutex, and it's a reader. So I think it's safe to drop the xa_lock.
I added this:
	/*
	 * Dock PASID domains to blocking_domain while retaining pasid_array.
	 *
	 * The pasid_array is mostly fenced by group->mutex, except for one
	 * reader in iommu_attach_handle_get(), so it's safe to read without
	 * xa_lock.
	 */
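Below is a sketch of the "dock while retaining pasid_array" idea. During reset preparation every PASID in use is pointed at the blocking domain, while the software record is left intact so it can be restored afterwards. The walk asserts that the (modeled) group mutex is held and takes no separate array lock, mirroring the reasoning in the comment above. pasid_array is a plain array standing in for the xarray, hw_pasid models what the hardware is programmed with, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PASID 8

/* Hypothetical model: pasid_array stands in for the group's xarray. */
struct model_domain {
	int id;
};

struct model_group {
	bool mutex_held;                                   /* group->mutex */
	struct model_domain *pasid_array[MODEL_MAX_PASID]; /* software record */
	struct model_domain *hw_pasid[MODEL_MAX_PASID];    /* what HW uses */
};

static struct model_domain model_blocking_domain = { .id = 0 };

/*
 * Reset prepare: point every PASID in use at the blocking domain in
 * "hardware" but leave pasid_array untouched. The caller must hold the
 * group mutex; no per-array lock is taken.
 */
static void model_dock_pasids(struct model_group *group)
{
	assert(group->mutex_held);
	for (size_t pasid = 0; pasid < MODEL_MAX_PASID; pasid++)
		if (group->pasid_array[pasid])
			group->hw_pasid[pasid] = &model_blocking_domain;
}

/* Reset done: re-install each PASID's domain from the retained record. */
static void model_restore_pasids(struct model_group *group)
{
	assert(group->mutex_held);
	for (size_t pasid = 0; pasid < MODEL_MAX_PASID; pasid++)
		group->hw_pasid[pasid] = group->pasid_array[pasid];
}
```

Keeping pasid_array intact means the reset-done side needs no extra bookkeeping. The record of which domain belonged to which PASID is exactly the array the rest of the core already maintains under the group mutex.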
Thanks!
Nicolin