Message-ID: <20251125192732.GF520526@nvidia.com>
Date: Tue, 25 Nov 2025 15:27:32 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: joro@...tes.org, afael@...nel.org, bhelgaas@...gle.com,
alex@...zbot.org, will@...nel.org, robin.murphy@....com,
lenb@...nel.org, kevin.tian@...el.com, baolu.lu@...ux.intel.com,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-pci@...r.kernel.org, kvm@...r.kernel.org,
patches@...ts.linux.dev, pjaroszynski@...dia.com, vsethi@...dia.com,
helgaas@...nel.org, etzhao1900@...il.com
Subject: Re: [PATCH v7 4/5] iommu: Introduce pci_dev_reset_iommu_prepare/done()
On Fri, Nov 21, 2025 at 05:57:31PM -0800, Nicolin Chen wrote:
> PCIe permits a device to ignore ATS invalidation TLPs while processing a
> reset. This creates a problem visible to the OS where an ATS invalidation
> command will time out. E.g. an SVA domain will have no coordination with a
> reset event and can racily issue ATS invalidations to a resetting device.
>
> The OS should do something to mitigate this as we do not want production
> systems to be reporting critical ATS failures, especially in a hypervisor
> environment. Broadly, the OS could arrange to ignore the timeouts, block
> page table mutations to prevent invalidations, or disable and block ATS.
>
> The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends that SW disable
> and block ATS before initiating a Function Level Reset. It also mentions
> that other reset methods could have the same vulnerability.
>
> Provide a callback from the PCI subsystem that will enclose the reset and
> have the iommu core temporarily change all the attached RID/PASID domains
> to group->blocking_domain, so that the IOMMU hardware fences any incoming
> ATS queries. IOMMU drivers should also synchronously stop issuing new
> ATS invalidations and wait for all outstanding ATS invalidations to
> complete. This should avoid any ATS invalidation timeouts.
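The prepare/done bracketing described above can be pictured with a small userspace C model. This is a hypothetical sketch, not the kernel implementation: struct dev_model, model_reset_prepare() and model_reset_done() are invented names, and a single bool stands in for the device's ATS state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct domain {
	const char *name;
	bool blocks_ats;	/* true only for the group's blocking domain */
};

struct dev_model {
	struct domain *attached;	/* domain currently installed for the device */
	struct domain *saved;		/* domain to restore once the reset finishes */
	bool ats_enabled;		/* stand-in for "ATS may be used/invalidated" */
};

static struct domain blocking_domain = { "blocking", true };

/* Before the reset: park the device on the blocking domain and stop ATS. */
static void model_reset_prepare(struct dev_model *dev)
{
	assert(dev->saved == NULL);	/* this model does not allow nested resets */
	dev->saved = dev->attached;
	dev->attached = &blocking_domain;
	dev->ats_enabled = false;
	/* A real IOMMU driver would also wait here for in-flight ATS invalidations. */
}

/* After the reset: restore the saved domain and re-enable ATS for it. */
static void model_reset_done(struct dev_model *dev)
{
	assert(dev->saved != NULL);	/* done() must pair with a prepare() */
	dev->attached = dev->saved;
	dev->saved = NULL;
	dev->ats_enabled = !dev->attached->blocks_ats;
}
```

In the kernel, the PCI reset path would call pci_dev_reset_iommu_prepare() before the reset and pci_dev_reset_iommu_done() after it; their exact signatures and return values are not shown in this excerpt.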
>
> However, if a domain attachment/replacement happens during an ongoing
> reset, ATS routines may be re-activated between the two function calls.
> So, introduce a new resetting_domain in the iommu_group structure to
> reject any concurrent attach_dev/set_dev_pasid call during a reset, to
> avoid compatibility failures. Since this changes the behavior of an
> attach operation, update the uAPI accordingly.
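The resetting_domain gate can be modeled the same way. Again this is a hypothetical sketch: struct group_model, group_attach(), group_reset_prepare(), group_reset_done() and the -EBUSY return value are assumptions for illustration; the errno actually returned is whatever the patch documents in include/uapi/linux/vfio.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of an iommu_group; names are illustrative only. */
struct domain {
	int id;
};

struct group_model {
	struct domain *domain;			/* domain shared by every device in the group */
	struct domain *resetting_domain;	/* non-NULL while a member device is resetting */
};

static struct domain blocking = { 0 };

/* Attach is per group: reject any replacement while a reset is in progress. */
static int group_attach(struct group_model *g, struct domain *new_domain)
{
	if (g->resetting_domain)
		return -EBUSY;	/* assumed errno, see the note above */
	g->domain = new_domain;
	return 0;
}

static void group_reset_prepare(struct group_model *g)
{
	assert(g->resetting_domain == NULL);
	g->resetting_domain = &blocking;
}

static void group_reset_done(struct group_model *g)
{
	assert(g->resetting_domain != NULL);
	g->resetting_domain = NULL;
}
```

Because the gate sits on the group rather than on one device, an attach request made on behalf of any sibling device in the same group is refused too, which is the first corner case the commit message lists.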
>
> Note that there are two corner cases:
> 1. Devices in the same iommu_group
> Since attachment is always per iommu_group, sibling devices in the
> same iommu_group also cannot change domains during the reset, which
> prevents race conditions.
> 2. An SR-IOV PF that is being reset while its VF is not
> In such a case, the VF itself is already broken, so there is no point
> in preventing the PF from going through the iommu reset.
>
> Reviewed-by: Lu Baolu <baolu.lu@...ux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@...el.com>
> Signed-off-by: Nicolin Chen <nicolinc@...dia.com>
> ---
> include/linux/iommu.h | 13 +++
> include/uapi/linux/vfio.h | 4 +
> drivers/iommu/iommu.c | 173 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 190 insertions(+)
Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
Jason