Message-ID: <6d85514f-3d55-43ae-a00f-334f8a5f81fb@amd.com>
Date: Wed, 26 Nov 2025 21:51:34 +0530
From: "Srivastava, Dheeraj Kumar" <dhsrivas@....com>
To: Nicolin Chen <nicolinc@...dia.com>, <joro@...tes.org>, <afael@...nel.org>,
<bhelgaas@...gle.com>, <alex@...zbot.org>, <jgg@...dia.com>
CC: <will@...nel.org>, <robin.murphy@....com>, <lenb@...nel.org>,
<kevin.tian@...el.com>, <baolu.lu@...ux.intel.com>,
<linux-arm-kernel@...ts.infradead.org>, <iommu@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>, <linux-acpi@...r.kernel.org>,
<linux-pci@...r.kernel.org>, <kvm@...r.kernel.org>,
<patches@...ts.linux.dev>, <pjaroszynski@...dia.com>, <vsethi@...dia.com>,
<helgaas@...nel.org>, <etzhao1900@...il.com>
Subject: Re: [PATCH v7 0/5] Disable ATS via iommu during PCI resets
On 11/22/2025 7:27 AM, Nicolin Chen wrote:
> Hi all,
>
> PCIe permits a device to ignore ATS invalidation TLPs while processing a
> reset. This creates a problem visible to the OS where an ATS invalidation
> command will time out: e.g. an SVA domain will have no coordination with a
> reset event and can racily issue ATS invalidations to a resetting device.
>
> The OS should do something to mitigate this as we do not want production
> systems to be reporting critical ATS failures, especially in a hypervisor
> environment. Broadly, the OS could arrange to ignore the timeouts, block page
> table mutations to prevent invalidations, or disable and block ATS.
>
> The PCIe spec, in the sec 10.3.1 IMPLEMENTATION NOTE, recommends disabling
> and blocking ATS before initiating a Function Level Reset. It also mentions that
> other reset methods could have the same vulnerability as well.
>
> Provide a callback from the PCI subsystem that will enclose the reset and
> have the iommu core temporarily change domains to group->blocking_domain,
> so IOMMU drivers would fence any incoming ATS queries, synchronously stop
> issuing new ATS invalidations, and wait for existing ATS invalidations to
> complete. Doing this can avoid any ATS invalidation timeouts.
>
> When a device is resetting, any new domain attachment has to be rejected,
> until the reset is finished, to prevent ATS activity from being activated
> between the two callback functions. Introduce a new resetting_domain, and
> reject a concurrent __iommu_attach_device/set_group_pasid().
>
> Finally, call these pci_dev_reset_iommu_prepare/done() functions in the PCI
> reset functions.
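
For readers following along, the flow described above can be sketched as a
small userspace model. This is only an illustration under stated assumptions,
not the code from the series: every model_* name is hypothetical, the -EBUSY
return value is an assumption, and locking (group->mutex in the real code) is
left out. It also assumes a domain is always attached before the reset starts.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT the real kernel types. */
struct model_domain {
	const char *name;
};

struct model_group {
	struct model_domain *domain;           /* currently attached domain */
	struct model_domain *blocking_domain;  /* fences DMA and ATS traffic */
	struct model_domain *resetting_domain; /* non-NULL while a reset is in flight */
};

/* Before the reset: remember the attached domain, then switch to blocking. */
int model_reset_prepare(struct model_group *g)
{
	if (g->resetting_domain)
		return -EBUSY; /* assumed error code: a reset is already in flight */
	g->resetting_domain = g->domain;
	g->domain = g->blocking_domain;
	return 0;
}

/* After the reset: restore the remembered domain and clear the marker. */
void model_reset_done(struct model_group *g)
{
	if (!g->resetting_domain)
		return;
	g->domain = g->resetting_domain;
	g->resetting_domain = NULL;
}

/* New attachments are rejected between prepare() and done(). */
int model_attach(struct model_group *g, struct model_domain *d)
{
	if (g->resetting_domain)
		return -EBUSY; /* assumed error code */
	g->domain = d;
	return 0;
}

/* A reset routine enclosed by the two callbacks; do_reset is a placeholder. */
int model_reset_device(struct model_group *g, int (*do_reset)(void))
{
	int ret = model_reset_prepare(g);

	if (ret)
		return ret; /* abort the reset if prepare() fails */
	ret = do_reset();
	model_reset_done(g);
	return ret;
}
```

In this model, an attach attempted between model_reset_prepare() and
model_reset_done() fails, and the original domain comes back once the reset
completes.
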
>
> This is on Github:
> https://github.com/nicolinc/iommufd/commits/iommu_dev_reset-v7
>
> Changelog
> v7
> * Rebase on Joerg's next tree
> * Add Reviewed-by from Kevin
> * [iommu] Fix inline functions when !CONFIG_IOMMU_API
> v6
> https://lore.kernel.org/all/cover.1763512374.git.nicolinc@nvidia.com/
> * Add Reviewed-by from Baolu and Kevin
> * Revise inline comments, kdocs, commit messages, uAPI
> * [iommu] s/iommu_dev_reset/pci_dev_reset_iommu/g for PCI exclusively
> * [iommu] Disallow iommu group sibling devices to attach concurrently
> * [pci] Drop unnecessary initializations to "ret" and "rc"
> * [pci] Improve pci_err message upon a prepare() failure
> * [pci] Move pci_ats_supported() check inside the IOMMU callbacks
> * [pci] Apply callbacks to pci_reset_bus_function() that was missed
> v5
> https://lore.kernel.org/all/cover.1762835355.git.nicolinc@nvidia.com/
> * Rebase on Joerg's next tree
> * [iommu] Skip in shared iommu_group cases
> * [iommu] Pass in default_domain to iommu_setup_dma_ops
> * [iommu] Add kdocs to iommu_get_domain_for_dev_locked()
> * [iommu] s/get_domain_for_dev_locked/driver_get_domain_for_dev
> * [iommu] Replace per-gdev pending_reset with per-group resetting_domain
> v4
> https://lore.kernel.org/all/cover.1756682135.git.nicolinc@nvidia.com/
> * Add Reviewed-by from Baolu
> * [iommu] Use guard(mutex)
> * [iommu] Update kdocs for typos and revisions
> * [iommu] Skip two corner cases (alias and SRIOV)
> * [iommu] Rework attach_dev to pass in old domain pointer
> * [iommu] Reject concurrent attach_dev/set_dev_pasid for compatibility
> concern
> * [smmuv3] Drop the old_domain dependency in its release_dev callback
> * [pci] Add pci_reset_iommu_prepare/_done() wrappers checking ATS cap
> v3
> https://lore.kernel.org/all/cover.1754952762.git.nicolinc@nvidia.com/
> * Add Reviewed-by from Jason
> * [iommu] Add a fast return in iommu_deferred_attach()
> * [iommu] Update kdocs, inline comments, and commit logs
> * [iommu] Use group->blocking_domain vs. ops->blocked_domain
> * [iommu] Drop require_direct, iommu_group_get(), and xa_lock()
> * [iommu] Set the pending_reset flag after RID/PASID domain setups
> * [iommu] Do not bypass PASID domains when RID domain is already the
> blocking_domain
> * [iommu] Add iommu_get_domain_for_dev_locked to correctly return the
> blocking_domain
> v2
> https://lore.kernel.org/all/cover.1751096303.git.nicolinc@nvidia.com/
> * [iommu] Update kdocs, inline comments, and commit logs
> * [iommu] Replace long-holding group->mutex with a pending_reset flag
> * [pci] Abort reset routines if iommu_dev_reset_prepare() fails
> * [pci] Apply the same vulnerability fix to other reset functions
> v1
> https://lore.kernel.org/all/cover.1749494161.git.nicolinc@nvidia.com/
>
> Thanks
> Nicolin
>
> Nicolin Chen (5):
>   iommu: Lock group->mutex in iommu_deferred_attach()
>   iommu: Tidy domain for iommu_setup_dma_ops()
>   iommu: Add iommu_driver_get_domain_for_dev() helper
>   iommu: Introduce pci_dev_reset_iommu_prepare/done()
>   PCI: Suspend iommu function prior to resetting a device
>
>  drivers/iommu/dma-iommu.h                   |   5 +-
>  include/linux/iommu.h                       |  14 ++
>  include/uapi/linux/vfio.h                   |   4 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   5 +-
>  drivers/iommu/dma-iommu.c                   |   4 +-
>  drivers/iommu/iommu.c                       | 220 +++++++++++++++++++-
>  drivers/pci/pci-acpi.c                      |  13 +-
>  drivers/pci/pci.c                           |  65 +++++-
>  drivers/pci/quirks.c                        |  19 +-
>  9 files changed, 326 insertions(+), 23 deletions(-)
>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@....com>
Thanks
Dheeraj