[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40f1971d-640a-44b4-b798-d1a5844063e2@arm.com>
Date: Tue, 10 Jun 2025 16:37:58 +0100
From: Robin Murphy <robin.murphy@....com>
To: Nicolin Chen <nicolinc@...dia.com>, jgg@...dia.com, joro@...tes.org,
will@...nel.org, bhelgaas@...gle.com
Cc: iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, patches@...ts.linux.dev, pjaroszynski@...dia.com,
vsethi@...dia.com
Subject: Re: [PATCH RFC v1 0/2] iommu&pci: Disable ATS during FLR resets
On 2025-06-09 7:45 pm, Nicolin Chen wrote:
> Hi all,
>
> Per PCIe r6.3, sec 10.3.1 IMPLEMENTATION NOTE, software should disable ATS
> before initiating a Function Level Reset, and then ensure no invalidation
> requests being issued to a device when its ATS capability is disabled.
Not really - what it says is that software should not expect to receive
invalidate completions from a function which is in the process of being
reset or powered off, and if software doesn't want to be confused by
that then it should take care to wait for completion or timeout of all
outstanding requests, and avoid issuing new requests, before initiating
such a reset or power transition.
> Both pci_enable_ats() and pci_disable_ats() are called by an IOMMU driver,
> but an unsolicited FLR can happen at any time in the PCI layer. This might
> result in a race between them, breaking the rules given by the PCIe Spec.
Can you clarify which rules? The Implementation Note itself is just an
example of a possible software policy, and explicitly not normative.
> Therefore, there needs to be a sync between IOMMU and PCI subsystems, to
> ensure that ATS will be disabled and never gets re-enabled until the FLR
> finishes. Add a pair of new IOMMU helpers for PCI reset functions to call
> before and after the reset routines. These two helpers will temporally
> attach the device's RID/PASID to IOMMU_DOMAIN_BLOCKED, which should allow
> its IOMMU driver to pause any DMA traffic and disable ATS feature until
> the FLR is done.
FLR must inherently stop the function from issuing any kind of requests
anyway (see 6.6.2), so "pausing" traffic at the IOMMU end seems like a
non-issue. I guess I can see how messing with the domain attachment
underneath the rest of the group manages to prevent new invalidate
requests from group->domain being issued to the given function, but it's
pretty horrid - leaving the mutex blocked might be just about tolerable
for an FLR that's supposed to take no longer than 100ms, but what if we
do want to do this for suspend/resume as well?
Thanks,
Robin.
>
> This is on Github:
> https://github.com/nicolinc/iommufd/commits/iommu_dev_reset-rfcv1
>
> Thanks
> Nicolin
>
> Nicolin Chen (2):
> iommu: Introduce iommu_dev_reset_prepare() and iommu_dev_reset_done()
> pci: Suspend ATS before doing FLR
>
> include/linux/iommu.h | 12 +++++
> drivers/iommu/iommu.c | 106 ++++++++++++++++++++++++++++++++++++++++++
> drivers/pci/pci.c | 42 +++++++++++++++--
> 3 files changed, 156 insertions(+), 4 deletions(-)
>
Powered by blists - more mailing lists