Message-ID: <f6fedc6c-9f3a-4ffe-adb1-38b4f1632647@gmail.com>
Date: Thu, 31 Jul 2025 09:10:59 +0800
From: Ethan Zhao <etzhao1900@...il.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Nicolin Chen <nicolinc@...dia.com>, joro@...tes.org, will@...nel.org,
robin.murphy@....com, rafael@...nel.org, lenb@...nel.org,
bhelgaas@...gle.com, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-pci@...r.kernel.org,
patches@...ts.linux.dev, pjaroszynski@...dia.com, vsethi@...dia.com,
helgaas@...nel.org, baolu.lu@...ux.intel.com
Subject: Re: [PATCH RFC v2 0/4] Disable ATS via iommu during PCI resets
On 7/29/2025 8:59 PM, Jason Gunthorpe wrote:
> On Tue, Jul 29, 2025 at 02:16:43PM +0800, Ethan Zhao wrote:
>>
>>
>> On 7/28/2025 12:20 AM, Jason Gunthorpe wrote:
>>> On Sun, Jul 27, 2025 at 08:48:26PM +0800, Ethan Zhao wrote:
>>>
>>>> At least, we can do some attempt in DPC and Hot-plug driver, and then
>>>> push the hardware specification update to provide pre-reset notification for
>>>> DPC & hotplug. does it make sense ?
>>>
>>> I think DPC is a different case..
>> More complex and practical case.
>
> I'm not sure about that, we do FLRs all the time as a normal part of
> VFIO and VMM operations. DPC is pretty rare, IMHO.
A DPC reset can be triggered simply by accessing its control bit, which is
the boring case, while a hardware data-corruption issue is really rare.
>
>>> If we get a DPC we should also push the iommu into blocking, disable
>>> ATS and abandon any outstanding ATC invalidations as part of
>>> recovering from the DPC. Once everything is cleaned up we can set the
>> Yup, even pure software resets, there might be ATC invalidation pending
>> (in software queue or HW queue).
>
> The design of this patch series will require the iommu driver to wait
> for the in-flight ATC invalidations during the blocking domain
I see there is pci_wait_for_pending_transaction() before the blocking
domain attachment.
> attach. So for the SW initiated resets there should not be pending ATC
> invalidations when the FLR is triggered.
>
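To make the ordering quoted above concrete, here is a toy user-space model
in C. It is only a sketch, not kernel code: every type and function name
(toy_dev, toy_reset_prepare() and so on) is hypothetical, and restoring the
previous domain and ATS after the reset is an assumption of the sketch. It
shows why a software-initiated reset that first attaches the blocking
domain and drains in-flight ATC invalidations sees no timeouts, while a
reset that arrives unannounced (the DPC-like case) can.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or functions are kernel APIs. */
enum toy_domain { TOY_DOMAIN_PAGING, TOY_DOMAIN_BLOCKING };

struct toy_dev {
	enum toy_domain domain;   /* currently attached domain */
	bool ats_enabled;         /* ATS state as seen by the IOMMU side */
	int pending_atc_inv;      /* ATC invalidations still in flight */
	int timed_out_atc_inv;    /* invalidations that could not complete */
};

/* Drain in-flight ATC invalidations while the device can still answer. */
void toy_drain_atc_inv(struct toy_dev *d)
{
	while (d->pending_atc_inv > 0)
		d->pending_atc_inv--;   /* stands in for one completion wait */
}

/* Step 1: block DMA, wait out invalidations, then turn ATS off. */
void toy_reset_prepare(struct toy_dev *d)
{
	d->domain = TOY_DOMAIN_BLOCKING;
	toy_drain_atc_inv(d);
	d->ats_enabled = false;
}

/* Step 2: the reset itself. Anything still pending here times out. */
void toy_do_reset(struct toy_dev *d)
{
	d->timed_out_atc_inv += d->pending_atc_inv;
	d->pending_atc_inv = 0;
}

/* Step 3 (assumed): restore the paging domain and ATS after the reset. */
void toy_reset_done(struct toy_dev *d)
{
	d->domain = TOY_DOMAIN_PAGING;
	d->ats_enabled = true;
}

/* Software-initiated reset in the order above; returns timeout count. */
int toy_sw_reset(struct toy_dev *d)
{
	toy_reset_prepare(d);
	assert(d->pending_atc_inv == 0);  /* what the ordering provides */
	toy_do_reset(d);
	toy_reset_done(d);
	return d->timed_out_atc_inv;
}

/* DPC-like case: the reset happens before software can prepare. */
int toy_unannounced_reset(struct toy_dev *d)
{
	toy_do_reset(d);
	return d->timed_out_atc_inv;
}
```

With three invalidations in flight, toy_sw_reset() reports no timeouts,
whereas toy_unannounced_reset() reports all three as timed out. That is
the case the recovery path would have to handle sensibly.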
> We have been talking about DPC internally, and I think it will need a
> related, but different flow since DPC can unavoidably trigger ATC
> invalidation timeouts/failures and we must sensibly handle them in the
There is a race window for software to handle.
And since containing data corruption is the priority for DPC, it does not
seem rational to notify software first and only then do the reset. An
alternative might be async mode support in the iommu ATC invalidation path?
Thanks,
Ethan
> driver.
>
> Jason