Message-ID: <20250613192709.GA971579@bhelgaas>
Date: Fri, 13 Jun 2025 14:27:09 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Robin Murphy <robin.murphy@....com>, Nicolin Chen <nicolinc@...dia.com>,
joro@...tes.org, will@...nel.org, bhelgaas@...gle.com,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, patches@...ts.linux.dev,
pjaroszynski@...dia.com, vsethi@...dia.com
Subject: Re: [PATCH RFC v1 0/2] iommu&pci: Disable ATS during FLR resets
On Tue, Jun 10, 2025 at 01:30:45PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 10, 2025 at 04:37:58PM +0100, Robin Murphy wrote:
> > On 2025-06-09 7:45 pm, Nicolin Chen wrote:
> > > Hi all,
> > >
> > > Per PCIe r6.3, sec 10.3.1 IMPLEMENTATION NOTE, software should disable ATS
> > > before initiating a Function Level Reset, and then ensure no invalidation
> > > requests are issued to a device while its ATS capability is disabled.
> >
> > Not really - what it says is that software should not expect to receive
> > invalidate completions from a function which is in the process of being
> > reset or powered off, and if software doesn't want to be confused by that
> > then it should take care to wait for completion or timeout of all
> > outstanding requests, and avoid issuing new requests, before initiating such
> > a reset or power transition.
>
> The commit message can be more precise, but I agree with the
> conclusion that the right direction for Linux is to disable and block
> ATS, instead of trying to ignore completion time out events, or trying
> to block page table mutations. I.e., do what the implementation note
> says.
>
> Maybe:
>
> PCIe permits a device to ignore ATS invalidation TLPs while it is
> processing FLR. This creates a problem visible to the OS where ATS
> invalidation commands will time out. For instance, an SVA domain has
> no coordination with an FLR event and can racily issue ATC
> invalidations into a resetting device.

The sec 10.3.1 implementation note mentions FLR specifically, but it
seems like *any* kind of reset would be vulnerable, e.g., SBR,
external PERST# assert, etc.?
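
To make the ordering concrete, here is a toy Python model. It is not
kernel code: `AtsFunction`, `ResetKind` and `reset()` are hypothetical
names. It assumes that once ATS is blocked for a function, the IOMMU
driver stops sending ATC invalidations to it, and it ignores
invalidations that were already in flight before the reset (a real
implementation would also have to wait for those to complete or time
out). A racing SVA invalidation lands in the middle of the reset; the
ordering, and therefore the outcome, is the same whatever kind of reset
it is.

```python
from dataclasses import dataclass
from enum import Enum


class ResetKind(Enum):
    # Hypothetical labels; the model treats every reset kind identically.
    FLR = "FLR"
    SBR = "SBR"
    PERST = "PERST#"


@dataclass
class AtsFunction:
    """Hypothetical model of one PCIe function's ATS state."""
    ats_enabled: bool = True
    in_reset: bool = False
    completed: int = 0
    timed_out: int = 0

    def invalidate(self) -> bool:
        """Issue one ATC invalidation; return False if nothing was sent."""
        if not self.ats_enabled:
            return False          # ATS blocked: no ATC to invalidate, nothing sent
        if self.in_reset:
            self.timed_out += 1   # device may ignore it; completion never arrives
        else:
            self.completed += 1   # normal case: completion comes back
        return True


def reset(fn: AtsFunction, kind: ResetKind, block_ats: bool) -> None:
    """Model a reset of any kind with one racing invalidation mid-reset."""
    saved = fn.ats_enabled
    if block_ats:
        fn.ats_enabled = False    # disable/block ATS before starting the reset
    fn.in_reset = True
    fn.invalidate()               # the racy SVA invalidation
    fn.in_reset = False
    fn.ats_enabled = saved        # re-enable ATS once the reset has completed


for kind in ResetKind:
    racy, safe = AtsFunction(), AtsFunction()
    reset(racy, kind, block_ats=False)
    reset(safe, kind, block_ats=True)
    print(kind.value, "timeouts:", racy.timed_out, "vs", safe.timed_out)
```

In this model every reset kind gives one timeout without blocking and
none with blocking, which is the point: the fix is about ordering ATS
disable against *any* reset, not about FLR specifically.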