lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <aEiXXYdhyeqcNiHX@nvidia.com> Date: Tue, 10 Jun 2025 13:36:45 -0700 From: Nicolin Chen <nicolinc@...dia.com> To: Jason Gunthorpe <jgg@...dia.com> CC: Robin Murphy <robin.murphy@....com>, <joro@...tes.org>, <will@...nel.org>, <bhelgaas@...gle.com>, <iommu@...ts.linux.dev>, <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>, <patches@...ts.linux.dev>, <pjaroszynski@...dia.com>, <vsethi@...dia.com> Subject: Re: [PATCH RFC v1 0/2] iommu&pci: Disable ATS during FLR resets On Tue, Jun 10, 2025 at 01:30:45PM -0300, Jason Gunthorpe wrote: > On Tue, Jun 10, 2025 at 04:37:58PM +0100, Robin Murphy wrote: > > On 2025-06-09 7:45 pm, Nicolin Chen wrote: > > > Hi all, > > > > > > Per PCIe r6.3, sec 10.3.1 IMPLEMENTATION NOTE, software should disable ATS > > > before initiating a Function Level Reset, and then ensure no invalidation > > > requests being issued to a device when its ATS capability is disabled. > > > > Not really - what it says is that software should not expect to receive > > invalidate completions from a function which is in the process of being > > reset or powered off, and if software doesn't want to be confused by that > > then it should take care to wait for completion or timeout of all > > outstanding requests, and avoid issuing new requests, before initiating such > > a reset or power transition. > > The commit message can be more precise, but I agree with the > conclusion that the right direction for Linux is to disable and block > ATS, instead of trying to ignore completion time out events, or trying > to block page table mutations. Ie do what the implementation note > says.. > > Maybe: > > PCIe permits a device to ignore ATS invalidation TLPs while it is > processing FLR. This creates a problem visible to the OS where ATS > invalidation commands will time out. For instance a SVA domain will > have no coordination with a FLR event and can racily issue ATC > invalidations into a resetting device. > > The OS should do something to mitigate this as we do not want > production systems to be reporting critical ATS failures, especially > in a hypervisor environment. Broadly the OS could arrange to ignore > the timeout, block page table mutations to prevent invalidations, or > disable and block ATS. > > The PCIe spec in sec 10.3.1 IMPLEMENTATION NOTE recommends to disable > and block ATS, and we already have iommu driver support to implement > something like this. Implement this approach in the iommu core. > > Provide a callback from the PCI subsystem that will enclose the FLR > and have the iommu core temporarily change all the domain attachments > into BLOCKED. When attaching a BLOCKED domain IOMMU drivers should > fence any incoming ATS queries, synchronously stop issuing new ATS > invalidations, and synchronously wait for all ATS invalidations to > complete. This will avoid any ATS invaliation time outs. > > IOMMU drivers may also disable ATS in PCI config space, but it is not > required to solve the completion timeout problem. The PCI FLR logic > will put all the iommu owned config space bits back before completing. > > During this period holding the group mutex will not allow new domains > to be attached to prevent any new ATS invalidations. Will pick this writing. > > I guess I can see how messing with the domain attachment > > underneath the rest of the group manages to prevent new invalidate requests > > from group->domain being issued to the given function, but it's pretty > > horrid - leaving the mutex blocked might be just about tolerable for an FLR > > that's supposed to take no longer than 100ms, but what if we do want to do > > this for suspend/resume as well? > > I don't view this a problem for FLR, we can hold a mutex for a long > time. It principally delays domain changes which are kind of nonsense > to be doing concurrently with FLR in the first place.. > > However, for suspend, we probably want to leave a marker in the group IIUIC, the thing for suspend/resume is that it would result in a long hold of the mutex, which can be a problem? > that the group is force-blocked and all domain attach/detach logic > will only update the group tracking structures and not call into the > iommu driver. When the resume happens the core will set the current > group domain list to the iommu driver. No need for a long lived lock > this way. Yea, what we don't want is driver re-enabling ATS. So, bypassing it at the core level should work. Then, iommu_dev_reset_prepare and iommu_dev_reset_done will only mutex the flag. Thanks Nicolin
Powered by blists - more mailing lists