Message-ID: <BL1PR11MB5271BE949150D4C2B5EC589A8C08A@BL1PR11MB5271.namprd11.prod.outlook.com>
Date: Fri, 12 Sep 2025 09:49:13 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Nicolin Chen <nicolinc@...dia.com>, "joro@...tes.org" <joro@...tes.org>,
"jgg@...dia.com" <jgg@...dia.com>, "bhelgaas@...gle.com"
<bhelgaas@...gle.com>
CC: "suravee.suthikulpanit@....com" <suravee.suthikulpanit@....com>,
"will@...nel.org" <will@...nel.org>, "robin.murphy@....com"
<robin.murphy@....com>, "sven@...nel.org" <sven@...nel.org>, "j@...nau.net"
<j@...nau.net>, "alyssa@...enzweig.io" <alyssa@...enzweig.io>,
"neal@...pa.dev" <neal@...pa.dev>, "robin.clark@....qualcomm.com"
<robin.clark@....qualcomm.com>, "m.szyprowski@...sung.com"
<m.szyprowski@...sung.com>, "krzk@...nel.org" <krzk@...nel.org>,
"alim.akhtar@...sung.com" <alim.akhtar@...sung.com>, "dwmw2@...radead.org"
<dwmw2@...radead.org>, "baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"yong.wu@...iatek.com" <yong.wu@...iatek.com>, "matthias.bgg@...il.com"
<matthias.bgg@...il.com>, "angelogioacchino.delregno@...labora.com"
<angelogioacchino.delregno@...labora.com>, "tjeznach@...osinc.com"
<tjeznach@...osinc.com>, "paul.walmsley@...ive.com"
<paul.walmsley@...ive.com>, "palmer@...belt.com" <palmer@...belt.com>,
"aou@...s.berkeley.edu" <aou@...s.berkeley.edu>, "alex@...ti.fr"
<alex@...ti.fr>, "heiko@...ech.de" <heiko@...ech.de>,
"schnelle@...ux.ibm.com" <schnelle@...ux.ibm.com>, "mjrosato@...ux.ibm.com"
<mjrosato@...ux.ibm.com>, "gerald.schaefer@...ux.ibm.com"
<gerald.schaefer@...ux.ibm.com>, "orsonzhai@...il.com" <orsonzhai@...il.com>,
"baolin.wang@...ux.alibaba.com" <baolin.wang@...ux.alibaba.com>,
"zhang.lyra@...il.com" <zhang.lyra@...il.com>, "wens@...e.org"
<wens@...e.org>, "jernej.skrabec@...il.com" <jernej.skrabec@...il.com>,
"samuel@...lland.org" <samuel@...lland.org>, "jean-philippe@...aro.org"
<jean-philippe@...aro.org>, "rafael@...nel.org" <rafael@...nel.org>,
"lenb@...nel.org" <lenb@...nel.org>, "Liu, Yi L" <yi.l.liu@...el.com>,
"cwabbott0@...il.com" <cwabbott0@...il.com>, "quic_pbrahma@...cinc.com"
<quic_pbrahma@...cinc.com>, "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"asahi@...ts.linux.dev" <asahi@...ts.linux.dev>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-arm-msm@...r.kernel.org"
<linux-arm-msm@...r.kernel.org>, "linux-samsung-soc@...r.kernel.org"
<linux-samsung-soc@...r.kernel.org>, "linux-mediatek@...ts.infradead.org"
<linux-mediatek@...ts.infradead.org>, "linux-riscv@...ts.infradead.org"
<linux-riscv@...ts.infradead.org>, "linux-rockchip@...ts.infradead.org"
<linux-rockchip@...ts.infradead.org>, "linux-s390@...r.kernel.org"
<linux-s390@...r.kernel.org>, "linux-sunxi@...ts.linux.dev"
<linux-sunxi@...ts.linux.dev>, "linux-tegra@...r.kernel.org"
<linux-tegra@...r.kernel.org>, "virtualization@...ts.linux.dev"
<virtualization@...ts.linux.dev>, "linux-acpi@...r.kernel.org"
<linux-acpi@...r.kernel.org>, "linux-pci@...r.kernel.org"
<linux-pci@...r.kernel.org>, "patches@...ts.linux.dev"
<patches@...ts.linux.dev>, "Sethi, Vikram" <vsethi@...dia.com>,
"helgaas@...nel.org" <helgaas@...nel.org>, "etzhao1900@...il.com"
<etzhao1900@...il.com>
Subject: RE: [PATCH v4 6/7] iommu: Introduce iommu_dev_reset_prepare() and
iommu_dev_reset_done()
> From: Nicolin Chen <nicolinc@...dia.com>
> Sent: Monday, September 1, 2025 7:32 AM
>
> PCIe permits a device to ignore ATS invalidation TLPs while processing a
> reset. This creates a problem visible to the OS where an ATS invalidation
> command will time out. E.g. an SVA domain will have no coordination with a
> reset event and can racily issue ATS invalidations to a resetting device.
>
> The OS should do something to mitigate this as we do not want production
> systems to be reporting critical ATS failures, especially in a hypervisor
> environment. Broadly, the OS could arrange to ignore the timeouts, block page
> table mutations to prevent invalidations, or disable and block ATS.
>
> The PCIe spec in sec 10.3.1 IMPLEMENTATION NOTE recommends disabling and
> blocking ATS before initiating a Function Level Reset. It also mentions that
> other reset methods could have the same vulnerability.
>
> Provide a callback from the PCI subsystem that will enclose the reset and
> have the iommu core temporarily change all the attached domains to BLOCKED.
> After attaching a BLOCKED domain, IOMMU hardware would fence any incoming
> ATS queries. IOMMU drivers should also synchronously stop issuing new
> ATS invalidations and wait for all ATS invalidations to complete. This
> avoids any ATS invalidation timeouts.
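A minimal userspace sketch of the prepare/done pairing described above, assuming a single RID-attached domain per device (PASID domains are left out). reset_prepare(), reset_done(), struct dev_state and blocked_domain are hypothetical stand-ins for illustration only, not the patch's iommu_dev_reset_prepare()/iommu_dev_reset_done() code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: not the kernel's struct iommu_domain. */
struct domain {
	const char *name;
	bool blocks_dma; /* true for the BLOCKED domain: fences DMA and ATS */
};

static struct domain blocked_domain = { "BLOCKED", true };

/* Hypothetical per-device state, loosely modelled on group_device. */
struct dev_state {
	struct domain *attached; /* domain the device currently uses */
	struct domain *saved;    /* domain to restore once the reset is done */
	bool pending_reset;      /* set between prepare and done */
};

/* Before the PCI reset: park the device on BLOCKED so ATS is fenced. */
static void reset_prepare(struct dev_state *dev)
{
	dev->saved = dev->attached;
	dev->attached = &blocked_domain;
	dev->pending_reset = true;
}

/* After the PCI reset: put back whatever was attached before. */
static void reset_done(struct dev_state *dev)
{
	assert(dev->pending_reset); /* must pair with reset_prepare() */
	dev->attached = dev->saved;
	dev->saved = NULL;
	dev->pending_reset = false;
}
```

The saved pointer is what lets reset_done() restore the original attachment. The real code also has to make the driver stop and drain ATS invalidations, which this model does not attempt.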
>
> However, if a domain attachment/replacement happens during an ongoing
> reset, ATS routines may be re-activated between the two function calls.
> So, introduce a new pending_reset flag in group_device, and reject any
> concurrent attach_dev/set_dev_pasid call during a reset out of concern
> for compatibility failures.
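The pending_reset rejection can be sketched the same way. Because devices in an iommu group share one domain, this hypothetical model refuses a group-level attach with -EBUSY while any member device is mid-reset. struct dev_group, struct group_dev and group_attach() are illustrative names, not kernel APIs:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: not the kernel's iommu_group/group_device. */
struct domain {
	const char *name;
};

struct group_dev {
	bool pending_reset; /* set between reset prepare and reset done */
};

struct dev_group {
	struct group_dev *devs;
	size_t ndevs;
	struct domain *domain; /* one domain shared by the whole group */
};

/* Returns 0 on success, -EBUSY if any member device is mid-reset. */
static int group_attach(struct dev_group *grp, struct domain *new_domain)
{
	assert(new_domain != NULL); /* NULL is not a valid target here */

	for (size_t i = 0; i < grp->ndevs; i++) {
		/* Attaching now could re-enable ATS on a resetting device. */
		if (grp->devs[i].pending_reset)
			return -EBUSY;
	}
	grp->domain = new_domain;
	return 0;
}
```

A rejected attach leaves the group's current domain untouched, so the caller only sees the error and can retry after the reset completes.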
>
> There are two corner cases that won't work:
> 1. Alias devices that share the same RID
> Blocking one device also blocks the other alias devices that might not
> want a reset. Given that it's very rare for an alias device to support
> ATS, simply skip the blocking routine.
It also applies to devices in the same iommu group: while one device
is being reset, no other device in the group can change its domain.
This needs to be documented in the attach uAPI.
>
> 2. SR-IOV devices whose PF is resetting while the VF isn't.
> Both PF and VF should block RID and PASIDs. But, since the VF is not aware
> of the reset, it is difficult to block it and reject concurrent attach
> calls, because it's not logically reasonable to reject a VF attachment
> due to a resetting PF unless the VF is resetting too. To address this,
> we cannot simply reject concurrent attachments as this patch does;
> instead we will need two new compatibility testing ops for
> attach_dev/set_dev_pasid to allow caching a compatible attach. This
> itself, however, would be a big series. So, for now, skip the blocking
> routine for PF devices, and leave a note.
>
Given that it impacts the uAPI:
- today, attach/replace can be done at any time
- with this series, attach/replace is rejected while a device is being reset
- later, with the compat testing ops, attach/replace can again be done at
  any time
we should be cautious here. Especially if this series goes into 6.18 (likely
the next LTS version), the interim behavior change may last a long time. Yes,
we discussed that no known usage would want to do attach/replace
while a device is being reset, but I wonder whether we should instead
wait for the full solution to avoid unnecessary back-and-forth uAPI changes...
Thanks
Kevin