Message-ID: <7b8d8bfa-ca6b-4a07-8a4d-a30d8993c7c7@linux.intel.com>
Date: Fri, 15 Aug 2025 13:49:55 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Nicolin Chen <nicolinc@...dia.com>, robin.murphy@....com,
joro@...tes.org, bhelgaas@...gle.com, jgg@...dia.com
Cc: will@...nel.org, robin.clark@....qualcomm.com, yong.wu@...iatek.com,
matthias.bgg@...il.com, angelogioacchino.delregno@...labora.com,
thierry.reding@...il.com, vdumpa@...dia.com, jonathanh@...dia.com,
rafael@...nel.org, lenb@...nel.org, kevin.tian@...el.com,
yi.l.liu@...el.com, linux-arm-kernel@...ts.infradead.org,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-arm-msm@...r.kernel.org, linux-mediatek@...ts.infradead.org,
linux-tegra@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-pci@...r.kernel.org, patches@...ts.linux.dev, pjaroszynski@...dia.com,
vsethi@...dia.com, helgaas@...nel.org, etzhao1900@...il.com
Subject: Re: [PATCH v3 4/5] iommu: Introduce iommu_dev_reset_prepare() and
iommu_dev_reset_done()
On 8/12/25 06:59, Nicolin Chen wrote:
> PCIe permits a device to ignore ATS invalidation TLPs, while processing a
> reset. This creates a problem visible to the OS where an ATS invalidation
> command will time out: e.g. an SVA domain will have no coordination with a
> reset event and can racily issue ATS invalidations to a resetting device.
>
> The OS should do something to mitigate this as we do not want production
> systems to be reporting critical ATS failures, especially in a hypervisor
> environment. Broadly, OS could arrange to ignore the timeouts, block page
> table mutations to prevent invalidations, or disable and block ATS.
>
> The PCIe spec in sec 10.3.1 IMPLEMENTATION NOTE recommends to disable and
> block ATS before initiating a Function Level Reset. It also mentions that
> other reset methods could have the same vulnerability as well.
>
> Provide a callback from the PCI subsystem that will enclose the reset and
> have the iommu core temporarily change all the attached domains to BLOCKED.
> After attaching a BLOCKED domain, IOMMU drivers should fence any incoming

Nit: my understanding is that it's not the "IOMMU drivers" but the
"IOMMU hardware" that fences any further incoming translation requests,
right?

> ATS queries, synchronously stop issuing new ATS invalidations, and wait
> for all ATS invalidations to complete. This can avoid any ATS invalidation
> timeouts.
>
> However, if there is a domain attachment/replacement happening during an
> ongoing reset, ATS routines may be re-activated between the two function
> calls. So, introduce a new pending_reset flag in group_device to defer an
> attachment during a reset, allowing iommu core to cache target domains in
> the SW level while bypassing the driver. The iommu_dev_reset_done() will
> re-attach these soft-attached domains, once the device reset is finished.
>
> Signed-off-by: Nicolin Chen <nicolinc@...dia.com>
The code looks good to me:
Reviewed-by: Lu Baolu <baolu.lu@...ux.intel.com>
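
For anyone reading along in the archive, the deferral described in the
commit message can be pictured with a small user-space model. This is
only a sketch under stated assumptions, not the code from the patch:
pending_reset is the flag named above, while struct device_state,
struct domain, driver_attach() and the model_*() functions are
hypothetical names invented for illustration and do not match the real
kernel structures or signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's iommu types. */
enum domain_kind { DOMAIN_BLOCKED, DOMAIN_PAGING };

struct domain {
	enum domain_kind kind;
};

struct device_state {
	struct domain *hw_domain;     /* what the modelled driver has programmed */
	struct domain *target_domain; /* what the core considers attached */
	bool pending_reset;           /* set between reset prepare and done */
};

static struct domain blocked_domain = { DOMAIN_BLOCKED };

/* Hypothetical stand-in for calling into the IOMMU driver. */
static void driver_attach(struct device_state *dev, struct domain *d)
{
	dev->hw_domain = d;
}

/*
 * Attach path: always record the target in software, but skip the
 * driver while a reset is pending so ATS is not re-enabled mid-reset.
 */
int model_attach(struct device_state *dev, struct domain *d)
{
	dev->target_domain = d;
	if (dev->pending_reset)
		return 0; /* soft-attach only; replayed at reset done */
	driver_attach(dev, d);
	return 0;
}

/* Models the role of iommu_dev_reset_prepare(): park on BLOCKED. */
void model_reset_prepare(struct device_state *dev)
{
	dev->pending_reset = true;
	driver_attach(dev, &blocked_domain);
}

/* Models the role of iommu_dev_reset_done(): replay the cached target. */
void model_reset_done(struct device_state *dev)
{
	assert(dev->pending_reset);
	dev->pending_reset = false;
	if (dev->target_domain)
		driver_attach(dev, dev->target_domain);
}
```

In this model, an attach issued between prepare and done only updates
target_domain, so the device stays on the BLOCKED domain until the
reset finishes and the cached domain is re-attached. Locking, error
handling and per-PASID domains are deliberately left out.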