Message-ID: <20260204132031.GF3931454@nvidia.com>
Date: Wed, 4 Feb 2026 09:20:31 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Robin Murphy <robin.murphy@....com>
Cc: Nicolin Chen <nicolinc@...dia.com>, dan.j.williams@...el.com,
"Tian, Kevin" <kevin.tian@...el.com>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
"will@...nel.org" <will@...nel.org>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"joro@...tes.org" <joro@...tes.org>,
"praan@...gle.com" <praan@...gle.com>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"miko.lenczewski@....com" <miko.lenczewski@....com>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>
Subject: Re: [PATCH RFCv1 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices
On Wed, Feb 04, 2026 at 12:18:15PM +0000, Robin Murphy wrote:
> > I strongly suspect the answer is that RMR has to be ignored in this
> > more secure mode.
>
> Yes, I think the only valid case for having an RMR and expecting it to work
> in combination with these other things is if the device has some firmware or
> preloaded configuration in memory which it will still need to access at that
> address once an OS driver starts using it, but does not need to access
> *during* the boot-time handover.
Splash screens are the most obvious case here, where the framebuffer
may be in DMA'able memory and must go through the IOMMU.
At least we are already shipping products where the GPU has a DRAM-based
framebuffer and the GPU requires ATS for a lot of functions, but the
framebuffer scan-out does not use ATS.
Sigh. So that will be exciting to make work at some point.
> Thus it seems fair to still honour the
> reserved regions upon attaching to a default domain, but not worry too much
> about being in a transient blocking state in the interim if it's unavoidable
> for other reasons (at worst maybe just log a warning that we're
> doing so).
The interest in the blocking state was to disable ATS.
Maybe another approach would be to have an "RMR blocking" domain, which
is a paging domain that tells the driver explicitly not to enable ATS
for it.
Then we could validate that the RMR range is OK, install this special
domain, and still have security against translated TLPs.
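
As a rough sketch of that idea (not existing kernel code), the
selection logic might look like the following. Every name here is
hypothetical, and the "RMR is OK" check is an assumed placeholder policy
(non-empty, non-wrapping, inside a window the platform allows); the
real validation criteria are still open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain kinds for illustration; not the kernel's types. */
enum sketch_domain_kind {
	SKETCH_DOMAIN_BLOCKING,     /* no DMA at all, ATS off */
	SKETCH_DOMAIN_PAGING,       /* normal paging domain, ATS allowed */
	SKETCH_DOMAIN_RMR_BLOCKING, /* paging domain with only RMR mappings, ATS off */
};

struct sketch_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Assumed policy: an RMR is acceptable if it is non-empty, does not wrap
 * the address space, and lies entirely inside an allowed window.
 */
static bool sketch_rmr_ok(struct sketch_range rmr, struct sketch_range allowed)
{
	if (rmr.len == 0 || rmr.len > UINT64_MAX - rmr.start)
		return false;
	if (allowed.len > UINT64_MAX - allowed.start)
		return false;
	return rmr.start >= allowed.start &&
	       rmr.start + rmr.len <= allowed.start + allowed.len;
}

/* Pick the interim domain used before a default domain is attached. */
static enum sketch_domain_kind sketch_pick_interim_domain(bool has_rmr,
							   bool rmr_ok)
{
	if (has_rmr && rmr_ok)
		return SKETCH_DOMAIN_RMR_BLOCKING;
	return SKETCH_DOMAIN_BLOCKING;
}

/* Only an ordinary paging domain lets the driver turn ATS on. */
static bool sketch_may_enable_ats(bool dev_ats_capable,
				  enum sketch_domain_kind kind)
{
	return dev_ats_capable && kind == SKETCH_DOMAIN_PAGING;
}
```

The point of the separate kind is that the RMR mappings stay live for
untranslated DMA while the driver is told, via the domain, to keep ATS
disabled.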
> > > However I think there would be no point exposing the ATS details to
> > > the VM to begin with. It's the host's decision to trust the device
> > > to play in the translated PA space and system cache coherency
> > > protocol, and no guest would be allowed to mess with those aspects
> > > either way, so there seems no obvious good reason for them to know
> > > at all.
> >
> > If the vSMMU is presented then the guest must be aware of the ATS
> > because only the guest can generate the ATC invalidations for changes
> > in the S1.
>
> Only if you assume DVM or some other mechanism for the guest to issue S1
> invalidations directly to the hardware - with an emulated CMDQ we can do
> whatever we like.
With a lot of work, yes, but that is not the model that is implemented
today.
If the hypervisor has to generate an ATC invalidation from an IOTLB
invalidation then it also needs a map of ASID to RID&PASID, which it
can only build by inspecting all the CD tables. The VMMs in nesting
mode don't read the CD tables at all today, so they don't implement
this option.
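
To make the bookkeeping concrete, here is a minimal sketch of the kind
of ASID to RID&PASID map the hypervisor would need. It is illustrative
only: the types, the fixed-size table, and the function names are all
made up, and the CD table walk that would populate it is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 64

/* One ATC invalidation target: a device (RID) and a PASID on it. */
struct sketch_atc_target {
	uint32_t rid;
	uint32_t pasid;
};

struct sketch_asid_entry {
	uint16_t asid;
	struct sketch_atc_target target;
};

/* Hypothetical table the VMM would fill by inspecting guest CD tables. */
struct sketch_asid_map {
	struct sketch_asid_entry entries[SKETCH_MAX_ENTRIES];
	size_t count;
};

/* Record that a CD for (rid, pasid) uses this ASID. Returns 0 or -1 if full. */
static int sketch_asid_map_add(struct sketch_asid_map *map, uint16_t asid,
			       uint32_t rid, uint32_t pasid)
{
	if (map->count >= SKETCH_MAX_ENTRIES)
		return -1;
	map->entries[map->count].asid = asid;
	map->entries[map->count].target.rid = rid;
	map->entries[map->count].target.pasid = pasid;
	map->count++;
	return 0;
}

/*
 * Given the ASID from a guest IOTLB invalidation, collect every
 * (rid, pasid) that would need an ATC invalidation. Returns the number
 * of targets written to out (at most out_len).
 */
static size_t sketch_asid_to_targets(const struct sketch_asid_map *map,
				     uint16_t asid,
				     struct sketch_atc_target *out,
				     size_t out_len)
{
	size_t n = 0;

	for (size_t i = 0; i < map->count && n < out_len; i++) {
		if (map->entries[i].asid == asid)
			out[n++] = map->entries[i].target;
	}
	return n;
}
```

One ASID can fan out to several targets, which is why the map can only
be built, and kept current, by reading every CD table.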
> And in fact, I think we actually *have* to if the host has enabled ATS
> itself, since we cannot assume that a guest is going to choose to use it,
> thus we cannot rely on the guest issuing ATCIs in order to get the correct
> behaviour it expects unless and until we've seen it set EATS appropriately
> in all the corresponding vSTEs.
Due to the above we've done the reverse: the host does not get to
unilaterally decide ATS policy. It follows the guest's vEATS setting,
so that we never have a situation where the hypervisor has to generate
ATC invalidations.
The kernel offers the VMM the freedom to do it either way, but today
all the VMMs I'm aware of choose the above path.
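
The difference between the two policies can be sketched as a small
decision function. This is only an illustration of the trade-off
discussed in this thread, with hypothetical names throughout; it is not
how the kernel or any VMM expresses it.

```c
#include <stdbool.h>

/* Hypothetical policy knob; not an existing kernel or VMM interface. */
enum sketch_ats_policy {
	SKETCH_ATS_HOST_DECIDES, /* host enables ATS regardless of the guest */
	SKETCH_ATS_FOLLOW_GUEST, /* physical ATS mirrors the guest's vEATS */
};

struct sketch_ats_decision {
	bool enable_ats;       /* turn ATS on for the physical device */
	bool hyp_must_inv_atc; /* hypervisor must synthesize ATC invalidations */
};

static struct sketch_ats_decision
sketch_decide_ats(enum sketch_ats_policy policy, bool host_wants_ats,
		  bool guest_veats_on)
{
	struct sketch_ats_decision d = { false, false };

	if (policy == SKETCH_ATS_FOLLOW_GUEST) {
		/* Guest enabled ATS, so the guest also issues the ATCIs. */
		d.enable_ats = guest_veats_on;
		return d;
	}

	/*
	 * Host decides: if ATS is on but the guest has not set vEATS, the
	 * guest will not issue ATCIs and the hypervisor has to.
	 */
	d.enable_ats = host_wants_ats;
	d.hyp_must_inv_atc = host_wants_ats && !guest_veats_on;
	return d;
}
```

With the follow-guest policy hyp_must_inv_atc is never set, which is
the property the current VMMs rely on.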
Jason