Message-ID: <20260107204607.GE340082@ziepe.ca>
Date: Wed, 7 Jan 2026 16:46:07 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Samiullah Khawaja <skhawaja@...gle.com>
Cc: David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>,
Pasha Tatashin <pasha.tatashin@...een.com>,
David Matlack <dmatlack@...gle.com>,
Robin Murphy <robin.murphy@....com>,
Pratyush Yadav <pratyush@...nel.org>,
Kevin Tian <kevin.tian@...el.com>,
Alex Williamson <alex@...zbot.org>, Shuah Khan <shuah@...nel.org>,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, Saeed Mahameed <saeedm@...dia.com>,
Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>,
Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>
Subject: Re: [PATCH 0/3] iommu/vt-d: Add support to hitless replace IOMMU
domain
On Wed, Jan 07, 2026 at 04:28:12PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 07, 2026 at 08:17:57PM +0000, Samiullah Khawaja wrote:
> > Intel IOMMU Driver already supports replacing IOMMU domain hitlessly in
> > scalable mode.
>
> It does? We were just talking about how it doesn't work because it
> makes the PASID entry non-present while loading the new domain.
If you tried your tests in scalable mode they are probably only
working because the HW is holding the entry in cache while the CPU is
completely mangling it:

    int intel_pasid_replace_first_level(struct intel_iommu *iommu,
                                        struct device *dev, phys_addr_t fsptptr,
                                        u32 pasid, u16 did, u16 old_did,
                                        int flags)
    {
    [..]
            *pte = new_pte;

That just doesn't work for "replace": it isn't hitless unless the
entry stays in the cache. Since your test effectively holds the
context entry in the cache while testing for "hitless", it doesn't
really test whether it works without races.
All of this needs to be reworked to always use the stack to build the
entry, like the replace path does, and have an ARM-like algorithm to
update the live memory in just the right order to guarantee the HW
does not see a corrupted entry.
It is a little bit tricky, but it should start with reworking
everything to consistently use the stack to create the new entry and
calling a centralized function to set the new entry to the live
memory. This replace/not replace split should be purged completely.
Some discussion is here
https://lore.kernel.org/all/20260106142301.GS125261@ziepe.ca/
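
To make the "build on the stack, then write through one centralized function" idea concrete, here is a minimal user-space sketch. It is not the VT-d or ARM driver code. Everything in it (struct hw_entry, ENTRY_WORDS, ENTRY_PRESENT, store64(), sync_entry(), entry_write()) is a hypothetical name made up for illustration, and it assumes the HW observes each 64-bit store atomically. The caller builds the whole target entry in a local variable and the writer decides the store order: a single differing word is updated with one store (hitless), anything else falls back to a non-present transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_WORDS   4            /* hypothetical entry size in 64-bit words */
#define ENTRY_PRESENT (1ULL << 0)  /* hypothetical "present" bit in word 0 */

struct hw_entry {
	uint64_t w[ENTRY_WORDS];
};

/* Counts sync points so the two paths can be told apart. */
static unsigned int sync_count;

/* Stand-in for a 64-bit store that the HW is assumed to see atomically. */
static void store64(volatile uint64_t *dst, uint64_t val)
{
	*dst = val;
}

/* Stand-in for whatever flush/invalidate makes the HW re-read the entry. */
static void sync_entry(void)
{
	sync_count++;
}

/*
 * The only function that touches live memory. The target is a complete
 * entry built by the caller (typically on the stack).
 * Returns true when the update was hitless, false when the entry had to
 * go through a non-present state.
 */
static bool entry_write(struct hw_entry *live, const struct hw_entry *target)
{
	unsigned int i, ndiff = 0, idx = 0;

	assert(live && target);

	for (i = 0; i < ENTRY_WORDS; i++) {
		if (live->w[i] != target->w[i]) {
			ndiff++;
			idx = i;
		}
	}

	if (ndiff == 0)
		return true;

	if (ndiff == 1) {
		/* One atomic store: the HW sees the old or the new entry. */
		store64(&live->w[idx], target->w[idx]);
		sync_entry();
		return true;
	}

	/* Several words differ: make the entry non-present first. */
	store64(&live->w[0], live->w[0] & ~ENTRY_PRESENT);
	sync_entry();
	/* Fill in the words the HW is no longer looking at. */
	for (i = 1; i < ENTRY_WORDS; i++)
		store64(&live->w[i], target->w[i]);
	sync_entry();
	/* Publish word 0, which carries the present bit, last. */
	store64(&live->w[0], target->w[0]);
	sync_entry();
	return false;
}
```

This is deliberately simplified. A fuller version would also account for bits the HW ignores in the current configuration, so that more transitions can be reduced to a single critical store.
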
It also needs to be very careful that the invalidation covers both
the old and new context entries while the entry is being replaced.
For instance the placement of cache_tag_assign_domain() looks wrong to
me, it can't be *after* the HW has been programmed to use the new tags
:\
I also didn't note where the currently active cache_tag is removed
from the linked list during attach, is that another bug?
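
One way to read the ordering concern above, as a toy model rather than the real cache_tag code: the new tag has to be on the list before the entry is programmed, invalidation has to cover every tag on the list (old and new) while both can be in use, and the old tag is dropped only afterwards. All names below (struct tag_set, struct trace, tag_assign(), tag_unassign(), replace_domain()) are hypothetical, and removing the old tag after the flush is my assumption about where that step belongs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TAGS 4

/* Toy stand-in for the list of cache tags that invalidation walks. */
struct tag_set {
	uint16_t did[MAX_TAGS];
	unsigned int n;
};

enum step { STEP_ASSIGN_NEW, STEP_WRITE_ENTRY, STEP_FLUSH, STEP_UNASSIGN_OLD };

/* Records the order of operations so it can be checked. */
struct trace {
	enum step steps[8];
	unsigned int n;
	unsigned int flushed_tags;
};

static bool tag_present(const struct tag_set *s, uint16_t did)
{
	for (unsigned int i = 0; i < s->n; i++)
		if (s->did[i] == did)
			return true;
	return false;
}

static void tag_assign(struct tag_set *s, uint16_t did)
{
	if (tag_present(s, did))
		return;
	assert(s->n < MAX_TAGS);
	s->did[s->n++] = did;
}

static void tag_unassign(struct tag_set *s, uint16_t did)
{
	for (unsigned int i = 0; i < s->n; i++) {
		if (s->did[i] == did) {
			s->did[i] = s->did[--s->n];
			return;
		}
	}
}

static void replace_domain(struct tag_set *tags, uint16_t *live_did,
			   uint16_t new_did, struct trace *t)
{
	uint16_t old_did = *live_did;

	/* 1. Track the new tag before the HW can start using it. */
	tag_assign(tags, new_did);
	t->steps[t->n++] = STEP_ASSIGN_NEW;

	/* 2. Stand-in for programming the live entry. */
	*live_did = new_did;
	t->steps[t->n++] = STEP_WRITE_ENTRY;

	/* 3. Invalidate every tracked tag: both old and new are listed. */
	t->flushed_tags = tags->n;
	t->steps[t->n++] = STEP_FLUSH;

	/* 4. Only now stop tracking the old tag (assumed placement). */
	tag_unassign(tags, old_did);
	t->steps[t->n++] = STEP_UNASSIGN_OLD;
}
```
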
In short, this needs a lot of work to properly implement
hitless replace the way ARM can. Fortunately I think it is mostly
mechanical and should be fairly straightforward. Refer to the ARM
driver and try to structure vtd to have the same essential flow.
Jason