Message-ID: <20251001114742.GV2695987@ziepe.ca>
Date: Wed, 1 Oct 2025 08:47:42 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Samiullah Khawaja <skhawaja@...gle.com>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>,
David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, iommu@...ts.linux.dev,
YiFei Zhu <zhuyifei@...gle.com>,
Robin Murphy <robin.murphy@....com>,
Pratyush Yadav <pratyush@...nel.org>,
Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org,
Saeed Mahameed <saeedm@...dia.com>,
Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>,
Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>,
Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com,
Chris Li <chrisl@...nel.org>, praan@...gle.com
Subject: Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update

On Tue, Sep 30, 2025 at 04:15:43PM -0700, Samiullah Khawaja wrote:
> > The iommu core code should be restoring the iommu_domain as soon as
> > the attached device is plugged in and attaching the preserved domain
> > instead of something else during the device probe sequence
> >
> > This logic should not be in drivers.
> >
> > From there you either put the hwpt back into iommufd and have it free
> > the iommu_domain when it destroys the hwpt
> >
> > Or you have the iommu core code free the iommu_domain at some point
> > after iommufd has replaced the attachment with a new iommu_domain?
>
> But we cannot do the replacement during domain attachment because
> userspace might not have fully prepared the new domain with all the
> required DMA mappings. Replace during LUO finish?

The idea is that the kernel will restore the iommu_domain during early
boot in the iommu core and then attach it. This should "rewrite" the
IOMMU HW context for that device with identical content. Drivers must
be enhanced to support this hitless rewrite (AMD and ARM already are).

At this point the kernel is operating normally with a normal domain
and a normal driver, no special luo stuff.

Later iommufd will come along and establish a HWPT that has an
identical translation. Then we replace the luo domain with the new
HWPT and free the luo domain.
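
As an illustration only, the lifecycle above can be modeled in a few lines of userspace C. Everything here is hypothetical: a domain is reduced to a table mapping IOVA pages to physical pages, attach_domain() stands in for the driver rewriting the device's context entry, and restore_luo_domain() / iommufd_replace() are made-up names, not kernel APIs. What it shows is that the translation the device sees never changes: the restored domain adopts the preserved tables in place, and the HWPT that later replaces it carries identical mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_PAGES 8 /* hypothetical tiny IOVA space, in pages */

/* Hypothetical domain: map[iova_pfn] is the physical page, 0 = unmapped. */
struct domain {
	const uint64_t *map;
	bool luo_restored; /* true if it wraps tables from the old kernel */
};

/* Hypothetical device: just the domain its context entry points at. */
struct device {
	struct domain *attached;
};

/* Stand-in for the driver (re)writing the device's context entry. */
static void attach_domain(struct device *dev, struct domain *dom)
{
	dev->attached = dom;
}

/* What a DMA from the device to iova_pfn would hit. */
static uint64_t translate(const struct device *dev, unsigned int iova_pfn)
{
	assert(iova_pfn < NR_PAGES);
	return dev->attached ? dev->attached->map[iova_pfn] : 0;
}

/* Early boot: adopt the preserved tables in place (no copy) and attach
 * the resulting domain instead of the default DMA API domain. */
static struct domain *restore_luo_domain(struct device *dev,
					 const uint64_t *preserved)
{
	struct domain *dom = calloc(1, sizeof(*dom));

	if (!dom)
		return NULL;
	dom->map = preserved;
	dom->luo_restored = true;
	attach_domain(dev, dom);
	return dom;
}

/* Later: attach the iommufd HWPT (same mappings, new tables), then free
 * the restored domain now that nothing points at it. */
static void iommufd_replace(struct device *dev, struct domain *hwpt)
{
	struct domain *old = dev->attached;

	attach_domain(dev, hwpt);
	if (old && old->luo_restored)
		free(old);
}
```

The restored domain is freed only after the new HWPT is attached, so there is no window in which the device points at freed tables.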

> 1. During boot, the IOMMU core sets up a default domain but doesn't
> program the context entries for the preserved device. The hardware
> keeps on using the old preserved tables.

When the iommu driver first starts up it can take over the context
memory from the predecessor kernel. But it has to go through it and
clear out most of the context entries.

Only context entries belonging to devices marked for preservation
should be kept unchanged.
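
A minimal sketch of that takeover, assuming a flat table with one two-qword entry per devfn and a bitmap of preserved devfns handed over by the previous kernel. The entry layout, the CTX_PRESENT bit and all function names are hypothetical; a real driver would also have to flush the context cache and IOTLB for every entry it clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_DEVFN 256     /* hypothetical: one context entry per devfn */
#define CTX_PRESENT 1ULL /* hypothetical "present" bit in the low qword */

struct context_entry {
	uint64_t lo; /* present bit + page-table root (hypothetical layout) */
	uint64_t hi; /* domain id etc. (hypothetical layout) */
};

/* Set of devfns the previous kernel marked for preservation. */
struct preserve_set {
	uint64_t bits[NR_DEVFN / 64];
};

static void preserve_mark(struct preserve_set *s, unsigned int devfn)
{
	s->bits[devfn / 64] |= 1ULL << (devfn % 64);
}

static bool preserve_test(const struct preserve_set *s, unsigned int devfn)
{
	return s->bits[devfn / 64] & (1ULL << (devfn % 64));
}

/*
 * Adopt a context table inherited from the previous kernel: leave the
 * entries of preserved devices untouched and clear every other present
 * entry. Cache/IOTLB invalidation is omitted. Returns the number kept.
 */
static size_t takeover_context_table(struct context_entry tbl[NR_DEVFN],
				     const struct preserve_set *keep)
{
	size_t kept = 0;

	for (unsigned int devfn = 0; devfn < NR_DEVFN; devfn++) {
		if (!(tbl[devfn].lo & CTX_PRESENT))
			continue;
		if (preserve_test(keep, devfn)) {
			kept++;
			continue;
		}
		tbl[devfn].lo = 0;
		tbl[devfn].hi = 0;
	}
	return kept;
}
```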

Later we probe the struct device to the iommu and do as I said above
to restore consistency.

> 2. Userspace restores the iommufd, creates a new HWPT/domain and
> populates mappings.

Yes

> 3. On FINISH, the IOMMU core updates the context entries of preserved
> devices to point to the new domain.

No, finish should never do anything on the restore path, IMHO.
Userspace should directly attach the newly created HWPT when it is
ready.

> I understand the desire to have the preserved iommu domain be restored
> during boot so the device has a default domain and there is an owner
> of the attached restored domain, but that would prevent the iommufd
> from cooking a clean new domain.

The "default domain" is the "DMA API domain" and it always has to be
created and set up. The change here is that, instead of attaching the
default domain, we attach the luo restored domain at early boot.

This sets the device into an "owned" mode but vfio can still attach
and nothing prevents iommufd from building a new hwpt and attaching
it.

> Maybe we can refine the "Hotswap" model I had in mind. Basically on
> boot the core restores the preserved iommu domain, but core lets
> iommufd attach a new domain with preserved devices without replacing
> the underlying context entries?

Replace the context entries. If everything is working properly, the
preserved domain should compute an identical context entry, so there
is no reason not to just "replace" it; the replace should be a NOP.
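
A sketch of why the replace can be a NOP, assuming a hypothetical two-qword context entry: the driver compares the computed entry with what the hardware already uses and skips both the write and the invalidation when they match. A real hitless update also has to order multi-qword writes so the IOMMU never observes a torn entry (the ARM SMMUv3 driver sequences its updates for this reason); this sketch ignores that, and all names in it are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-qword context entry. */
struct context_entry {
	uint64_t qw[2];
};

/* Hypothetical slot: the live entry plus a count of invalidations
 * issued, so the NOP case is observable. */
struct ctx_slot {
	struct context_entry live;
	unsigned int invalidations;
};

/*
 * Install 'target' into 'slot'. If it equals what the hardware already
 * uses (expected when attaching the restored domain), nothing is written
 * and no invalidation is issued. Returns true if anything changed.
 */
static bool context_replace(struct ctx_slot *slot,
			    const struct context_entry *target)
{
	bool changed = false;

	for (int i = 0; i < 2; i++) {
		if (slot->live.qw[i] != target->qw[i]) {
			slot->live.qw[i] = target->qw[i];
			changed = true;
		}
	}
	if (changed)
		slot->invalidations++; /* stand-in for a context cache flush */
	return changed;
}
```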

> > Also there is an interesting behavior to note that if the iommu driver
> > restores a domain then it will also prevent a non-vfio driver from
> > binding to that device.
>
> Agreed. I think in the "Hotswap" approach I discussed above, if we
> don't restore the domain, the core can just commit the context entries
> of the new default domain if a non-vfio driver is bound to the device.

As I said, the owned nature of the device will prevent attaching a
non-vfio driver in the first place.

So the only path forward for userspace is to attach vfio, and then
iommufd should take over that luo restored iommu_domain and eventually
free it.
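
The effect of the owned state on driver binding can be pictured as a probe-time check. This is a simplified, hypothetical model: the flag name echoes the kernel's driver_managed_dma, but the structs and driver_bind_check() are invented for illustration. A driver that uses the DMA API needs the default domain attached, so it is refused with -EBUSY while the restored domain owns the device, whereas a vfio-style driver that manages DMA itself may still bind.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical domain handle. */
struct domain {
	int id;
};

/* Hypothetical, simplified group state. */
struct group {
	struct domain *domain;         /* currently attached */
	struct domain *default_domain; /* DMA API domain */
	bool owned;                    /* claimed for user-managed DMA */
};

/* Hypothetical driver descriptor: only the flag that matters here. */
struct driver {
	const char *name;
	bool driver_managed_dma; /* true for vfio-style drivers */
};

/* Refuse DMA-API drivers unless the default domain is attached and
 * nobody owns the device. */
static int driver_bind_check(const struct group *g, const struct driver *drv)
{
	if (drv->driver_managed_dma)
		return 0;
	if (g->owned || g->domain != g->default_domain)
		return -EBUSY;
	return 0;
}
```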

You might consider that finish should de-own the device if vfio didn't
claim it. But that is a bit tricky, since it needs an FLR before the
domains can be switched around.
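
If finish were to de-own unclaimed devices, the ordering constraint would look roughly like this. It is a hypothetical sketch: struct luo_dev, do_flr() (standing in for a PCI function-level reset) and finish_deown() are invented names. It only shows that the domain switch is refused until DMA has been quiesced by the reset, and that devices vfio did claim are left alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device live-update state. */
struct luo_dev {
	bool claimed_by_vfio;
	bool dma_quiesced; /* set once an FLR has stopped in-flight DMA */
	bool owned;
	bool on_default_domain;
	unsigned int flr_count;
};

/* Stand-in for a PCI function-level reset. */
static void do_flr(struct luo_dev *d)
{
	d->dma_quiesced = true;
	d->flr_count++;
}

/* Switching domains under live DMA is refused until after the reset. */
static int switch_to_default_domain(struct luo_dev *d)
{
	if (!d->dma_quiesced)
		return -EBUSY;
	d->on_default_domain = true;
	return 0;
}

/* Hypothetical finish-time policy: reset, move to the DMA API domain,
 * then drop ownership so a normal driver can bind. */
static int finish_deown(struct luo_dev *d)
{
	int ret;

	if (d->claimed_by_vfio || !d->owned)
		return 0;
	do_flr(d);
	ret = switch_to_default_domain(d);
	if (ret)
		return ret;
	d->owned = false;
	return 0;
}
```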

Jason