Message-ID: <CAAywjhRQONuHsxTGQZ5R=EJbOHUD+xOF_CYjkNRbUyCQkORwig@mail.gmail.com>
Date: Tue, 30 Sep 2025 16:15:43 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, iommu@...ts.linux.dev, YiFei Zhu <zhuyifei@...gle.com>,
Robin Murphy <robin.murphy@....com>, Pratyush Yadav <pratyush@...nel.org>,
Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org,
Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>,
Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com, Chris Li <chrisl@...nel.org>,
praan@...gle.com
Subject: Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
On Tue, Sep 30, 2025 at 2:05 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 01:02:31PM -0700, Samiullah Khawaja wrote:
> > > There are HWPTs outside the IOAS so it is inconsistent.
> >
> > This makes sense. But if I understand correctly, a HWPT should be
> > associated one way or another with a preserved device or IOAS. Also, the
> > nested ones will have a parent HWPT. Can we not look at the dependencies
> > here and find the HWPTs that need to be preserved?
>
> Maybe in some capacity, but I would say more of don't allow preserving
> things that depend on things not already preserved somehow.
I agree. I think this makes sense. Users can explicitly indicate that
they want to preserve HWPTs and iommufd can enforce the dependencies.
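To make the dependency rule concrete, here is a rough userspace toy
model, not iommufd code. All of the names (toy_ioas, toy_hwpt,
toy_hwpt_preserve) are hypothetical and only illustrate the check: a
preserve request is refused unless the parent HWPT and the backing IOAS,
when present, are already preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the real iommufd objects. */
struct toy_ioas {
	bool preserved;
};

struct toy_hwpt {
	struct toy_hwpt *parent;	/* parent HWPT for a nested HWPT, else NULL */
	struct toy_ioas *ioas;		/* backing IOAS, or NULL for a HWPT outside any IOAS */
	bool preserved;
};

/*
 * Mark a HWPT as preserved only if everything it depends on has
 * already been preserved. Returns 0 on success, -1 otherwise.
 */
int toy_hwpt_preserve(struct toy_hwpt *hwpt)
{
	if (hwpt->parent && !hwpt->parent->preserved)
		return -1;
	if (hwpt->ioas && !hwpt->ioas->preserved)
		return -1;

	hwpt->preserved = true;
	return 0;
}
```

With this shape userspace has to preserve the parent before the nested
HWPT, so the ordering is explicit rather than inferred by the kernel.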
>
> > > Finally we expect to discard the preserved HWPTs and replace them with
> > > rebuilt ones at least as a first step. Userspace needs to sequence all
> > > of this..
> >
> > But if we discard the old HWPTs and replace them with the new ones, we
> > shouldn't need labeling of the old HWPTs? We would definitely need to
> > sequence the replacement and discard of the old ones, but that can
> > also be inferred through the dependencies between the new HWPTs?
>
> It depends how this ends up being designed and who is responsible to
> free the restored iommu_domain.
Agreed. I think it depends on how much is restored from the previous
kernel. Discussed further below inline.
>
> The iommu core code should be restoring the iommu_domain as soon as
> the attached device is plugged in and attaching the preserved domain
> instead of something else during the device probe sequence
>
> This logic should not be in drivers.
>
> From there you either put the hwpt back into iommufd and have it free
> the iommu_domain when it destroys the hwpt
>
> Or you have the iommu core code free the iommu_domain at some point
> after iommufd has replaced the attachment with a new iommu_domain?
But we cannot do the replacement during domain attachment because
userspace might not have fully prepared the new domain with all the
required DMA mappings. Could the replacement happen during LUO finish
instead?
This is actually very close to what I had in mind for the "Hotswap"
model. My thought was:

1. During boot, the IOMMU core sets up a default domain but doesn't
   program the context entries for the preserved device. The hardware
   keeps on using the old preserved tables.
2. Userspace restores the iommufd, creates a new HWPT/domain and
   populates mappings.
3. On FINISH, the IOMMU core updates the context entries of preserved
   devices to point to the new domain.

I also have a sequence diagram for this in the cover letter.
I understand the desire to have the preserved iommu domain be restored
during boot so the device has a default domain and there is an owner
of the attached restored domain, but that would prevent iommufd
from building a clean new domain.
Maybe we can refine the "Hotswap" model I had in mind. Basically, on
boot the core restores the preserved iommu domain, but it lets
iommufd attach a new domain to preserved devices without replacing
the underlying context entries. The core then replaces the context
entries when iommufd indicates that the domain is fully prepared
(during LUO finish).
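Roughly, the refined flow could be modelled like the toy below. This is
a userspace sketch under my own assumptions, not a proposal for the
actual core API: toy_domain, toy_dev, the "prepared" flag and the three
helpers are all hypothetical names. The point is only that the context
entry (hw_ctx) stays on the restored domain until finish commits the
staged one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT real IOMMU core structures. */
struct toy_domain {
	bool prepared;			/* userspace has populated all DMA mappings */
};

struct toy_dev {
	struct toy_domain *hw_ctx;	/* domain the context entry points at */
	struct toy_domain *staged;	/* domain attached by iommufd, not yet committed */
};

/* Boot: the core restores the preserved domain and leaves the context entry on it. */
void toy_boot_restore(struct toy_dev *dev, struct toy_domain *restored)
{
	dev->hw_ctx = restored;
	dev->staged = NULL;
}

/* iommufd attaches a new domain; the context entry is deliberately untouched. */
void toy_attach_staged(struct toy_dev *dev, struct toy_domain *fresh)
{
	dev->staged = fresh;
}

/*
 * LUO finish: switch the context entry to the staged domain, but only
 * once it is fully prepared. Returns 0 on success, -1 otherwise.
 */
int toy_finish(struct toy_dev *dev)
{
	if (!dev->staged || !dev->staged->prepared)
		return -1;

	dev->hw_ctx = dev->staged;
	dev->staged = NULL;
	return 0;
}
```

Who frees the restored domain after toy_finish() succeeds is left out
on purpose, since that is the open question above.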
>
> I'm not sure which is a better option..
>
> Also there is an interesting behavior to note that if the iommu driver
> restores a domain then it will also prevent a non-vfio driver from
> binding to that device.
Agreed. I think in the "Hotswap" approach I discussed above, if we
don't restore the domain, the core can just commit the context entries
of the new default domain if a non-vfio driver is bound to the device.
>
> Jason