Message-ID: <CAAywjhRKvZBShj7KAXew2v_uGjn3HhvO=sFrZ=bVfMJ8ye-Vyw@mail.gmail.com>
Date: Wed, 1 Oct 2025 18:00:58 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, David Woodhouse <dwmw2@...radead.org>, 
	Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>, 
	Will Deacon <will@...nel.org>, iommu@...ts.linux.dev, YiFei Zhu <zhuyifei@...gle.com>, 
	Robin Murphy <robin.murphy@....com>, Pratyush Yadav <pratyush@...nel.org>, 
	Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org, 
	Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>, 
	Parav Pandit <parav@...dia.com>, Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>, 
	Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com, Chris Li <chrisl@...nel.org>, 
	praan@...gle.com
Subject: Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update

On Wed, Oct 1, 2025 at 4:47 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 04:15:43PM -0700, Samiullah Khawaja wrote:
>
> > > The iommu core code should be restoring the iommu_domain as soon as
> > > the attached device is plugged in and attaching the preserved domain
> > > instead of something else during the device probe sequence.
> > >
> > > This logic should not be in drivers.
> > >
> > > From there you either put the hwpt back into iommufd and have it free
> > > the iommu_domain when it destroys the hwpt.
> > >
> > > Or you have the iommu core code free the iommu_domain at some point
> > > after iommufd has replaced the attachment with a new iommu_domain?
> >
> > But we cannot do the replacement during domain attachment because
> > userspace might not have fully prepared the new domain with all the
> > required DMA mappings. Replace during LUO finish?
>
> The idea is the kernel will restore the iommu_domain during early boot
> in the iommu_core and then attach it. This should "rewrite" the IOMMU
> HW context for that device with identical content. Drivers must be
> enhanced to support this hitless rewrite (AMD and ARM are already
> done).
>
> At this point the kernel is operating normally with a normal domain
> and a normal driver, no special luo stuff.
>
> Later iommufd will come along and establish a HWPT that has an
> identical translation. Then we replace the luo domain with the new
> HWPT and free the luo domain.
>
> > 1. During boot, the IOMMU core sets up a default domain but doesn't
> > program the context entries for the preserved device. The hardware
> > keeps on using the old preserved tables.
>
> When the iommu driver first starts up it can take over the context
> memory from the predecessor kernel. But it has to go through it and
> clear out most of the context entries.
>
> Only context entries belonging to devices marked for preservation
> should be kept unchanged.

Agreed. We have to sanitize these and remove unused entries. I think
the same goes for any PASID tables.
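The takeover policy above (keep context entries of devices marked for preservation, clear everything else) can be sketched roughly as follows. This is a hypothetical model, not kernel code: it assumes a context table can be treated as a map from a device ID (e.g. a PCI BDF) to an entry, and `ContextEntry` and `sanitize_context_table` are invented names.

```python
# Hypothetical model, not kernel code: a context table is a dict keyed by a
# device ID (e.g. a PCI BDF), and each entry is an immutable record.
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ContextEntry:
    present: bool
    pgtable_root: int = 0               # address of the translation table
    domain_id: int = 0
    pasid_table: Optional[int] = None   # PASID table pointer, if any


CLEARED = ContextEntry(present=False)


def sanitize_context_table(table: Dict[int, ContextEntry],
                           preserved: Set[int]) -> Dict[int, ContextEntry]:
    """Take over the predecessor kernel's context table.

    Entries of devices marked for preservation are kept unchanged; every
    other entry is cleared, which also drops its PASID table pointer.
    """
    return {dev: (entry if dev in preserved else CLEARED)
            for dev, entry in table.items()}


# Two devices were live at kexec time; only 0x0100 was marked for preservation.
old_table = {
    0x0100: ContextEntry(True, pgtable_root=0x1000, domain_id=1, pasid_table=0x9000),
    0x0200: ContextEntry(True, pgtable_root=0x2000, domain_id=2),
}
new_table = sanitize_context_table(old_table, preserved={0x0100})
```

Freeing the memory behind cleared PASID tables is not modeled here.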
>
> Later we probe the struct device to the iommu and do as I said above
> to restore consistency.
>
> > 2. Userspace restores the iommufd, creates a new HWPT/domain and
> > populates mappings.
>
> Yes
>
> > 3. On FINISH, the IOMMU core updates the context entries of preserved
> > devices to point to the new domain.
>
> No, finish should never do anything on the restore path, IMHO. User
> should directly attach the newly created HWPT when it is ready.

Makes sense. But if the user never replaces the restored iommu_domain
with a new HWPT, we will have to discard the old (restored) domain on
finish since it doesn't have any associated HWPT. I see you already
hinted at this below. This needs to be handled carefully considering
the vfio cdev FD state also. Discussed further below.
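As a rough illustration of the restore flow Jason describes (restore and attach at early boot, then userspace directly attaches a new HWPT with an identical translation, and FINISH plays no part), here is a minimal state model. All names are hypothetical and do not correspond to kernel or iommufd APIs.

```python
# Hypothetical model of the restore flow; the class and function names are
# illustrative only and do not correspond to kernel or iommufd APIs.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Domain:
    name: str
    mappings: Dict[int, int] = field(default_factory=dict)  # IOVA -> PA
    freed: bool = False


@dataclass
class Device:
    attached: Optional[Domain] = None


def replace_domain(dev: Device, new: Domain) -> Optional[Domain]:
    """Attach `new` to the device and return the previously attached domain."""
    old, dev.attached = dev.attached, new
    return old


# 1. Early boot: the IOMMU core restores the luo domain and attaches it,
#    which rewrites the HW context for the device with identical content.
luo = Domain("luo", {0x1000: 0xA000, 0x2000: 0xB000})
dev = Device()
replace_domain(dev, luo)

# 2. Later: userspace builds a HWPT through iommufd with the same translation.
hwpt = Domain("hwpt", dict(luo.mappings))

# 3. Userspace attaches the HWPT directly when it is ready (FINISH is not
#    involved); the replaced luo domain is then freed.
previous = replace_domain(dev, hwpt)
if previous is luo:
    previous.freed = True
```

The point of the ordering is that the device is never left without a valid translation: the replacement happens only once the HWPT is fully populated.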
>
> > I understand the desire to have the preserved iommu domain be restored
> > during boot so the device has a default domain and there is an owner
> > of the attached restored domain, but that would prevent iommufd
> > from cooking a clean new domain.
>
> The "default domain" is the "DMA API domain" and it has to be created
> and setup always. The change here is instead of attaching the default
> domain we attach the luo restored domain at early boot.

Oh... I meant group->domain, not group->default_domain. I should
have written "active domain" instead of "default domain".
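The distinction between the two group pointers can be sketched as a simplified model: the DMA API (default) domain is always created, but for a preserved device the active domain is the luo-restored one, and the device is owned. This is a hypothetical illustration; the real iommu_group tracks ownership differently, and it is reduced to a boolean here.

```python
# Hypothetical, simplified model of the two iommu_group pointers discussed
# above; ownership is reduced to a boolean for illustration.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    name: str


@dataclass
class Group:
    default_domain: Domain   # DMA API domain: always created and set up
    domain: Domain           # active domain: what the device is attached to
    owned: bool = False      # claimed for exclusive use (e.g. by vfio)


def probe_group(restored: Optional[Domain] = None) -> Group:
    """Probe-time setup: the DMA API domain is always built, but a
    preserved device gets the luo-restored domain as its active domain."""
    dma_api = Domain("dma-api")
    if restored is None:
        return Group(default_domain=dma_api, domain=dma_api)
    return Group(default_domain=dma_api, domain=restored, owned=True)


normal = probe_group()
preserved = probe_group(Domain("luo-restored"))
```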
>
> This sets the device into an "owned" mode but vfio can still attach
> and nothing prevents iommufd from building a new hwpt and attaching
> it.

This is the part I was concerned about, since I was looking into the
auto_domain. Users that attach directly to an IOAS and rely on the
auto_domain would not be able to restore the mappings before attaching
to the device. Users that allocate a HWPT directly, however, should be
able to prepare a new domain and hotswap it in when ready. I think a
new interface could be built to support IOAS-only use cases as well;
we can revisit this later.
>
> > Maybe we can refine the "Hotswap" model I had in mind. Basically, on
> > boot the core restores the preserved iommu domain, but the core lets
> > iommufd attach a new domain to preserved devices without replacing
> > the underlying context entries?
>
> Replace the context entries. If everything is working properly the
> preserved domain should compute an identical context entry, so no
> reason to not just "replace" it which should be a NOP.
>
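The claim that re-attaching the preserved domain should be a NOP can be modeled as "compute the context entry from the domain, and write the hardware only if it differs". This is a hypothetical sketch: `ContextTable`, `set_entry`, and the (page-table root, domain ID) entry layout are simplifications. Real drivers doing a hitless rewrite must also order partial writes so the IOMMU never observes a torn entry, which this model ignores.

```python
# Hypothetical model of a context-entry update that skips the hardware write
# when the computed entry is already installed. Real drivers must also order
# partial writes so the IOMMU never observes a torn entry; that is ignored here.
from typing import Dict, Tuple

Entry = Tuple[int, int]  # (page-table root, domain ID), simplified


class ContextTable:
    def __init__(self, live: Dict[int, Entry]):
        self.entries = dict(live)  # what the hardware currently uses
        self.hw_writes = 0

    def set_entry(self, dev: int, entry: Entry) -> bool:
        """Install `entry` for `dev`; return True if hardware was written."""
        if self.entries.get(dev) == entry:
            return False  # identical content: the "replace" is a no-op
        self.entries[dev] = entry
        self.hw_writes += 1
        return True


# The preserved device still runs on the predecessor kernel's entry.
table = ContextTable({0x0100: (0x1000, 7)})

# Attaching the restored domain computes the same entry: nothing is written.
wrote_restore = table.set_entry(0x0100, (0x1000, 7))

# Attaching a new HWPT (new page tables, same translation) does write.
wrote_hwpt = table.set_entry(0x0100, (0x5000, 8))
```

Note that the later HWPT replacement does change the entry (new page tables), so it relies on the translation being identical rather than on the entry being identical.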
> > > Also there is an interesting behavior to note that if the iommu driver
> > > restores a domain then it will also prevent a non-vfio driver from
> > > binding to that device.
> >
> > Agreed. I think in the "Hotswap" approach I discussed above, if we
> > don't restore the domain, the core can just commit the context entries
> > of the new default domain if a non-vfio driver is bound to the device.
>
> As I said, the owned nature of the device will prevent attaching a
> non-vfio driver in the first place.
>
> So the only path forward for userspace is to attach vfio, and then
> iommufd should take over that luo restored iommu_domain and eventually
> free it.
>
> You might consider that finish should de-own the device if vfio didn't
> claim it. But that is a bit tricky since it needs an FLR before the
> domains can be switched around.

That's a good point, but it might be tricky since ownership of the
device lies with the vfio cdev FD. So if the vfio cdev FD is never
restored/reclaimed, the device can be FLR'd, and iommufd will follow
along and discard the domain.

The more interesting case might be where the cdev is restored and bound
to iommufd but the user never recreates and hotswaps in a new HWPT. In
that case we can discard the restored iommu_domain and replace it with
the blocking domain, as it would have been had the device not been
preserved.
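The FINISH-time cases discussed in this thread can be summarized as a small decision function. It is a hypothetical sketch: the enum and function names are invented, and it assumes a reclaimed cdev is always bound to iommufd (the case described above).

```python
# Hypothetical summary of what FINISH would do with a restored iommu_domain
# that is still attached; names are illustrative, not kernel APIs.
from enum import Enum, auto


class FinishAction(Enum):
    NOTHING = auto()            # user already attached its own HWPT
    FLR_AND_DISCARD = auto()    # cdev FD never reclaimed: reset device, free domain
    BLOCK_AND_DISCARD = auto()  # cdev bound, no new HWPT: attach blocking domain


def finish_action(cdev_reclaimed: bool, user_replaced_domain: bool) -> FinishAction:
    """Decide what FINISH does with a still-attached restored iommu_domain."""
    if user_replaced_domain:
        if not cdev_reclaimed:
            raise ValueError("a HWPT can only be attached through a reclaimed cdev")
        return FinishAction.NOTHING
    if not cdev_reclaimed:
        return FinishAction.FLR_AND_DISCARD
    # Assumption: a reclaimed cdev is bound to iommufd (the case discussed above).
    return FinishAction.BLOCK_AND_DISCARD
```

Keeping the "user replaced the domain" case as a no-op matches the point that FINISH should not act on the normal restore path.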
>
> Jason
