Message-ID: <CAAywjhQrAWPjb8YtO=+G+pfJpW7p-rwrj03zB8ZqdhB0wtsO0w@mail.gmail.com>
Date: Thu, 2 Oct 2025 10:03:05 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, David Woodhouse <dwmw2@...radead.org>, 
	Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>, 
	Will Deacon <will@...nel.org>, iommu@...ts.linux.dev, YiFei Zhu <zhuyifei@...gle.com>, 
	Robin Murphy <robin.murphy@....com>, Pratyush Yadav <pratyush@...nel.org>, 
	Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org, 
	Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>, 
	Parav Pandit <parav@...dia.com>, Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>, 
	Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com, Chris Li <chrisl@...nel.org>, 
	praan@...gle.com
Subject: Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update

On Thu, Oct 2, 2025 at 6:41 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Wed, Oct 01, 2025 at 06:00:58PM -0700, Samiullah Khawaja wrote:
> > > No, finish should never do anything on the restore path, IMHO. User
> > > should directly attach the newly created HWPT when it is ready.
> >
> > Makes sense. But if the user never replaces the restored iommu_domain
> > with a new HWPT, we will have to discard the old (restored) domain on
> > finish since it doesn't have any associated HWPT. I see you already
> > hinted at this below. This needs to be handled carefully considering
> > the vfio cdev FD state also. Discussed further below.
>
> I think the simplest thing is the domain exists forever until
> userspace attaches an iommufd, takes ownership of it and frees it.
> Nothing to do with finish.

Hmm, I think this is tricky. There needs to be a way to clean up and
discard the old state if userspace doesn't need it, and I think the
LUO (session) FINISH event is that trigger. If the LUO session manager
(VMM or LUOD) decides that finish needs to happen while the iommufd
(or the underlying HWPTs) has not been restored, it means LUOD has
decided that the VM is not going to come up, and the preserved state
and resources (domain, device, memory) need to be freed/released. If
we don't do this in FINISH, the system will be stuck and the VM
scheduler cannot schedule another VM using the same device and
resources.
>
> While the domain is attached iommu_device_use_default_domain() will
> fail.

Yes this makes sense.
>
> > This is the part that I was concerned about since I was looking into
> > the auto_domain. Users that attach to ioas directly and use
> > auto_domain would not be able to restore the mappings before attaching
> > to the device.
>
> IMHO luo users need to be sophisticated enough to avoid auto_domain.

Agreed.
>
> > That's a good point. But it might be tricky since the ownership of the
> > device is with the vfio cdev FD. So if vfio cdev FD is never
> > restored/reclaimed the device can be FLR'd. iommufd will follow along
> > and discard the domain.
>
> Honestly, I keep wanting things to be kept as simple as possible with
> as few exception flows as necessary.
>
> If we make it so that iommu_device_claim_dma_owner() is aware of luo
> and the only way vfio can get ownership is if it is also restoring the
> luo session then that sounds perfect.
>
> Attaching a non-luo VFIO would be blocked by the kernel so we never
> get these inconsistencies.
>
> > The more interesting case might be where cdev is restored and bound to
> > iommufd but the user never recreates and hotswaps a new HWPT. In this
> > case we can discard the restored iommu_domain and replace it with the
> > blocking domain as it should have been if the device was not
> > preserved.
>
> Maybe the HWPT has to be auto-created inside the iommufd as soon as it
> is attached. The "restore" ioctl would just return back the ID of this
> already created HWPT.

Once we return the ID, do we make this HWPT mutable? Or is this
re-created HWPT just a handle to keep the domain ownership?

I think making it mutable would really complicate the design, and we
would get into sanity checking of attach/detach and map/unmap calls
on this HWPT. Keeping the restored domain attached to the preserved
device until it is hotswapped with a new HWPT is cleaner and simpler,
as you want it to be.
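
To make the "handle only" option concrete, here is a toy userspace
model in Python. It is not kernel code and not the iommufd API: the
class names (RestoredHwpt, Hwpt, Device) and their methods are
hypothetical, chosen only to illustrate that the restored HWPT rejects
map/unmap and can only be swapped out for a freshly built HWPT.

```python
class RestoredHwpt:
    """Hypothetical immutable handle that only keeps the restored domain owned."""

    def __init__(self, hwpt_id):
        self.hwpt_id = hwpt_id

    def map(self, iova, length):
        # Assumption: a restored HWPT refuses any change to its mappings.
        raise PermissionError("restored HWPT is immutable")

    def unmap(self, iova):
        raise PermissionError("restored HWPT is immutable")


class Hwpt:
    """Hypothetical ordinary HWPT that userspace builds after restore."""

    def __init__(self, hwpt_id):
        self.hwpt_id = hwpt_id
        self.mappings = {}

    def map(self, iova, length):
        self.mappings[iova] = length

    def unmap(self, iova):
        self.mappings.pop(iova, None)


class Device:
    """Preserved device that stays attached to the restored domain until hotswap."""

    def __init__(self, restored):
        self.attached = restored

    def replace(self, new_hwpt):
        # Hotswap: the new HWPT takes over and the restored handle is
        # returned so the caller can drop it.
        old = self.attached
        self.attached = new_hwpt
        return old
```

With this shape the only operation that needs checking on the restored
handle is the replace itself, which is the simplification argued for
above.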

I think if we consider FINISH the point where everything is supposed
to be either reclaimed or discarded, then this problem is solved. This
should also allow LUOD to clean up the resources and create new VMs
using the same device and resources. I see you suggested in the other
thread with Pasha that we can make FINISH fail if things are not
reclaimed. I think that also means the system would be stuck in this
state indefinitely. Maybe this is correct, since the domain is owned
by VFIO and needs to be released by it.
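
The two FINISH behaviours discussed here can be compared with a small
toy model in Python. Again this is not kernel code: Session,
FinishPolicy and the device names are hypothetical, and "pending"
simply stands for preserved state that userspace has not reclaimed.

```python
from enum import Enum


class FinishPolicy(Enum):
    DISCARD = "discard"  # FINISH discards whatever was not reclaimed
    FAIL = "fail"        # FINISH fails while anything is unreclaimed


class Session:
    """Hypothetical live update session tracking preserved devices."""

    def __init__(self, preserved):
        self.pending = set(preserved)  # preserved, not yet reclaimed
        self.free_devices = set()      # released for use by other VMs
        self.finished = False

    def reclaim(self, dev):
        # Userspace restored this device and took ownership of its state.
        self.pending.discard(dev)

    def finish(self, policy):
        if self.pending and policy is FinishPolicy.FAIL:
            # Nothing is released; the session stays open until the
            # remaining state is reclaimed.
            return False
        # Discard the leftover state so the devices can be reused.
        self.free_devices |= self.pending
        self.pending.clear()
        self.finished = True
        return True
```

Under FAIL, a device that is never reclaimed keeps the session open,
which is the "stuck" case mentioned above. Under DISCARD the same
device ends up in free_devices and can be handed to another VM.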

>
> Again, this seems to avoid special cases as once we exit the special
> luo mode of iommu_device_claim_dma_owner() iommufd is always
> responsible for the iommu_domain.
>
> Jason
