Message-ID: <20251002173715.GH3195829@ziepe.ca>
Date: Thu, 2 Oct 2025 14:37:15 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Samiullah Khawaja <skhawaja@...gle.com>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>,
David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, iommu@...ts.linux.dev,
YiFei Zhu <zhuyifei@...gle.com>,
Robin Murphy <robin.murphy@....com>,
Pratyush Yadav <pratyush@...nel.org>,
Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org,
Saeed Mahameed <saeedm@...dia.com>,
Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>,
Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>,
Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com,
Chris Li <chrisl@...nel.org>, praan@...gle.com
Subject: Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
On Thu, Oct 02, 2025 at 10:03:05AM -0700, Samiullah Khawaja wrote:
> > I think the simplest thing is the domain exists forever until
> > userspace attaches an iommufd, takes ownership of it and frees it.
> > Nothing to do with finish.
>
> Hmm.. I think this is tricky. There needs to be a way to clean up and
> discard the old state if the userspace doesn't need it.
Why?
Isn't "userspace doesn't need it" some extremely weird, unused corner
case?
This should not be automatic or divorced from userspace. If the
operator wants to switch something out of LUO, then userspace should
coordinate it: receive the iommufd, close it, and install a normal
kernel driver.
Why add special code to the kernel to sequence this automatically?
> session manager (VMM or LUOD) decides that the finish needs to happen
> and the iommufd (or the underlying HWPTs) are not restored, it means
> that LUOD has decided that the VM is not going to come up and the
> preserved state and resources (domain, device, memory) need to be
> freed/released.
I've been assuming that if LUO fails that catastrophically, the whole
node would reboot to recover.
Is there really a case where a kexec happens and a single VM out of
many doesn't survive? That seems weird.
So, to repeat the above: if this is something people want, then
userspace should complete the LUO restore of the failed VM and then
turn around and free all the resources. Why should the kernel do the
same operations automatically?
Maybe userspace needs a contingency flow with a dedicated reaper
program for each LUO session. If the VMM crashes during restore, we
pass the LUO session FD to the reaper, which cleans up the objects in
the session and closes it.
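A minimal sketch of that reaper flow, modeled as plain userspace C rather than against the real LUO uAPI. Every name here (`struct session`, `struct preserved_obj`, `reaper_cleanup()`, `retrieve_and_release()`) is hypothetical and only illustrates the ordering: finish retrieving whatever the crashed VMM did not, release it, then close the session.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical kinds of objects a live-update session might preserve. */
enum preserved_kind { OBJ_IOMMUFD, OBJ_MEMFD, OBJ_DEVICE };

struct preserved_obj {
	enum preserved_kind kind;
	bool handled; /* already retrieved and released before the crash */
};

#define MAX_SESSION_OBJS 8

/* Hypothetical stand-in for a LUO session handed over by FD. */
struct session {
	struct preserved_obj objs[MAX_SESSION_OBJS];
	int nr_objs;
	bool closed;
};

/* Stand-in for "retrieve the object from the session, then close/free it". */
static void retrieve_and_release(struct preserved_obj *obj)
{
	obj->handled = true;
}

/*
 * Reaper: given the session after the VMM crashed mid-restore, release
 * every object the VMM did not get to, then close the session.
 * Returns how many objects the reaper itself had to clean up.
 */
static int reaper_cleanup(struct session *s)
{
	int reaped = 0;

	assert(!s->closed);
	for (int i = 0; i < s->nr_objs; i++) {
		if (s->objs[i].handled)
			continue; /* the VMM got this far before crashing */
		retrieve_and_release(&s->objs[i]);
		reaped++;
	}
	s->closed = true;
	return reaped;
}
```

The point of the design is that the kernel needs no special "auto-discard" path: cleanup is the ordinary userspace retrieve-and-close sequence, just run by a different program.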
> > Maybe the HWPT has to be auto-created inside the iommufd as soon as it
> > is attached. The "restore" ioctl would just return back the ID of this
> > already created HWPT.
>
> Once we return the ID, do we make this HWPT mutable? Or is this
> re-created HWPT just a handle to keep the domain ownership?
That's a bigger question.
To start with, I was imagining that the restored iommu_domain is
immutable, i.e. it does not have map and unmap operations. It never
becomes mutable.
As I outlined, this special LUO immutable domain is then attached
during early boot, which should be a NOP, and gets turned into a HWPT
during iommufd restoration. The only thing userspace should be able to
do with that HWPT handle is destroy it after replacing it.
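A toy userspace model of the lifecycle described above, under the assumption that "immutable" means the domain's ops table simply has no map/unmap callbacks. The structs and functions (`struct domain_ops`, `struct hwpt`, `restore_ioctl()`, `hwpt_destroy()`, etc.) are hypothetical simplifications, not the real `iommu_domain_ops` or iommufd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down, hypothetical analogue of a domain ops table. */
struct domain_ops {
	int (*map)(unsigned long iova, unsigned long paddr, size_t size);
	int (*unmap)(unsigned long iova, size_t size);
};

/* Restored LUO domain: no map/unmap ops, so it can never be modified. */
static const struct domain_ops luo_restored_ops = { NULL, NULL };

struct domain {
	const struct domain_ops *ops;
};

/* Hypothetical HWPT wrapping the restored domain. */
struct hwpt {
	unsigned int id;
	struct domain *domain;
	int attached_devices;
};

/* Early boot: attaching the preserved domain is a NOP; it is already live. */
static void early_boot_attach(struct hwpt *hwpt)
{
	hwpt->attached_devices++;
}

/* "Restore" just returns the ID of the HWPT that already exists. */
static unsigned int restore_ioctl(const struct hwpt *hwpt)
{
	return hwpt->id;
}

/* Any attempt to change the mappings is rejected. */
static int hwpt_map(struct hwpt *hwpt, unsigned long iova,
		    unsigned long paddr, size_t size)
{
	if (!hwpt->domain->ops->map)
		return -EOPNOTSUPP;
	return hwpt->domain->ops->map(iova, paddr, size);
}

/* Userspace moves the device to a new, normal HWPT. */
static void replace_attachment(struct hwpt *old)
{
	old->attached_devices--;
}

/* Destroy only succeeds once nothing is attached to the restored HWPT. */
static int hwpt_destroy(struct hwpt *hwpt)
{
	if (hwpt->attached_devices)
		return -EBUSY;
	hwpt->domain = NULL;
	return 0;
}
```

In this model the only legal sequence for userspace is restore (get the ID), attach the device somewhere else, then destroy; map fails with `-EOPNOTSUPP` and an early destroy fails with `-EBUSY`.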
Jason