[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAywjhQsyU4j4wfOqVuLxaVu0FsxuqCCjhJHuk4vcYAo3Cqh_g@mail.gmail.com>
Date: Thu, 2 Oct 2025 11:08:14 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, iommu@...ts.linux.dev, YiFei Zhu <zhuyifei@...gle.com>,
Robin Murphy <robin.murphy@....com>, Pratyush Yadav <pratyush@...nel.org>,
Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org,
Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>,
Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com, Chris Li <chrisl@...nel.org>,
praan@...gle.com
Subject: Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
On Thu, Oct 2, 2025 at 10:37 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Thu, Oct 02, 2025 at 10:03:05AM -0700, Samiullah Khawaja wrote:
> > > I think the simplest thing is the domain exists forever until
> > > userspace attaches an iommufd, takes ownership of it and frees it.
> > > Nothing to do with finish.
> >
> > Hmm.. I think this is tricky. There needs to be a way to clean up and
> > discard the old state if the userspace doesn't need it.
>
> Why?
>
> Isn't "userspace doesn't need it" some extermely weird unused corner
> case?
>
> This should not be automatic or divorced from userspace, if the
> operator would like to switch something out of LUO then they should
> have userspace that co-ordinates this. Receive the iommufd, close it,
> install a normal kernel driver.
>
> Why make special code in the kernel to sequence this automatically?
>
> > session manager (VMM or LUOD) decides that the finish needs to happen
> > and the iommufd (or the underlying HWPTs) are not restored, it means
> > that LUOD has decided that the VM is not going to come up and the
> > preserved state and resources (domain, device, memory) need to be
> > freed/released.
>
> I've been assuming if luo fails so catastrophically the whole node
> would reboot to recover.
>
> Is there really a case where you might say a kexec happens and a
> single VM out of many doesn't survive? Seems weird..
>
> So to repeat above, if this is something people want then the
> userspace should complete luo restoring the failed vm and then turn
> around and free up all the resources. Why should the kernel
> automatically do the same operations?
>
> Maybe userspace needs some contingency flow where there is a dedicated
> reaper program for a luo session. The VMM crashes during restore, OK,
> we pass the luo FD to a reaper and it cleans up the objects in the
> session and closes it.
These are all great points. I agree, it makes sense. It keeps the
FINISH lightweight and makes the domain ownership model very clean. I
will further discuss the memfd dependency scenario in the other
thread.
>
> > > Maybe the HWPT has to be auto-created inside the iommufd as soon as it
> > > is attached. The "restore" ioctl would just return back the ID of this
> > > already created HWPT.
> >
> > Once we return the ID, do we make this HWPT mutable? Or is this
> > re-created HWPT just a handle to keep the domain ownership?
>
> That's a bigger question..
>
> For starting I was imagining that the restored iommu_domain was
> immutable, eg it does not have map and unmap operations. It never
> becomes mutable.
>
> As I outlined this special luo immutable domain is then attached
> during early boot, which sould be a NOP, and gets turned into a HWPT
> during iommufd restoration. The only thing userspace should be able to
> do with that HWPT handle is destroy it after replacing it.
Okay, this is great. An immutable HWPT associated with the restored
iommu_domain confirms my intuition that this is just a handle to the
underlying domain. The user can destroy it when it is replaced, or
when iommufd is closed without HWPT replacement.
>
> Jason
Powered by blists - more mailing lists