Message-ID: <CA+CK2bAudSHq2t5NZPBKDC2wfzsF6SSxTF7aZ2kxueOTzWYcfg@mail.gmail.com>
Date: Thu, 2 Oct 2025 10:43:45 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Samiullah Khawaja <skhawaja@...gle.com>, David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, iommu@...ts.linux.dev, YiFei Zhu <zhuyifei@...gle.com>,
Robin Murphy <robin.murphy@....com>, Pratyush Yadav <pratyush@...nel.org>,
Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org,
Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>,
Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com, Chris Li <chrisl@...nel.org>,
praan@...gle.com
Subject: Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
On Thu, Oct 2, 2025 at 7:57 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Wed, Oct 01, 2025 at 03:28:56PM -0400, Pasha Tatashin wrote:
> > > > 3. On FINISH, the IOMMU core updates the context entries of preserved
> > > > devices to point to the new domain.
> > >
> > > No, finish should never do anything on the restore path, IMHO. User
> > > should directly attach the newly created HWPT when it is ready.
> >
> > But, finish is our indicator that a particular session (VM) is out of
> > blackout, and now we are free to do slow things, such as
> > re-allocating/recreating page tables. Why start it before a VM is out
> > of blackout?
>
> Things should be paired.. The suspend side is
>
> start luo - "brown out" - kernel does basically nothing as the luo is empty
> add all sorts of things to sessions
> finish - kernel does last minute things
>
> While the resume is the symmetric opposite:
>
> kexec boot - kernel restores the critical stuff it needs to boot to
> userspace
> userspace does all sorts of stuff and gets things out of the sessions
> finish - luo should be empty now as everything was taken out by
> userspace

I see, so you are proposing that finish() is basically a no-op for
IOMMU as long as everything was properly reclaimed by userspace.

> I think when things come out of luo they should be fully operational
> immediately.

I agree. Once we are in "normal" mode, we should be done with all
live-update specifics. In this state, the kernel must be fully
operational, without limitations or pending background work that could
reduce VM performance. Also, if any session was not reclaimed before
finish(), it and all resources associated with it should be terminated
during finish().

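A minimal userspace sketch of that rule, for illustration only. All
names here (demo_session, demo_reclaim, demo_finish) are hypothetical
and are not LUO or iommufd interfaces; the model only shows that
reclaimed sessions are left alone and leftovers are torn down:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a preserved session; not a real LUO structure. */
struct demo_session {
	const char *name;
	bool reclaimed;		/* userspace retrieved it after kexec */
	bool terminated;	/* resources were released by finish */
};

/* Userspace takes a session out before finish is called. */
static void demo_reclaim(struct demo_session *s)
{
	assert(!s->terminated);
	s->reclaimed = true;
}

/*
 * Model of finish on the resume side: reclaimed sessions need no work,
 * anything still left behind is terminated. Returns the number of
 * sessions that had to be terminated.
 */
static size_t demo_finish(struct demo_session *sessions, size_t n)
{
	size_t terminated = 0;

	for (size_t i = 0; i < n; i++) {
		if (sessions[i].reclaimed)
			continue;
		sessions[i].terminated = true;
		terminated++;
		fprintf(stderr, "finish: terminating unreclaimed session %s\n",
			sessions[i].name);
	}
	return terminated;
}
```

In this model, a return value of 0 corresponds to the "luo should be
empty" case, where finish has nothing left to do.
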
> Finish on resume shouldn't indicate anything specific beyond the luo
> should be empty and everything should have been restored. It isn't
> like finish on pre-kexec.
>
> Userspace decides how it sequences things and what steps it takes
> before ending blackout and resuming the VM.

This is a fair statement: userspace knows when vCPUs are resumed and
can decide when to do the HWPT swap. Following that logic, what if we
provide a specific ioctl() to perform the swap? Userspace could then
call that ioctl() prior to finish(), and during the finish() callback,
we would only need to do a quick sanity check that everything is in
order (i.e., resources were retrieved and the HWPTs were swapped).

What do we do if the user reclaimed iommufd but did not swap the HWPT
or did not perform some other ioctl() before finish()? Simply print a
kernel warning and let it be, or force the swap during finish() before
going into normal mode?

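To make the two options concrete, here is a rough userspace model, not
kernel code. Every name in it (demo_iommufd, demo_ioctl_swap_hwpt,
demo_finish_check, the policy enum) is made up for the example, and the
real ioctl and callback would look different:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policies for a reclaimed iommufd whose HWPT was not swapped. */
enum demo_policy {
	DEMO_WARN_ONLY,		/* print a warning and leave the state alone */
	DEMO_FORCE_SWAP,	/* warn, then do the swap inside finish */
};

/* Hypothetical per-iommufd restore state; not a real iommufd structure. */
struct demo_iommufd {
	bool reclaimed;		/* userspace retrieved the iommufd */
	bool hwpt_swapped;	/* devices now point at the new HWPT */
};

/* Model of the proposed swap ioctl; fails if nothing was reclaimed yet. */
static int demo_ioctl_swap_hwpt(struct demo_iommufd *fd)
{
	if (!fd->reclaimed)
		return -1;
	fd->hwpt_swapped = true;
	return 0;
}

/*
 * Model of the finish-time sanity check. Returns true when userspace
 * already did everything, false when finish found something missing.
 */
static bool demo_finish_check(struct demo_iommufd *fd, enum demo_policy policy)
{
	if (fd->reclaimed && fd->hwpt_swapped)
		return true;

	fprintf(stderr, "finish: iommufd not fully restored by userspace\n");
	if (fd->reclaimed && policy == DEMO_FORCE_SWAP)
		fd->hwpt_swapped = true;	/* slow path taken during finish */
	return false;
}
```

With DEMO_WARN_ONLY the device keeps using the preserved domain after
finish; with DEMO_FORCE_SWAP finish is no longer a pure sanity check,
which is the trade-off the question above is about.
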
Pasha