Message-ID: <CAAywjhTbBx+rYGpPGtTw_--9XhoYZBX8ase5ddM6rxmC5J-2JQ@mail.gmail.com>
Date: Wed, 29 Oct 2025 15:00:20 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Pratyush Yadav <pratyush@...nel.org>, jasonmiu@...gle.com, graf@...zon.com, 
	changyuanl@...gle.com, rppt@...nel.org, dmatlack@...gle.com, 
	rientjes@...gle.com, corbet@....net, rdunlap@...radead.org, 
	ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com, ojeda@...nel.org, 
	aliceryhl@...gle.com, masahiroy@...nel.org, akpm@...ux-foundation.org, 
	tj@...nel.org, yoann.congal@...le.fr, mmaurer@...gle.com, 
	roman.gushchin@...ux.dev, chenridong@...wei.com, axboe@...nel.dk, 
	mark.rutland@....com, jannh@...gle.com, vincent.guittot@...aro.org, 
	hannes@...xchg.org, dan.j.williams@...el.com, david@...hat.com, 
	joel.granados@...nel.org, rostedt@...dmis.org, anna.schumaker@...cle.com, 
	song@...nel.org, zhangguopeng@...inos.cn, linux@...ssschuh.net, 
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org, 
	gregkh@...uxfoundation.org, tglx@...utronix.de, mingo@...hat.com, 
	bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	rafael@...nel.org, dakr@...nel.org, bartosz.golaszewski@...aro.org, 
	cw00.choi@...sung.com, myungjoo.ham@...sung.com, yesanishhere@...il.com, 
	Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com, 
	aleksander.lobakin@...el.com, ira.weiny@...el.com, 
	andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de, 
	bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com, 
	stuart.w.hayes@...il.com, lennart@...ttering.net, brauner@...nel.org, 
	linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org, saeedm@...dia.com, 
	ajayachandra@...dia.com, jgg@...dia.com, parav@...dia.com, leonro@...dia.com, 
	witu@...dia.com, hughd@...gle.com, chrisl@...nel.org, 
	steven.sistare@...cle.com
Subject: Re: [PATCH v4 14/30] liveupdate: luo_session: Add ioctls for file
 preservation and state management
On Wed, Oct 29, 2025 at 1:13 PM Pasha Tatashin
<pasha.tatashin@...een.com> wrote:
>
> On Wed, Oct 29, 2025 at 3:07 PM Pratyush Yadav <pratyush@...nel.org> wrote:
> >
> > Hi Pasha,
> >
> > On Mon, Sep 29 2025, Pasha Tatashin wrote:
> >
> > > Introduce the userspace interface and internal logic required to
> > > manage the lifecycle of file descriptors within a session. Previously, a
> > > session was merely a container; this change makes it a functional
> > > management unit.
> > >
> > > The following capabilities are added:
> > >
> > > A new set of ioctl commands is added, which operate on the file
> > > descriptor returned by CREATE_SESSION. This allows userspace to:
> > > - LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session
> > >   to be preserved across the live update.
> > > - LIVEUPDATE_SESSION_UNPRESERVE_FD: Remove a previously added file
> > >   descriptor from the session.
> > > - LIVEUPDATE_SESSION_RESTORE_FD: Retrieve a preserved file in the
> > >   new kernel using its unique token.
> > >
> > > A state machine for each individual session, distinct from the global
> > > LUO state. This enables more granular control, allowing userspace to
> > > prepare or freeze specific sessions independently. This is managed via:
> > > - LIVEUPDATE_SESSION_SET_EVENT: An ioctl to send PREPARE, FREEZE,
> > >   CANCEL, or FINISH events to a single session.
> > > - LIVEUPDATE_SESSION_GET_STATE: An ioctl to query the current state
> > >   of a single session.
> > >
> > > The global subsystem callbacks (luo_session_prepare, luo_session_freeze)
> > > are updated to iterate through all existing sessions. They now trigger
> > > the appropriate per-session state transitions for any sessions that
> > > haven't already been transitioned individually by userspace.
> > >
> > > The session's .release handler is enhanced to be state-aware. When a
> > > session's file descriptor is closed, it now correctly cancels or
> > > finishes the session based on its current state before freeing all
> > > associated file resources, preventing resource leaks.
> > >
> > > Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> > [...]
> > > +/**
> > > + * struct liveupdate_session_get_state - ioctl(LIVEUPDATE_SESSION_GET_STATE)
> > > + * @size:     Input; sizeof(struct liveupdate_session_get_state)
> > > + * @incoming: Input; If 1, query the state of a restored file from the incoming
> > > + *            (previous kernel's) set. If 0, query a file being prepared for
> > > + *            preservation in the current set.
> >
> > Spotted this when working on updating my test suite for LUO. This seems
> > to be a leftover from a previous version. I don't see it being used
> > anywhere in the code.
>
> Thank you, I will remove this.
>
> > Also, I think the model we should have is to only allow new sessions in
> > normal state. Currently luo_session_create() allows creating a new
> > session in updated state. This would end up mixing sessions from a
> > previous boot and sessions from current boot. I don't really see a
> > reason for that and I think the userspace should first call finish
> > before starting new serialization. Keeps things simpler.
>
> It does. However, yesterday Jason Gunthorpe suggested that we simplify
> the uapi, at least for the initial landing, by removing the state
> machine during boot and allowing new sessions to be created at any
> time. This would also mean separating the incoming and outgoing
> sessions and removing the ioctl() call used to bring the machine into
> a normal state; instead, only individual sessions could be brought
> into a 'normal' state.
>
> Simplified uAPI Proposal
> The simplest uAPI would look like this:
> IOCTLs on /dev/liveupdate (to create and retrieve session FDs):
> LIVEUPDATE_IOCTL_CREATE_SESSION
> LIVEUPDATE_IOCTL_RETRIEVE_SESSION
>
> IOCTLs on session FDs:
> LIVEUPDATE_CMD_SESSION_PRESERVE_FD
> LIVEUPDATE_CMD_SESSION_RETRIEVE_FD
> LIVEUPDATE_CMD_SESSION_FINISH
>
> Happy Path
> The happy path would look like this:
> - luod creates a session with a specific name and passes it to the vmm.
> - The vmm preserves FDs in a specific order: memfd, iommufd, vfiofd.
> (If the order is wrong, the preserve callbacks will fail.)
> - A reboot(KEXEC) is performed.
> - Each session receives a freeze() callback to notify it that
> mutations are no longer possible.
> - During boot, liveupdate_fh_global_state_get(&h, &obj) can be used to
> retrieve the global state.
> - Once the machine has booted, luod retrieves the incoming sessions
> and passes them to the vmms.
> - The vmm retrieves the FDs from the session and performs the
> necessary IOCTLs on them.
> - The vmm calls LIVEUPDATE_CMD_SESSION_FINISH on the session. Each FD
> receives a finish() callback in LIFO order.
> - If everything succeeds, the session becomes an empty "outgoing"
> session. It can then be closed and discarded or reused for the next
> live update by preserving new FDs into it.
> - Once the last FD for a file-handler is finished,
> h->ops->global_state_finish(h, h->global_state_obj) is called to
> finish the incoming global state.
>
> Unhappy Paths
> - If an outgoing session FD is closed, each FD in that session
> receives an unpreserve callback in LIFO order.
> - If the last FD for a global state is unpreserved,
> h->ops->global_state_unpreserve(h, h->global_state_obj) is called.
> - If freeze() fails, a cancel() is performed on each FD that received
> a freeze() callback, and reboot(KEXEC) returns a failure.
nit: Maybe we can rename cancel() to unfreeze(), so that it matches the
preserve/unpreserve naming?
> - If an incoming session FD is closed, the resources are considered
> "leaked." They are discarded only during the next live-update; this is
> intended to prevent implementing rare and untested clean-up code.
I am assuming the preserved folios will become unpreserved during
shutdown, so that in the next kernel those folios are free.
> - If a user tries to finish a session and it fails, it is considered
> the user's problem. This might happen because some IOCTLs still need
> to be run on the retrieved FDs to bring them to a state where finish
> is possible.
Sounds great.
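
A rough userspace model of the freeze() failure path discussed above may
help pin down the semantics. It is only a sketch: every name in it
(model_file, model_freeze, model_unfreeze, model_freeze_all) is
hypothetical and does not come from the LUO patches, and the reverse
(LIFO) rollback order is an assumption that mirrors the LIFO order
described for finish() and unpreserve; the proposal does not state an
order for cancel().

```c
/*
 * Hypothetical userspace model of freeze-with-rollback.
 * None of these names exist in the LUO series.
 */
#include <assert.h>
#include <stddef.h>

struct model_file {
	int fail_freeze;   /* set to make freeze() fail for this file */
	int frozen;        /* 1 while the file is frozen */
	int unfreeze_seq;  /* order in which unfreeze ran, 0 if it never ran */
};

static int model_freeze(struct model_file *f)
{
	if (f->fail_freeze)
		return -1;
	f->frozen = 1;
	return 0;
}

static void model_unfreeze(struct model_file *f, int seq)
{
	/* Only files that received freeze() may receive unfreeze(). */
	assert(f->frozen);
	f->frozen = 0;
	f->unfreeze_seq = seq;
}

/*
 * Freeze files in preservation order. On the first failure, unfreeze the
 * files that were already frozen, newest first (assumed order), and
 * return an error so the caller can fail the reboot.
 */
static int model_freeze_all(struct model_file *files, size_t n)
{
	size_t i;
	int seq = 0;

	for (i = 0; i < n; i++) {
		if (model_freeze(&files[i]) == 0)
			continue;
		/* files[i] itself never froze, so start below it. */
		while (i-- > 0)
			model_unfreeze(&files[i], ++seq);
		return -1;
	}
	return 0;
}
```

With three files where the third fails to freeze, the model unfreezes the
second file first and the first file last, and the failing file never
sees an unfreeze call, which is the pairing the unfreeze name is meant to
suggest.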
>
> This would also mean that subsystems would not be needed, leaving only
> FLB (File-Lifecycle-Bound Global State) to use as a handle for global
> state. The API I am proposing for FLB keeps the same global state for
> a single file-handler type. However, HugeTLB might have multiple file
> handlers, so the API would need to be extended slightly to support
> this case. Multiple file handlers will share the same global resource
> with the same callbacks.
>
> Pasha
>
> > > + * @reserved: Must be zero.
> > > + * @state:    Output; The live update state of this FD.
> > > + *
> > > + * Query the current live update state of a specific preserved file descriptor.
> > > + *
> > > + * - %LIVEUPDATE_STATE_NORMAL:   Default state
> > > + * - %LIVEUPDATE_STATE_PREPARED: Prepare callback has been performed on this FD.
> > > + * - %LIVEUPDATE_STATE_FROZEN:   Freeze callback has been performed on this FD.
> > > + * - %LIVEUPDATE_STATE_UPDATED:  The system has successfully rebooted into the
> > > + *                               new kernel.
> > > + *
> > > + * See the definition of &enum liveupdate_state for more details on each state.
> > > + *
> > > + * Return: 0 on success, negative error code on failure.
> > > + */
> > > +struct liveupdate_session_get_state {
> > > +     __u32           size;
> > > +     __u8            incoming;
> > > +     __u8            reserved[3];
> > > +     __u32           state;
> > > +};
> > > +
> > > +#define LIVEUPDATE_SESSION_GET_STATE                                 \
> > > +     _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_GET_STATE)
> > [...]
> >
> > --
> > Regards,
> > Pratyush Yadav