Message-ID: <CA+CK2bBVSX26TKwgLkXCDop5u3e9McH3sQMascT47ZwwrwraOw@mail.gmail.com>
Date: Wed, 29 Oct 2025 16:13:14 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: jasonmiu@...gle.com, graf@...zon.com, changyuanl@...gle.com, 
	rppt@...nel.org, dmatlack@...gle.com, rientjes@...gle.com, corbet@....net, 
	rdunlap@...radead.org, ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com, 
	ojeda@...nel.org, aliceryhl@...gle.com, masahiroy@...nel.org, 
	akpm@...ux-foundation.org, tj@...nel.org, yoann.congal@...le.fr, 
	mmaurer@...gle.com, roman.gushchin@...ux.dev, chenridong@...wei.com, 
	axboe@...nel.dk, mark.rutland@....com, jannh@...gle.com, 
	vincent.guittot@...aro.org, hannes@...xchg.org, dan.j.williams@...el.com, 
	david@...hat.com, joel.granados@...nel.org, rostedt@...dmis.org, 
	anna.schumaker@...cle.com, song@...nel.org, zhangguopeng@...inos.cn, 
	linux@...ssschuh.net, linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, 
	linux-mm@...ck.org, gregkh@...uxfoundation.org, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, 
	hpa@...or.com, rafael@...nel.org, dakr@...nel.org, 
	bartosz.golaszewski@...aro.org, cw00.choi@...sung.com, 
	myungjoo.ham@...sung.com, yesanishhere@...il.com, Jonathan.Cameron@...wei.com, 
	quic_zijuhu@...cinc.com, aleksander.lobakin@...el.com, ira.weiny@...el.com, 
	andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de, 
	bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com, 
	stuart.w.hayes@...il.com, lennart@...ttering.net, brauner@...nel.org, 
	linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org, saeedm@...dia.com, 
	ajayachandra@...dia.com, jgg@...dia.com, parav@...dia.com, leonro@...dia.com, 
	witu@...dia.com, hughd@...gle.com, skhawaja@...gle.com, chrisl@...nel.org, 
	steven.sistare@...cle.com
Subject: Re: [PATCH v4 14/30] liveupdate: luo_session: Add ioctls for file
 preservation and state management

On Wed, Oct 29, 2025 at 3:07 PM Pratyush Yadav <pratyush@...nel.org> wrote:
>
> Hi Pasha,
>
> On Mon, Sep 29 2025, Pasha Tatashin wrote:
>
> > Introducing the userspace interface and internal logic required to
> > manage the lifecycle of file descriptors within a session. Previously, a
> > session was merely a container; this change makes it a functional
> > management unit.
> >
> > The following capabilities are added:
> >
> > A new set of ioctl commands are added, which operate on the file
> > descriptor returned by CREATE_SESSION. This allows userspace to:
> > - LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session
> >   to be preserved across the live update.
> > - LIVEUPDATE_SESSION_UNPRESERVE_FD: Remove a previously added file
> >   descriptor from the session.
> > - LIVEUPDATE_SESSION_RESTORE_FD: Retrieve a preserved file in the
> >   new kernel using its unique token.
> >
> > A state machine for each individual session, distinct from the global
> > LUO state. This enables more granular control, allowing userspace to
> > prepare or freeze specific sessions independently. This is managed via:
> > - LIVEUPDATE_SESSION_SET_EVENT: An ioctl to send PREPARE, FREEZE,
> >   CANCEL, or FINISH events to a single session.
> > - LIVEUPDATE_SESSION_GET_STATE: An ioctl to query the current state
> >   of a single session.
> >
> > The global subsystem callbacks (luo_session_prepare, luo_session_freeze)
> > are updated to iterate through all existing sessions. They now trigger
> > the appropriate per-session state transitions for any sessions that
> > haven't already been transitioned individually by userspace.
> >
> > The session's .release handler is enhanced to be state-aware. When a
> > session's file descriptor is closed, it now correctly cancels or
> > finishes the session based on its current state before freeing all
> > associated file resources, preventing resource leaks.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> [...]
> > +/**
> > + * struct liveupdate_session_get_state - ioctl(LIVEUPDATE_SESSION_GET_STATE)
> > + * @size:     Input; sizeof(struct liveupdate_session_get_state)
> > + * @incoming: Input; If 1, query the state of a restored file from the incoming
> > + *            (previous kernel's) set. If 0, query a file being prepared for
> > + *            preservation in the current set.
>
> Spotted this when working on updating my test suite for LUO. This seems
> to be a leftover from a previous version. I don't see it being used
> anywhere in the code.

Thank you, I will remove this.

> Also, I think the model we should have is to only allow new sessions in
> normal state. Currently luo_session_create() allows creating a new
> session in updated state. This would end up mixing sessions from a
> previous boot and sessions from current boot. I don't really see a
> reason for that and I think the userspace should first call finish
> before starting new serialization. Keeps things simpler.

It does. However, yesterday Jason Gunthorpe suggested that we simplify
the uapi, at least for the initial landing, by removing the state
machine during boot and allowing new sessions to be created at any
time. This would also mean separating the incoming and outgoing
sessions and removing the ioctl() call used to bring the machine into
a normal state; instead, only individual sessions could be brought
into a 'normal' state.

Simplified uAPI Proposal
The simplest uAPI would look like this:
IOCTLs on /dev/liveupdate (to create and retrieve session FDs):
LIVEUPDATE_IOCTL_CREATE_SESSION
LIVEUPDATE_IOCTL_RETRIEVE_SESSION

IOCTLs on session FDs:
LIVEUPDATE_CMD_SESSION_PRESERVE_FD
LIVEUPDATE_CMD_SESSION_RETRIEVE_FD
LIVEUPDATE_CMD_SESSION_FINISH

Happy Path
The happy path would look like this:
- luod creates a session with a specific name and passes it to the vmm.
- The vmm preserves FDs in a specific order: memfd, iommufd, vfiofd.
(If the order is wrong, the preserve callbacks will fail.)
- A reboot(KEXEC) is performed.
- Each session receives a freeze() callback to notify it that
mutations are no longer possible.
- During boot, liveupdate_fh_global_state_get(&h, &obj) can be used to
retrieve the global state.
- Once the machine has booted, luod retrieves the incoming sessions
and passes them to the vmms.
- The vmm retrieves the FDs from the session and performs the
necessary IOCTLs on them.
- The vmm calls LIVEUPDATE_CMD_SESSION_FINISH on the session. Each FD
receives a finish() callback in LIFO order.
- If everything succeeds, the session becomes an empty "outgoing"
session. It can then be closed and discarded or reused for the next
live update by preserving new FDs into it.
- Once the last FD for a file-handler is finished,
h->ops->global_state_finish(h, h->global_state_obj) is called to
finish the incoming global state.
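
The ordering rules in the happy path can be sketched as a small userspace
model. This is only an illustration, not kernel code: every name below
(model_session, model_preserve, model_finish, the fd_kind enum) is
hypothetical, and the dependency check "each kind needs the previous kind to
be preserved already" is my assumption about why a wrong order makes the
preserve callbacks fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum fd_kind { KIND_MEMFD, KIND_IOMMUFD, KIND_VFIOFD };

#define MAX_FDS 8

struct model_session {
	enum fd_kind fds[MAX_FDS];        /* preserved FDs, in preservation order */
	size_t count;
	enum fd_kind finish_log[MAX_FDS]; /* order in which finish() ran */
	size_t finished;
};

/*
 * Preserve one FD. Assumption: memfd -> iommufd -> vfiofd form a dependency
 * chain, so a kind is rejected unless the previous kind is already present.
 */
static int model_preserve(struct model_session *s, enum fd_kind kind)
{
	size_t i;
	int found = (kind == KIND_MEMFD);

	if (s->count == MAX_FDS)
		return -1;
	for (i = 0; i < s->count && !found; i++)
		found = (s->fds[i] == (enum fd_kind)(kind - 1));
	if (!found)
		return -1;
	s->fds[s->count++] = kind;
	return 0;
}

/* Finish the session: each FD gets its finish() in LIFO order. */
static void model_finish(struct model_session *s)
{
	s->finished = 0;
	while (s->count > 0)
		s->finish_log[s->finished++] = s->fds[--s->count];
	/* The session is now empty and could be reused as an outgoing one. */
	assert(s->count == 0);
}
```

After model_finish() the session holds no FDs, which mirrors the point that
a finished session can be closed or reused for the next live update.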

Unhappy Paths
- If an outgoing session FD is closed, each FD in that session
receives an unpreserve callback in LIFO order.
- If the last FD for a global state is unpreserved,
h->ops->global_state_unpreserve(h, h->global_state_obj) is called.
- If freeze() fails, cancel() is performed on each FD that already
received a freeze() callback, and reboot(KEXEC) returns a failure.
- If an incoming session FD is closed, the resources are considered
"leaked." They are discarded only during the next live update; this
avoids implementing rarely exercised, untested clean-up code.
- If a user tries to finish a session and it fails, it is considered
the user's problem. This might happen because some IOCTLs still need
to be run on the retrieved FDs to bring them to a state where finish
is possible.
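
The freeze() failure case can be sketched the same way. Again, this is a
hypothetical userspace model (model_fd, model_freeze_all and the fail_freeze
knob are made up for illustration). Two details are my assumptions rather
than anything decided here: the FD whose freeze() failed does not itself get
a cancel(), and the rollback runs in reverse order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_NORMAL, MODEL_FROZEN };

struct model_fd {
	enum model_state state;
	int fail_freeze;  /* test knob: make freeze() fail for this FD */
	int cancel_calls; /* how many times cancel() ran on this FD */
};

/*
 * Freeze every FD in order. On the first failure, cancel only the FDs that
 * were already frozen (newest first) and report the error, the way
 * reboot(KEXEC) would return a failure.
 */
static int model_freeze_all(struct model_fd *fds, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (fds[i].fail_freeze) {
			while (i > 0) {
				i--;
				assert(fds[i].state == MODEL_FROZEN);
				fds[i].state = MODEL_NORMAL;
				fds[i].cancel_calls++;
			}
			return -1;
		}
		fds[i].state = MODEL_FROZEN;
	}
	return 0;
}
```

FDs that come after the failing one never see freeze(), so they need no
cancel() either.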

This would also mean that subsystems would not be needed, leaving only
FLB (File-Lifecycle-Bound Global State) to use as a handle for global
state. The API I am proposing for FLB keeps the same global state for
a single file-handler type. However, HugeTLB might have multiple file
handlers, so the API would need to be extended slightly to support
this case. Multiple file handlers will share the same global resource
with the same callbacks.
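
To illustrate the extension for multiple file handlers, here is a rough
userspace sketch of several handlers sharing one global state object with a
single user count. All names (model_flb, model_handler, model_fd_get,
model_fd_finish) are hypothetical, and counting users per shared object
rather than per handler is an assumption about how the extended API might
behave, not the proposed implementation.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_flb {
	int users;        /* live FDs across all handlers sharing this state */
	int finish_calls; /* how many times the global finish callback ran */
};

struct model_handler {
	const char *name;
	struct model_flb *flb; /* several handlers may point at the same FLB */
};

/* An FD of this handler starts using the shared global state. */
static void model_fd_get(struct model_handler *h)
{
	h->flb->users++;
}

/*
 * An FD of this handler is finished. The global finish callback runs only
 * when the last FD, across all handlers sharing the FLB, is finished.
 */
static void model_fd_finish(struct model_handler *h)
{
	assert(h->flb->users > 0);
	if (--h->flb->users == 0)
		h->flb->finish_calls++;
}
```

With two handlers pointing at the same model_flb, finishing all FDs of one
handler does not trigger the global finish while the other handler still has
FDs outstanding.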

Pasha

> > + * @reserved: Must be zero.
> > + * @state:    Output; The live update state of this FD.
> > + *
> > + * Query the current live update state of a specific preserved file descriptor.
> > + *
> > + * - %LIVEUPDATE_STATE_NORMAL:   Default state
> > + * - %LIVEUPDATE_STATE_PREPARED: Prepare callback has been performed on this FD.
> > + * - %LIVEUPDATE_STATE_FROZEN:   Freeze callback has been performed on this FD.
> > + * - %LIVEUPDATE_STATE_UPDATED:  The system has successfully rebooted into the
> > + *                               new kernel.
> > + *
> > + * See the definition of &enum liveupdate_state for more details on each state.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +struct liveupdate_session_get_state {
> > +     __u32           size;
> > +     __u8            incoming;
> > +     __u8            reserved[3];
> > +     __u32           state;
> > +};
> > +
> > +#define LIVEUPDATE_SESSION_GET_STATE                                 \
> > +     _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_GET_STATE)
> [...]
>
> --
> Regards,
> Pratyush Yadav
