Message-ID: <CA+CK2bBu4ex9O5kPcR7++DVg3RM8ZWg3BCpcc6CboJ=aG8mVmQ@mail.gmail.com>
Date: Tue, 24 Jun 2025 10:27:56 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Christian Brauner <brauner@...nel.org>
Cc: pratyush@...nel.org, jasonmiu@...gle.com, graf@...zon.com,
changyuanl@...gle.com, rppt@...nel.org, dmatlack@...gle.com,
rientjes@...gle.com, corbet@....net, rdunlap@...radead.org,
ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com, ojeda@...nel.org,
aliceryhl@...gle.com, masahiroy@...nel.org, akpm@...ux-foundation.org,
tj@...nel.org, yoann.congal@...le.fr, mmaurer@...gle.com,
roman.gushchin@...ux.dev, chenridong@...wei.com, axboe@...nel.dk,
mark.rutland@....com, jannh@...gle.com, vincent.guittot@...aro.org,
hannes@...xchg.org, dan.j.williams@...el.com, david@...hat.com,
joel.granados@...nel.org, rostedt@...dmis.org, anna.schumaker@...cle.com,
song@...nel.org, zhangguopeng@...inos.cn, linux@...ssschuh.net,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org,
gregkh@...uxfoundation.org, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
rafael@...nel.org, dakr@...nel.org, bartosz.golaszewski@...aro.org,
cw00.choi@...sung.com, myungjoo.ham@...sung.com, yesanishhere@...il.com,
Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com,
aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com, ptyadav@...zon.de
Subject: Re: [RFC v2 10/16] luo: luo_ioctl: add ioctl interface
On Tue, Jun 24, 2025 at 5:51 AM Christian Brauner <brauner@...nel.org> wrote:
>
> On Thu, May 15, 2025 at 06:23:14PM +0000, Pasha Tatashin wrote:
> > Introduce the user-space interface for the Live Update Orchestrator
> > via ioctl commands, enabling external control over the live update
> > process and management of preserved resources.
> >
> > Create a misc character device at /dev/liveupdate. Access
> > to this device requires the CAP_SYS_ADMIN capability.
> >
> > A new UAPI header, <uapi/linux/liveupdate.h>, defines the necessary
> > structures. The magic number is registered in
> > Documentation/userspace-api/ioctl/ioctl-number.rst.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> > ---
> > .../userspace-api/ioctl/ioctl-number.rst | 1 +
> > drivers/misc/liveupdate/Makefile | 1 +
> > drivers/misc/liveupdate/luo_ioctl.c | 199 ++++++++++++
> > include/linux/liveupdate.h | 34 +-
> > include/uapi/linux/liveupdate.h | 300 ++++++++++++++++++
> > 5 files changed, 502 insertions(+), 33 deletions(-)
> > create mode 100644 drivers/misc/liveupdate/luo_ioctl.c
> > create mode 100644 include/uapi/linux/liveupdate.h
> >
> > diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > index 7a1409ecc238..279c124048f2 100644
> > --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> > +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > @@ -375,6 +375,7 @@ Code Seq# Include File Comments
> > 0xB8 01-02 uapi/misc/mrvl_cn10k_dpi.h Marvell CN10K DPI driver
> > 0xB8 all uapi/linux/mshv.h Microsoft Hyper-V /dev/mshv driver
> > <mailto:linux-hyperv@...r.kernel.org>
> > +0xBA all uapi/linux/liveupdate.h <mailto:Pasha Tatashin <pasha.tatashin@...een.com>
> > 0xC0 00-0F linux/usb/iowarrior.h
> > 0xCA 00-0F uapi/misc/cxl.h Dead since 6.15
> > 0xCA 10-2F uapi/misc/ocxl.h
> > diff --git a/drivers/misc/liveupdate/Makefile b/drivers/misc/liveupdate/Makefile
> > index b4cdd162574f..7a0cd08919c9 100644
> > --- a/drivers/misc/liveupdate/Makefile
> > +++ b/drivers/misc/liveupdate/Makefile
> > @@ -1,4 +1,5 @@
> > # SPDX-License-Identifier: GPL-2.0
> > +obj-y += luo_ioctl.o
> > obj-y += luo_core.o
> > obj-y += luo_files.o
> > obj-y += luo_subsystems.o
> > diff --git a/drivers/misc/liveupdate/luo_ioctl.c b/drivers/misc/liveupdate/luo_ioctl.c
> > new file mode 100644
> > index 000000000000..76c687ff650b
> > --- /dev/null
> > +++ b/drivers/misc/liveupdate/luo_ioctl.c
> > @@ -0,0 +1,199 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@...een.com>
> > + */
> > +
> > +/**
> > + * DOC: LUO ioctl Interface
> > + *
> > + * The IOCTL user-space control interface for the LUO subsystem.
> > + * It registers a misc character device, typically found at ``/dev/liveupdate``,
> > + * which allows privileged userspace applications (requiring %CAP_SYS_ADMIN) to
> > + * manage and monitor the LUO state machine and associated resources like
> > + * preservable file descriptors.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/errno.h>
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/init.h>
> > +#include <linux/kernel.h>
> > +#include <linux/miscdevice.h>
> > +#include <linux/module.h>
> > +#include <linux/uaccess.h>
> > +#include <uapi/linux/liveupdate.h>
> > +#include "luo_internal.h"
> > +
> > +static int luo_ioctl_fd_preserve(struct liveupdate_fd *luo_fd)
> > +{
> > +	struct file *file;
> > +	int ret;
> > +
> > +	file = fget(luo_fd->fd);
> > +	if (!file) {
> > +		pr_err("Bad file descriptor\n");
> > +		return -EBADF;
> > +	}
> > +
> > +	ret = luo_register_file(&luo_fd->token, file);
> > +	if (ret)
> > +		fput(file);
> > +
> > +	return ret;
> > +}
> > +
> > +static int luo_ioctl_fd_unpreserve(u64 token)
> > +{
> > +	return luo_unregister_file(token);
> > +}
> > +
> > +static int luo_ioctl_fd_restore(struct liveupdate_fd *luo_fd)
> > +{
> > +	struct file *file;
> > +	int ret;
> > +	int fd;
> > +
> > +	fd = get_unused_fd_flags(O_CLOEXEC);
> > +	if (fd < 0) {
> > +		pr_err("Failed to allocate new fd: %d\n", fd);
> > +		return fd;
> > +	}
> > +
> > +	ret = luo_retrieve_file(luo_fd->token, &file);
> > +	if (ret < 0) {
> > +		put_unused_fd(fd);
> > +
> > +		return ret;
> > +	}
> > +
> > +	fd_install(fd, file);
> > +	luo_fd->fd = fd;
> > +
> > +	return 0;
> > +}
> > +
> > +static int luo_open(struct inode *inodep, struct file *filep)
> > +{
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		return -EACCES;
> > +
> > +	if (filep->f_flags & O_EXCL)
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +
> > +static long luo_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> > +{
> > +	void __user *argp = (void __user *)arg;
> > +	struct liveupdate_fd luo_fd;
> > +	enum liveupdate_state state;
> > +	int ret = 0;
> > +	u64 token;
> > +
> > +	if (_IOC_TYPE(cmd) != LIVEUPDATE_IOCTL_TYPE)
> > +		return -ENOTTY;
> > +
> > +	switch (cmd) {
> > +	case LIVEUPDATE_IOCTL_GET_STATE:
> > +		state = READ_ONCE(luo_state);
> > +		if (copy_to_user(argp, &state, sizeof(luo_state)))
> > +			ret = -EFAULT;
> > +		break;
> > +
> > +	case LIVEUPDATE_IOCTL_EVENT_PREPARE:
> > +		ret = luo_prepare();
> > +		break;
> > +
> > +	case LIVEUPDATE_IOCTL_EVENT_FREEZE:
> > +		ret = luo_freeze();
> > +		break;
> > +
> > +	case LIVEUPDATE_IOCTL_EVENT_FINISH:
> > +		ret = luo_finish();
> > +		break;
> > +
> > +	case LIVEUPDATE_IOCTL_EVENT_CANCEL:
> > +		ret = luo_cancel();
> > +		break;
> > +
> > +	case LIVEUPDATE_IOCTL_FD_PRESERVE:
> > +		if (copy_from_user(&luo_fd, argp, sizeof(luo_fd))) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		ret = luo_ioctl_fd_preserve(&luo_fd);
> > +		if (!ret && copy_to_user(argp, &luo_fd, sizeof(luo_fd)))
> > +			ret = -EFAULT;
> > +		break;
> > +
> > +	case LIVEUPDATE_IOCTL_FD_UNPRESERVE:
> > +		if (copy_from_user(&token, argp, sizeof(u64))) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		ret = luo_ioctl_fd_unpreserve(token);
> > +		break;
> > +
> > +	case LIVEUPDATE_IOCTL_FD_RESTORE:
> > +		if (copy_from_user(&luo_fd, argp, sizeof(luo_fd))) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		ret = luo_ioctl_fd_restore(&luo_fd);
> > +		if (!ret && copy_to_user(argp, &luo_fd, sizeof(luo_fd)))
> > +			ret = -EFAULT;
> > +		break;
> > +
> > +	default:
> > +		pr_warn("ioctl: unknown command nr: 0x%x\n", _IOC_NR(cmd));
> > +		ret = -ENOTTY;
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static const struct file_operations fops = {
> > +	.owner = THIS_MODULE,
> > +	.open = luo_open,
> > +	.unlocked_ioctl = luo_ioctl,
> > +};
> > +
> > +static struct miscdevice liveupdate_miscdev = {
> > +	.minor = MISC_DYNAMIC_MINOR,
> > +	.name = "liveupdate",
> > +	.fops = &fops,
> > +};
>
> I'm not sure why people are so in love with character device based apis.
> It's terrible. It glues everything to devtmpfs which isn't namespacable
> in any way. It's terrible to delegate and extremely restrictive in terms
> of extensibility if you need additional device entries (aka the loop
> driver folly).
>
> One stupid question: I probably have asked this before and just swapped
> out that I a) asked this already and b) received an explanation. But why
> isn't this a singleton simple in-memory filesystem with a flat
> hierarchy?

Hi Christian,

Thank you for the detailed feedback and for raising this important
design question. I appreciate the points you've made about the
benefits of a filesystem-based API.

I thought carefully about this and explored various alternatives
before settling on the ioctl-based interface. This design isn't a
sudden decision; it is based on conversations that have been ongoing
for over two years at LPC, and it incorporates direct feedback I
received on LUOv1 at LSF/MM.

The choice of an ioctl-based character device was ultimately driven
by the specific lifecycle and dependency-management requirements of
the live update process. While a filesystem API offers great
advantages in visibility and hierarchy, filesystems are not typically
designed to be state machines with the complex lifecycle, dependency,
and ownership tracking that LUO needs to manage.

Let me elaborate on the key aspects that led to the current design:

1. Session-based lifecycle management: The preservation of an FD is
tied to the open instance of /dev/liveupdate. If a userspace agent
opens /dev/liveupdate, registers several FDs for preservation, and
then crashes or exits before the prepare phase is triggered, all FDs
it registered are automatically unregistered. This session-scoped
behavior is crucial to prevent leaking preserved resources into the
next kernel if the controlling process fails, and it is naturally
handled by the open() and release() file operations on a character
device. It's not immediately obvious how similar automatic,
session-based cleanup would be implemented with a singleton
filesystem.

2. State machine: LUO is fundamentally a state machine (NORMAL ->
PREPARED -> FROZEN -> UPDATED -> NORMAL). As part of this, it provides
a crucial guarantee: any resource that was successfully preserved but
not explicitly reclaimed by userspace in the new kernel by the time
the FINISH event is triggered is automatically cleaned up and its
memory released. This prevents leaks of unreclaimed resources. The
cleanup is driven by the orchestrator, a concept that doesn't map
cleanly onto standard VFS semantics.

3. Dependency tracking: Unlike normal files, preserved resources for
live update have strong, often complex interdependencies. For example,
a kvmfd might depend on a guestmemfd; an iommufd can depend on vfiofd,
eventfd, memfd, and kvmfd. LUO's current design provides explicit
callback points (prepare, freeze) where the participating subsystems
can validate and track these dependencies. If a dependency is not met
when we are about to freeze, we can fail the entire operation and
return an error to userspace. The cancel callback further allows this
complex dependency graph to be unwound safely. A filesystem interface
based on linkat() or unlink() doesn't inherently provide these
critical, ordered points for dependency verification and rollback.
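
To make points 1 to 3 concrete, the behavior they describe can be
sketched as a small user-space C model: registrations are owned by a
session and dropped if that session is released before PREPARE, the
NORMAL -> PREPARED -> FROZEN -> UPDATED -> NORMAL transitions are
enforced, FREEZE fails if a dependency is no longer preserved and
CANCEL unwinds back to NORMAL, and FINISH frees whatever was not
reclaimed. This is a hypothetical illustration, not the LUO
implementation: all model_* and ST_* names, the fixed-size tables, and
tokens as never-reused table indices are assumptions, and the kexec
into the new kernel is reduced to a plain model_step() from FROZEN to
UPDATED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { ST_NORMAL, ST_PREPARED, ST_FROZEN, ST_UPDATED };
#define MAX_RES  8
#define MAX_DEPS 4
struct model_res {
	bool in_use;        /* slot holds a preserved resource */
	bool reclaimed;     /* restored by userspace in the new kernel */
	int session, ndeps; /* registering open instance; number of deps */
	int deps[MAX_DEPS]; /* tokens this resource depends on */
};

/* Hypothetical stand-in for the orchestrator's state; not LUO kernel code. */
struct model_luo {
	enum model_state state;
	int next;                      /* next token; tokens are never reused */
	struct model_res res[MAX_RES];
};

/* FD_PRESERVE: only in NORMAL, deps must be live tokens; returns a token or -1. */
static int model_preserve(struct model_luo *l, int session,
			  const int *deps, int ndeps)
{
	assert(ndeps >= 0 && ndeps <= MAX_DEPS);
	if (l->state != ST_NORMAL || l->next >= MAX_RES)
		return -1;
	for (int i = 0; i < ndeps; i++)
		if (deps[i] < 0 || deps[i] >= l->next || !l->res[deps[i]].in_use)
			return -1;
	struct model_res *r = &l->res[l->next];
	*r = (struct model_res){ .in_use = true, .session = session,
				 .ndeps = ndeps };
	for (int i = 0; i < ndeps; i++)
		r->deps[i] = deps[i];
	return l->next++;
}

/* release(): a session closed before PREPARE drops everything it registered. */
static void model_release(struct model_luo *l, int session)
{
	for (int t = 0; l->state == ST_NORMAL && t < MAX_RES; t++)
		if (l->res[t].session == session)
			l->res[t].in_use = false;
}

/* Move from `from` to `to`; fail if the current state is not `from`. */
static int model_step(struct model_luo *l, enum model_state from,
		      enum model_state to)
{
	if (l->state != from)
		return -1;
	l->state = to;
	return 0;
}

/* FREEZE: fail, staying PREPARED, if any dependency is no longer preserved. */
static int model_freeze(struct model_luo *l)
{
	for (int t = 0; l->state == ST_PREPARED && t < MAX_RES; t++)
		for (int i = 0; l->res[t].in_use && i < l->res[t].ndeps; i++)
			if (!l->res[l->res[t].deps[i]].in_use)
				return -1;
	return model_step(l, ST_PREPARED, ST_FROZEN);
}

/* CANCEL: unwind PREPARED or FROZEN back to NORMAL. */
static int model_cancel(struct model_luo *l)
{
	if (l->state == ST_FROZEN)
		l->state = ST_PREPARED;
	return model_step(l, ST_PREPARED, ST_NORMAL);
}

/* FD_RESTORE in the new kernel: hand a preserved resource back to userspace. */
static int model_restore(struct model_luo *l, int token)
{
	if (l->state != ST_UPDATED || token < 0 || token >= l->next ||
	    !l->res[token].in_use || l->res[token].reclaimed)
		return -1;
	l->res[token].reclaimed = true;
	return 0;
}

/* FINISH: free everything that was not reclaimed; returns how many were freed. */
static int model_finish(struct model_luo *l)
{
	int freed = 0;
	if (l->state != ST_UPDATED)
		return -1;
	for (int t = 0; t < MAX_RES; t++) {
		freed += l->res[t].in_use && !l->res[t].reclaimed;
		l->res[t].in_use = false;
	}
	l->next = 0;
	l->state = ST_NORMAL;
	return freed;
}
```

For example, if session 1 preserves a memfd-like resource and session
2 preserves a kvmfd-like resource that depends on it, releasing
session 1 before PREPARE drops the memfd; model_freeze() then fails
with the model still in PREPARED, and model_cancel() returns it to
NORMAL. In a successful run, model_finish() reports how many preserved
resources were never passed to model_restore() and frees them.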

While I agree that a filesystem offers superior introspection and
integration with standard tools, building this complex, stateful
orchestration logic on top of VFS seemed like forcing a square peg
into a round hole. The ioctl interface, while more opaque, provides a
direct and explicit way to command the state machine and manage these
complex lifecycle and dependency rules.

Thanks,
Pasha