lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bBOu9oRiO7gih7JpePXQjds2vN8uFXodgHU48fxpP_bVQ@mail.gmail.com>
Date: Tue, 5 Aug 2025 18:19:23 +0000
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: pratyush@...nel.org, jasonmiu@...gle.com, graf@...zon.com, 
	changyuanl@...gle.com, rppt@...nel.org, dmatlack@...gle.com, 
	rientjes@...gle.com, corbet@....net, rdunlap@...radead.org, 
	ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com, ojeda@...nel.org, 
	aliceryhl@...gle.com, masahiroy@...nel.org, akpm@...ux-foundation.org, 
	tj@...nel.org, yoann.congal@...le.fr, mmaurer@...gle.com, 
	roman.gushchin@...ux.dev, chenridong@...wei.com, axboe@...nel.dk, 
	mark.rutland@....com, jannh@...gle.com, vincent.guittot@...aro.org, 
	hannes@...xchg.org, dan.j.williams@...el.com, david@...hat.com, 
	joel.granados@...nel.org, rostedt@...dmis.org, anna.schumaker@...cle.com, 
	song@...nel.org, zhangguopeng@...inos.cn, linux@...ssschuh.net, 
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org, 
	gregkh@...uxfoundation.org, tglx@...utronix.de, mingo@...hat.com, 
	bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	rafael@...nel.org, dakr@...nel.org, bartosz.golaszewski@...aro.org, 
	cw00.choi@...sung.com, myungjoo.ham@...sung.com, yesanishhere@...il.com, 
	Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com, 
	aleksander.lobakin@...el.com, ira.weiny@...el.com, 
	andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de, 
	bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com, 
	stuart.w.hayes@...il.com, ptyadav@...zon.de, lennart@...ttering.net, 
	brauner@...nel.org, linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	saeedm@...dia.com, ajayachandra@...dia.com, parav@...dia.com, 
	leonro@...dia.com, witu@...dia.com
Subject: Re: [PATCH v2 16/32] liveupdate: luo_ioctl: add ioctl interface

On Tue, Jul 29, 2025 at 12:35 PM Jason Gunthorpe <jgg@...dia.com> wrote:
>
> On Wed, Jul 23, 2025 at 02:46:29PM +0000, Pasha Tatashin wrote:
> > Introduce the user-space interface for the Live Update Orchestrator
> > via ioctl commands, enabling external control over the live update
> > process and management of preserved resources.
>
> I strongly recommend copying something like fwctl (which is copying
> iommufd, which is copying some other best practices). I will try to
> outline the main points below.
>
> The design of the fwctl scheme allows alot of options for ABI
> compatible future extensions and I very strongly recommend that
> complex ioctl style APIs be built with that in mind. I have so many
> scars from trying to undo fixed ABI design :)

Thank you for bringing this up, I have reviewed fwctl ioctl
implementation, and also iommufd ioctl, and I made the necessary
changes to make luo similar.

> > +/**
> > + * struct liveupdate_fd - Holds parameters for preserving and restoring file
> > + * descriptors across live update.
> > + * @fd:    Input for %LIVEUPDATE_IOCTL_FD_PRESERVE: The user-space file
> > + *         descriptor to be preserved.
> > + *         Output for %LIVEUPDATE_IOCTL_FD_RESTORE: The new file descriptor
> > + *         representing the fully restored kernel resource.
> > + * @flags: Unused, reserved for future expansion, must be set to 0.
> > + * @token: Input for %LIVEUPDATE_IOCTL_FD_PRESERVE: An opaque, unique token
> > + *         preserved for preserved resource.
> > + *         Input for %LIVEUPDATE_IOCTL_FD_RESTORE: The token previously
> > + *         provided to the preserve ioctl for the resource to be restored.
> > + *
> > + * This structure is used as the argument for the %LIVEUPDATE_IOCTL_FD_PRESERVE
> > + * and %LIVEUPDATE_IOCTL_FD_RESTORE ioctls. These ioctls allow specific types
> > + * of file descriptors (for example memfd, kvm, iommufd, and VFIO) to have their
> > + * underlying kernel state preserved across a live update cycle.
> > + *
> > + * To preserve an FD, user space passes this struct to
> > + * %LIVEUPDATE_IOCTL_FD_PRESERVE with the @fd field set. On success, the
> > + * kernel uses the @token field to uniquly associate the preserved FD.
> > + *
> > + * After the live update transition, user space passes the struct populated with
> > + * the *same* @token to %LIVEUPDATE_IOCTL_FD_RESTORE. The kernel uses the @token
> > + * to find the preserved state and, on success, populates the @fd field with a
> > + * new file descriptor referring to the restored resource.
> > + */
> > +struct liveupdate_fd {
> > +     int             fd;
>
> 'int' should not appear in uapi structs. Fds are __s32

done

>
> > +     __u32           flags;
> > +     __aligned_u64   token;
> > +};
> > +
> > +/* The ioctl type, documented in ioctl-number.rst */
> > +#define LIVEUPDATE_IOCTL_TYPE                0xBA
>
> I have found it very helpful to organize the ioctl numbering like this:
>
> #define IOMMUFD_TYPE (';')
>
> enum {
>         IOMMUFD_CMD_BASE = 0x80,
>         IOMMUFD_CMD_DESTROY = IOMMUFD_CMD_BASE,
>         IOMMUFD_CMD_IOAS_ALLOC = 0x81,
>         IOMMUFD_CMD_IOAS_ALLOW_IOVAS = 0x82,
> [..]
>
> #define IOMMU_DESTROY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DESTROY)
>
> The numbers should be tightly packed and non-overlapping. It becomes
> difficult to manage this if the numbers are sprinkled all over the
> file. The above structuring will enforce git am conflicts if things
> get muddled up. Saved me a few times already in iommufd.

Done

>
> > +/**
> > + * LIVEUPDATE_IOCTL_FD_PRESERVE - Validate and initiate preservation for a file
> > + * descriptor.
> > + *
> > + * Argument: Pointer to &struct liveupdate_fd.
> > + *
> > + * User sets the @fd field identifying the file descriptor to preserve
> > + * (e.g., memfd, kvm, iommufd, VFIO). The kernel validates if this FD type
> > + * and its dependencies are supported for preservation. If validation passes,
> > + * the kernel marks the FD internally and *initiates the process* of preparing
> > + * its state for saving. The actual snapshotting of the state typically occurs
> > + * during the subsequent %LIVEUPDATE_IOCTL_PREPARE execution phase, though
> > + * some finalization might occur during freeze.
> > + * On successful validation and initiation, the kernel uses the @token
> > + * field with an opaque identifier representing the resource being preserved.
> > + * This token confirms the FD is targeted for preservation and is required for
> > + * the subsequent %LIVEUPDATE_IOCTL_FD_RESTORE call after the live update.
> > + *
> > + * Return: 0 on success (validation passed, preservation initiated), negative
> > + * error code on failure (e.g., unsupported FD type, dependency issue,
> > + * validation failed).
> > + */
> > +#define LIVEUPDATE_IOCTL_FD_PRESERVE                                 \
> > +     _IOW(LIVEUPDATE_IOCTL_TYPE, 0x00, struct liveupdate_fd)
>
> From a kdoc perspective I find it works much better to attach the kdoc
> to the struct, not the ioctl:
>
> /**
>  * struct iommu_destroy - ioctl(IOMMU_DESTROY)
>  * @size: sizeof(struct iommu_destroy)
>  * @id: iommufd object ID to destroy. Can be any destroyable object type.
>  *
>  * Destroy any object held within iommufd.
>  */
> struct iommu_destroy {
>         __u32 size;
>         __u32 id;
> };
> #define IOMMU_DESTROY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DESTROY)
>
> Generates this kdoc:
>
> https://docs.kernel.org/userspace-api/iommufd.html#c.iommu_destroy

Agreed, done the same as above.

>
> You should also make sure to link the uapi header into the kdoc build
> under the "userspace API" chaper.
>
> The structs should also be self-describing. I am fairly strongly
> against using the size mechanism in the _IOW macro, it is instantly
> ABI incompatible and basically impossible to deal with from userspace.
>
> Hence why the IOMMFD version is _IO().

Right, I came to the same conclusion while reviewing fwctl, I replaced
everything with pure _IO().

>
> This means stick a size member in the first 4 bytes of every
> struct. More on this later..
>
> > +/**
> > + * LIVEUPDATE_IOCTL_FD_UNPRESERVE - Remove a file descriptor from the
> > + * preservation list.
> > + *
> > + * Argument: Pointer to __u64 token.
>
> Every ioctl should have a struct, with the size header. If you want to
> do more down the road you can not using this structure.

Done

>
> > +#define LIVEUPDATE_IOCTL_FD_RESTORE                                  \
> > +     _IOWR(LIVEUPDATE_IOCTL_TYPE, 0x02, struct liveupdate_fd)
>
> Strongly recommend that every ioctl have a unique struct. Sharing
> structs makes future extend-ability harder.

Done

>
> > +/**
> > + * LIVEUPDATE_IOCTL_PREPARE - Initiate preparation phase and trigger state
> > + * saving.
>
> Perhaps these just want to be a single 'set state' ioctl with an enum
> input argument?

Added a IOCTL: LIVEUPDATE_SET_EVENT, and all events
PREPARE/FINISH/CANCEL are now done through it.

>
> > @@ -7,4 +7,5 @@ obj-$(CONFIG_KEXEC_HANDOVER)          += kexec_handover.o
> >  obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)   += kexec_handover_debug.o
> >  obj-$(CONFIG_LIVEUPDATE)             += luo_core.o
> >  obj-$(CONFIG_LIVEUPDATE)             += luo_files.o
> > +obj-$(CONFIG_LIVEUPDATE)             += luo_ioctl.o
> >  obj-$(CONFIG_LIVEUPDATE)             += luo_subsystems.o
>
> I don't think luo is modular, but I think it is generally better to
> write the kbuilds as though it was anyhow if it has a lot of files:
>
> iommufd-y := \
>         device.o \
>         eventq.o \
>         hw_pagetable.o \
>         io_pagetable.o \
>         ioas.o \
>         main.o \
>         pages.o \
>         vfio_compat.o \
>         viommu.o
> obj-$(CONFIG_IOMMUFD) += iommufd.o

Done

>
> Basically don't repeat obj-$(CONFIG_LIVEUPDATE), every one of those
> lines creates a new module (if it was modular)
>
> > +static int luo_open(struct inode *inodep, struct file *filep)
> > +{
> > +     if (!capable(CAP_SYS_ADMIN))
> > +             return -EACCES;
>
> IMHO file system permissions should control permission to open. No
> capable check.

Removed

>
> > +     if (filep->f_flags & O_EXCL)
> > +             return -EINVAL;
>
> O_EXCL doesn't really do anything for cdev, I'd drop this.
>
> The open should have an atomic to check for single open though.

Removed, and added an enforcement for a single open.

>
> > +static long luo_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> > +{
> > +     void __user *argp = (void __user *)arg;
> > +     struct liveupdate_fd luo_fd;
> > +     enum liveupdate_state state;
> > +     int ret = 0;
> > +     u64 token;
> > +
> > +     if (_IOC_TYPE(cmd) != LIVEUPDATE_IOCTL_TYPE)
> > +             return -ENOTTY;
>
> The generic parse/disptach from fwctl is a really good idea here, you
> can cut and paste it, change the names. It makes it really easy to manage future extensibility:
>
> List the ops and their structs:
>
> static const struct fwctl_ioctl_op fwctl_ioctl_ops[] = {
>         IOCTL_OP(FWCTL_INFO, fwctl_cmd_info, struct fwctl_info, out_device_data),
>         IOCTL_OP(FWCTL_RPC, fwctl_cmd_rpc, struct fwctl_rpc, out),
> };
>
> Index the list and copy_from_user the struct desribing the opt:
>
> static long fwctl_fops_ioctl(struct file *filp, unsigned int cmd,
>                                unsigned long arg)
> {
>         struct fwctl_uctx *uctx = filp->private_data;
>         const struct fwctl_ioctl_op *op;
>         struct fwctl_ucmd ucmd = {};
>         union fwctl_ucmd_buffer buf;
>         unsigned int nr;
>         int ret;
>
>         nr = _IOC_NR(cmd);
>         if ((nr - FWCTL_CMD_BASE) >= ARRAY_SIZE(fwctl_ioctl_ops))
>                 return -ENOIOCTLCMD;
>
>         op = &fwctl_ioctl_ops[nr - FWCTL_CMD_BASE];
>         if (op->ioctl_num != cmd)
>                 return -ENOIOCTLCMD;
>
>         ucmd.uctx = uctx;
>         ucmd.cmd = &buf;
>         ucmd.ubuffer = (void __user *)arg;
>         // This is reading/checking the standard 4 byte size header:
>         ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
>         if (ret)
>                 return ret;
>
>         if (ucmd.user_size < op->min_size)
>                 return -EINVAL;
>
>         ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
>                                     ucmd.user_size);
>
>
> Removes a bunch of boiler plate and easy to make wrong copy_from_users
> in the ioctls. Centralizes size validation, zero padding checking/etc.

Yeap, implemented as  above.

>
> > +             ret = luo_register_file(luo_fd.token, luo_fd.fd);
> > +             if (!ret && copy_to_user(argp, &luo_fd, sizeof(luo_fd))) {
> > +                     WARN_ON_ONCE(luo_unregister_file(luo_fd.token));
> > +                     ret = -EFAULT;
>
> Then for extensibility you'd copy back the struct:
>
> static int ucmd_respond(struct fwctl_ucmd *ucmd, size_t cmd_len)
> {
>         if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
>                          min_t(size_t, ucmd->user_size, cmd_len)))
>                 return -EFAULT;
>         return 0;
> }
>
> Which truncates it/etc according to some ABI extensibility rules.
>
> > +static int __init liveupdate_init(void)
> > +{
> > +     int err;
> > +
> > +     if (!liveupdate_enabled())
> > +             return 0;
> > +
> > +     err = misc_register(&liveupdate_miscdev);
> > +     if (err < 0) {
> > +             pr_err("Failed to register misc device '%s': %d\n",
> > +                    liveupdate_miscdev.name, err);
>
> Should remove most of the pr_err's, here too IMHO..

Removed.

>
> Jason

Thanks a lot for the thorough review!

Pasha

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ