lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <mafs034cetc5g.fsf@kernel.org>
Date: Thu, 05 Jun 2025 17:56:43 +0200
From: Pratyush Yadav <pratyush@...nel.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: pratyush@...nel.org,  jasonmiu@...gle.com,  graf@...zon.com,
  changyuanl@...gle.com,  rppt@...nel.org,  dmatlack@...gle.com,
  rientjes@...gle.com,  corbet@....net,  rdunlap@...radead.org,
  ilpo.jarvinen@...ux.intel.com,  kanie@...ux.alibaba.com,
  ojeda@...nel.org,  aliceryhl@...gle.com,  masahiroy@...nel.org,
  akpm@...ux-foundation.org,  tj@...nel.org,  yoann.congal@...le.fr,
  mmaurer@...gle.com,  roman.gushchin@...ux.dev,  chenridong@...wei.com,
  axboe@...nel.dk,  mark.rutland@....com,  jannh@...gle.com,
  vincent.guittot@...aro.org,  hannes@...xchg.org,
  dan.j.williams@...el.com,  david@...hat.com,  joel.granados@...nel.org,
  rostedt@...dmis.org,  anna.schumaker@...cle.com,  song@...nel.org,
  zhangguopeng@...inos.cn,  linux@...ssschuh.net,
  linux-kernel@...r.kernel.org,  linux-doc@...r.kernel.org,
  linux-mm@...ck.org,  gregkh@...uxfoundation.org,  tglx@...utronix.de,
  mingo@...hat.com,  bp@...en8.de,  dave.hansen@...ux.intel.com,
  x86@...nel.org,  hpa@...or.com,  rafael@...nel.org,  dakr@...nel.org,
  bartosz.golaszewski@...aro.org,  cw00.choi@...sung.com,
  myungjoo.ham@...sung.com,  yesanishhere@...il.com,
  Jonathan.Cameron@...wei.com,  quic_zijuhu@...cinc.com,
  aleksander.lobakin@...el.com,  ira.weiny@...el.com,
  andriy.shevchenko@...ux.intel.com,  leon@...nel.org,  lukas@...ner.de,
  bhelgaas@...gle.com,  wagi@...nel.org,  djeffery@...hat.com,
  stuart.w.hayes@...il.com
Subject: Re: [RFC v2 08/16] luo: luo_files: add infrastructure for FDs

On Thu, May 15 2025, Pasha Tatashin wrote:

> Introduce the framework within LUO to support preserving specific types
> of file descriptors across a live update transition. This allows
> stateful FDs (like memfds or vfio FDs used by VMs) to be recreated in
> the new kernel.
>
> Note: The core logic for iterating through the luo_files_list and
> invoking the handler callbacks (prepare, freeze, cancel, finish)
> within luo_do_files_*_calls, as well as managing the u64 data
> persistence via the FDT for individual files, is currently implemented
> as stubs in this patch. This patch sets up the registration, FDT layout,
> and retrieval framework.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> ---
>  drivers/misc/liveupdate/Makefile       |   1 +
>  drivers/misc/liveupdate/luo_core.c     |  19 +
>  drivers/misc/liveupdate/luo_files.c    | 563 +++++++++++++++++++++++++
>  drivers/misc/liveupdate/luo_internal.h |  11 +
>  include/linux/liveupdate.h             |  62 +++
>  5 files changed, 656 insertions(+)
>  create mode 100644 drivers/misc/liveupdate/luo_files.c
>
> diff --git a/drivers/misc/liveupdate/Makefile b/drivers/misc/liveupdate/Makefile
> index df1c9709ba4f..b4cdd162574f 100644
> --- a/drivers/misc/liveupdate/Makefile
> +++ b/drivers/misc/liveupdate/Makefile
> @@ -1,3 +1,4 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-y					+= luo_core.o
> +obj-y					+= luo_files.o
>  obj-y					+= luo_subsystems.o
[...]
> diff --git a/drivers/misc/liveupdate/luo_files.c b/drivers/misc/liveupdate/luo_files.c
> new file mode 100644
> index 000000000000..953fc40db3d7
> --- /dev/null
> +++ b/drivers/misc/liveupdate/luo_files.c
> @@ -0,0 +1,563 @@
[...]
> +struct luo_file {
> +	struct liveupdate_filesystem *fs;
> +	struct file *file;
> +	u64 private_data;
> +	bool reclaimed;
> +	enum liveupdate_state state;
> +	struct mutex mutex;
> +};
> +
> +/**
> + * luo_files_startup - Validates the LUO file-descriptors FDT node at startup.
> + * @fdt: Pointer to the LUO FDT blob passed from the previous kernel.
> + *
> + * This __init function checks the existence and validity of the
> + * '/file-descriptors' node in the FDT. This node is considered mandatory. It

Why is it mandatory? Can't a user just preserve some subsystems, and no
FDs?

> + * calls panic() if the node is missing, inaccessible, or invalid (e.g., missing
> + * compatible, wrong compatible string), indicating a critical configuration
> + * error for LUO.
> + */
> +void __init luo_files_startup(void *fdt)
> +{
> +	int ret, node_offset;
> +
> +	node_offset = fdt_subnode_offset(fdt, 0, LUO_FILES_NODE_NAME);
> +	if (node_offset < 0)
> +		panic("Failed to find /file-descriptors node\n");
> +
> +	ret = fdt_node_check_compatible(fdt, node_offset,
> +					LUO_FILES_COMPATIBLE);
> +	if (ret) {
> +		panic("FDT '%s' is incompatible with '%s' [%d]\n",
> +		      LUO_FILES_NODE_NAME, LUO_FILES_COMPATIBLE, ret);
> +	}
> +	luo_fdt_in = fdt;
> +}
> +
> +static void luo_files_recreate_luo_files_xa_in(void)
> +{
> +	int parent_node_offset, file_node_offset;
> +	const char *node_name, *fdt_compat_str;
> +	struct liveupdate_filesystem *fs;
> +	struct luo_file *luo_file;
> +	const void *data_ptr;
> +	int ret = 0;
> +
> +	if (luo_files_xa_in_recreated || !luo_fdt_in)
> +		return;
> +
> +	/* Take write in order to gurantee that we re-create list once */

Typo: s/gurantee/guarantee

> +	down_write(&luo_filesystems_list_rwsem);
> +	if (luo_files_xa_in_recreated)
> +		goto exit_unlock;
> +
> +	parent_node_offset = fdt_subnode_offset(luo_fdt_in, 0,
> +						LUO_FILES_NODE_NAME);
> +
> +	fdt_for_each_subnode(file_node_offset, luo_fdt_in, parent_node_offset) {
> +		bool handler_found = false;
> +		u64 token;
> +
> +		node_name = fdt_get_name(luo_fdt_in, file_node_offset, NULL);
> +		if (!node_name) {
> +			panic("Skipping FDT subnode at offset %d: Cannot get name\n",
> +			      file_node_offset);

Should failure to parse a specific FD really be a panic? Wouldn't it be
better to continue and let userspace decide if it can live with the FD
missing?

> +		}
> +
> +		ret = kstrtou64(node_name, 0, &token);
> +		if (ret < 0) {
> +			panic("Skipping FDT node '%s': Failed to parse token\n",
> +			      node_name);
> +		}
> +
> +		fdt_compat_str = fdt_getprop(luo_fdt_in, file_node_offset,
> +					     "compatible", NULL);
> +		if (!fdt_compat_str) {
> +			panic("Skipping FDT node '%s': Missing 'compatible' property\n",
> +			      node_name);
> +		}
> +
> +		data_ptr = fdt_getprop(luo_fdt_in, file_node_offset, "data",
> +				       NULL);
> +		if (!data_ptr) {
> +			panic("Can't recover property 'data' for FDT node '%s'\n",
> +			      node_name);
> +		}
> +
> +		list_for_each_entry(fs, &luo_filesystems_list, list) {
> +			if (!strcmp(fs->compatible, fdt_compat_str)) {
> +				handler_found = true;
> +				break;
> +			}
> +		}
> +
> +		if (!handler_found) {
> +			panic("Skipping FDT node '%s': No registered handler for compatible '%s'\n",
> +			      node_name, fdt_compat_str);

Thinking out loud here: this means that by the time of first retrieval,
all file systems must be registered. Since this is called from
luo_do_files_finish_calls() or luo_retrieve_file(), it will come from
userspace, so all built in modules would be initialized by then. But
some loadable module might not be. I don't see much of a use case for
loadable modules to participate in LUO, so I don't think it should be a
problem.

> +		}
> +
> +		luo_file = kmalloc(sizeof(*luo_file),
> +				   GFP_KERNEL | __GFP_NOFAIL);
> +		luo_file->fs = fs;
> +		luo_file->file = NULL;
> +		memcpy(&luo_file->private_data, data_ptr, sizeof(u64));

Why not make sure data_ptr is exactly sizeof(u64) when we parse it, and
then simply do luo_file->private_data = (u64)*data_ptr ?

Because if the previous kernel wrote more than a u64 in data, then
something is broken and we should catch that error anyway.

> +		luo_file->reclaimed = false;
> +		mutex_init(&luo_file->mutex);
> +		luo_file->state = LIVEUPDATE_STATE_UPDATED;
> +		ret = xa_err(xa_store(&luo_files_xa_in, token, luo_file,
> +				      GFP_KERNEL | __GFP_NOFAIL));

Should you also check if something is already at token's slot, in case
previous kernel generated wrong tokens or FDT is broken?

> +		if (ret < 0) {
> +			panic("Failed to store luo_file for token %llu in XArray: %d\n",
> +			      token, ret);
> +		}
> +	}
> +	luo_files_xa_in_recreated = true;
> +
> +exit_unlock:
> +	up_write(&luo_filesystems_list_rwsem);
> +}
> +
[...]
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 7a130680b5f2..7afe0aac5ce4 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -86,6 +86,55 @@ enum liveupdate_state  {
>  	LIVEUPDATE_STATE_UPDATED = 3,
>  };
>  
> +/* Forward declaration needed if definition isn't included */
> +struct file;
> +
> +/**
> + * struct liveupdate_filesystem - Represents a handler for a live-updatable
> + * filesystem/file type.
> + * @prepare:       Optional. Saves state for a specific file instance (@file,
> + *                 @arg) before update, potentially returning value via @data.
> + *                 Returns 0 on success, negative errno on failure.
> + * @freeze:        Optional. Performs final actions just before kernel
> + *                 transition, potentially reading/updating the handle via
> + *                 @data.
> + *                 Returns 0 on success, negative errno on failure.
> + * @cancel:        Optional. Cleans up state/resources if update is aborted
> + *                 after prepare/freeze succeeded, using the @data handle (by
> + *                 value) from the successful prepare. Returns void.
> + * @finish:        Optional. Performs final cleanup in the new kernel using the
> + *                 preserved @data handle (by value). Returns void.
> + * @retrieve:      Retrieve the preserved file. Must be called before finish.
> + * @can_preserve:  callback to determine if @file with associated context (@arg)
> + *                 can be preserved by this handler.
> + *                 Return bool (true if preservable, false otherwise).
> + * @compatible:    The compatibility string (e.g., "memfd-v1", "vfiofd-v1")
> + *                 that uniquely identifies the filesystem or file type this
> + *                 handler supports. This is matched against the compatible
> + *                 string associated with individual &struct liveupdate_file
> + *                 instances.
> + * @arg:           An opaque pointer to implementation-specific context data
> + *                 associated with this filesystem handler registration.
> + * @list:          used for linking this handler instance into a global list of
> + *                 registered filesystem handlers.
> + *
> + * Modules that want to support live update for specific file types should
> + * register an instance of this structure. LUO uses this registration to
> + * determine if a given file can be preserved and to find the appropriate
> + * operations to manage its state across the update.
> + */
> +struct liveupdate_filesystem {
> +	int (*prepare)(struct file *file, void *arg, u64 *data);
> +	int (*freeze)(struct file *file, void *arg, u64 *data);
> +	void (*cancel)(struct file *file, void *arg, u64 data);
> +	void (*finish)(struct file *file, void *arg, u64 data, bool reclaimed);
> +	int (*retrieve)(void *arg, u64 data, struct file **file);
> +	bool (*can_preserve)(struct file *file, void *arg);
> +	const char *compatible;
> +	void *arg;

What is the use for this arg? I would expect one file type/system to
register one set of handlers. So they can keep their arg in a global in
their code. I don't see why a per-filesystem arg is needed.

What I do think is needed is a per-file arg. Each callback gets 'data',
which is the serialized data, but there is no place to store runtime
state, like some flags or serialization metadata. Sure, you could make
place for it somewhere in the inode, but I think it would be a lot
cleaner to be able to store it in struct luo_file.

So perhaps rename private_data in struct luo_file to say
serialized_data, and have a field called "private" that filesystems can
use for their runtime state?

Same suggestion for subsystems as well.

> +	struct list_head list;
> +};
> +
>  /**
>   * struct liveupdate_subsystem - Represents a subsystem participating in LUO
>   * @prepare:      Optional. Called during LUO prepare phase. Should perform
[...]

-- 
Regards,
Pratyush Yadav

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ