[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bBeCOojpZ=qoefd6NG+bO6CUh+NU8=8dMhD01=LtC9eNg@mail.gmail.com>
Date: Sun, 8 Jun 2025 09:37:38 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: jasonmiu@...gle.com, graf@...zon.com, changyuanl@...gle.com,
rppt@...nel.org, dmatlack@...gle.com, rientjes@...gle.com, corbet@....net,
rdunlap@...radead.org, ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com,
ojeda@...nel.org, aliceryhl@...gle.com, masahiroy@...nel.org,
akpm@...ux-foundation.org, tj@...nel.org, yoann.congal@...le.fr,
mmaurer@...gle.com, roman.gushchin@...ux.dev, chenridong@...wei.com,
axboe@...nel.dk, mark.rutland@....com, jannh@...gle.com,
vincent.guittot@...aro.org, hannes@...xchg.org, dan.j.williams@...el.com,
david@...hat.com, joel.granados@...nel.org, rostedt@...dmis.org,
anna.schumaker@...cle.com, song@...nel.org, zhangguopeng@...inos.cn,
linux@...ssschuh.net, linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, gregkh@...uxfoundation.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
hpa@...or.com, rafael@...nel.org, dakr@...nel.org,
bartosz.golaszewski@...aro.org, cw00.choi@...sung.com,
myungjoo.ham@...sung.com, yesanishhere@...il.com, Jonathan.Cameron@...wei.com,
quic_zijuhu@...cinc.com, aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com
Subject: Re: [RFC v2 08/16] luo: luo_files: add infrastructure for FDs
> > +
> > +/**
> > + * luo_files_startup - Validates the LUO file-descriptors FDT node at startup.
> > + * @fdt: Pointer to the LUO FDT blob passed from the previous kernel.
> > + *
> > + * This __init function checks the existence and validity of the
> > + * '/file-descriptors' node in the FDT. This node is considered mandatory. It
>
> Why is it mandatory? Can't a user just preserve some subsystems, and no
> FDs?
Yes, that is legal, in that case this node is going to be empty.
>
> > + * calls panic() if the node is missing, inaccessible, or invalid (e.g., missing
> > + * compatible, wrong compatible string), indicating a critical configuration
> > + * error for LUO.
> > + */
> > +void __init luo_files_startup(void *fdt)
> > +{
> > + int ret, node_offset;
> > +
> > + node_offset = fdt_subnode_offset(fdt, 0, LUO_FILES_NODE_NAME);
> > + if (node_offset < 0)
> > + panic("Failed to find /file-descriptors node\n");
> > +
> > + ret = fdt_node_check_compatible(fdt, node_offset,
> > + LUO_FILES_COMPATIBLE);
> > + if (ret) {
> > + panic("FDT '%s' is incompatible with '%s' [%d]\n",
> > + LUO_FILES_NODE_NAME, LUO_FILES_COMPATIBLE, ret);
> > + }
> > + luo_fdt_in = fdt;
> > +}
> > +
> > +static void luo_files_recreate_luo_files_xa_in(void)
> > +{
> > + int parent_node_offset, file_node_offset;
> > + const char *node_name, *fdt_compat_str;
> > + struct liveupdate_filesystem *fs;
> > + struct luo_file *luo_file;
> > + const void *data_ptr;
> > + int ret = 0;
> > +
> > + if (luo_files_xa_in_recreated || !luo_fdt_in)
> > + return;
> > +
> > + /* Take write in order to gurantee that we re-create list once */
>
> Typo: s/gurantee/guarantee
Done, thanks.
>
> > + down_write(&luo_filesystems_list_rwsem);
> > + if (luo_files_xa_in_recreated)
> > + goto exit_unlock;
> > +
> > + parent_node_offset = fdt_subnode_offset(luo_fdt_in, 0,
> > + LUO_FILES_NODE_NAME);
> > +
> > + fdt_for_each_subnode(file_node_offset, luo_fdt_in, parent_node_offset) {
> > + bool handler_found = false;
> > + u64 token;
> > +
> > + node_name = fdt_get_name(luo_fdt_in, file_node_offset, NULL);
> > + if (!node_name) {
> > + panic("Skipping FDT subnode at offset %d: Cannot get name\n",
> > + file_node_offset);
>
> Should failure to parse a specific FD really be a panic? Wouldn't it be
> better to continue and let userspace decide if it can live with the FD
> missing?
This is not safe, the memory might be DMA or owned by a sensetive
process, and if we proceed liveupdate reboot without properly handling
memory, we can get corruptions, and memory leaks. Therefore, during
liveupdate boot if there are exceptions, we should panic.
> > + }
> > +
> > + ret = kstrtou64(node_name, 0, &token);
> > + if (ret < 0) {
> > + panic("Skipping FDT node '%s': Failed to parse token\n",
> > + node_name);
> > + }
> > +
> > + fdt_compat_str = fdt_getprop(luo_fdt_in, file_node_offset,
> > + "compatible", NULL);
> > + if (!fdt_compat_str) {
> > + panic("Skipping FDT node '%s': Missing 'compatible' property\n",
> > + node_name);
> > + }
> > +
> > + data_ptr = fdt_getprop(luo_fdt_in, file_node_offset, "data",
> > + NULL);
> > + if (!data_ptr) {
> > + panic("Can't recover property 'data' for FDT node '%s'\n",
> > + node_name);
> > + }
> > +
> > + list_for_each_entry(fs, &luo_filesystems_list, list) {
> > + if (!strcmp(fs->compatible, fdt_compat_str)) {
> > + handler_found = true;
> > + break;
> > + }
> > + }
> > +
> > + if (!handler_found) {
> > + panic("Skipping FDT node '%s': No registered handler for compatible '%s'\n",
> > + node_name, fdt_compat_str);
>
> Thinking out loud here: this means that by the time of first retrieval,
> all file systems must be registered. Since this is called from
> luo_do_files_finish_calls() or luo_retrieve_file(), it will come from
> userspace, so all built in modules would be initialized by then. But
> some loadable module might not be. I don't see much of a use case for
> loadable modules to participate in LUO, so I don't think it should be a
> problem.
Yes, in practice I am against supporting liveupdate for loadable
modules for FDs and devices; however, if userspace decides to use
them, they have to be very careful in terms when data is retrieved,
and when they are loaded.
> > + }
> > +
> > + luo_file = kmalloc(sizeof(*luo_file),
> > + GFP_KERNEL | __GFP_NOFAIL);
> > + luo_file->fs = fs;
> > + luo_file->file = NULL;
> > + memcpy(&luo_file->private_data, data_ptr, sizeof(u64));
>
> Why not make sure data_ptr is exactly sizeof(u64) when we parse it, and
> then simply do luo_file->private_data = (u64)*data_ptr ?
Because FDT alignment is 4 bytes, we can't simply assign it.
> Because if the previous kernel wrote more than a u64 in data, then
> something is broken and we should catch that error anyway.
>
> > + luo_file->reclaimed = false;
> > + mutex_init(&luo_file->mutex);
> > + luo_file->state = LIVEUPDATE_STATE_UPDATED;
> > + ret = xa_err(xa_store(&luo_files_xa_in, token, luo_file,
> > + GFP_KERNEL | __GFP_NOFAIL));
>
> Should you also check if something is already at token's slot, in case
> previous kernel generated wrong tokens or FDT is broken?
Good idea, added.
>
> > + if (ret < 0) {
> > + panic("Failed to store luo_file for token %llu in XArray: %d\n",
> > + token, ret);
> > + }
> > + }
> > + luo_files_xa_in_recreated = true;
> > +
> > +exit_unlock:
> > + up_write(&luo_filesystems_list_rwsem);
> > +}
> > +
> [...]
> > diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> > index 7a130680b5f2..7afe0aac5ce4 100644
> > --- a/include/linux/liveupdate.h
> > +++ b/include/linux/liveupdate.h
> > @@ -86,6 +86,55 @@ enum liveupdate_state {
> > LIVEUPDATE_STATE_UPDATED = 3,
> > };
> >
> > +/* Forward declaration needed if definition isn't included */
> > +struct file;
> > +
> > +/**
> > + * struct liveupdate_filesystem - Represents a handler for a live-updatable
> > + * filesystem/file type.
> > + * @prepare: Optional. Saves state for a specific file instance (@file,
> > + * @arg) before update, potentially returning value via @data.
> > + * Returns 0 on success, negative errno on failure.
> > + * @freeze: Optional. Performs final actions just before kernel
> > + * transition, potentially reading/updating the handle via
> > + * @data.
> > + * Returns 0 on success, negative errno on failure.
> > + * @cancel: Optional. Cleans up state/resources if update is aborted
> > + * after prepare/freeze succeeded, using the @data handle (by
> > + * value) from the successful prepare. Returns void.
> > + * @finish: Optional. Performs final cleanup in the new kernel using the
> > + * preserved @data handle (by value). Returns void.
> > + * @retrieve: Retrieve the preserved file. Must be called before finish.
> > + * @can_preserve: callback to determine if @file with associated context (@arg)
> > + * can be preserved by this handler.
> > + * Return bool (true if preservable, false otherwise).
> > + * @compatible: The compatibility string (e.g., "memfd-v1", "vfiofd-v1")
> > + * that uniquely identifies the filesystem or file type this
> > + * handler supports. This is matched against the compatible
> > + * string associated with individual &struct liveupdate_file
> > + * instances.
> > + * @arg: An opaque pointer to implementation-specific context data
> > + * associated with this filesystem handler registration.
> > + * @list: used for linking this handler instance into a global list of
> > + * registered filesystem handlers.
> > + *
> > + * Modules that want to support live update for specific file types should
> > + * register an instance of this structure. LUO uses this registration to
> > + * determine if a given file can be preserved and to find the appropriate
> > + * operations to manage its state across the update.
> > + */
> > +struct liveupdate_filesystem {
> > + int (*prepare)(struct file *file, void *arg, u64 *data);
> > + int (*freeze)(struct file *file, void *arg, u64 *data);
> > + void (*cancel)(struct file *file, void *arg, u64 data);
> > + void (*finish)(struct file *file, void *arg, u64 data, bool reclaimed);
> > + int (*retrieve)(void *arg, u64 data, struct file **file);
> > + bool (*can_preserve)(struct file *file, void *arg);
> > + const char *compatible;
> > + void *arg;
>
> What is the use for this arg? I would expect one file type/system to
> register one set of handlers. So they can keep their arg in a global in
> their code. I don't see why a per-filesystem arg is needed.
I think, arg is useful in case we support a subsystem is registered
multiple times with some differences: i.e. based on mount point, or
file types handling. Let's keep it for now, but if needed, we can
remove that in future revisions.
> What I do think is needed is a per-file arg. Each callback gets 'data',
> which is the serialized data, but there is no place to store runtime
> state, like some flags or serialization metadata. Sure, you could make
> place for it somewhere in the inode, but I think it would be a lot
> cleaner to be able to store it in struct luo_file.
>
> So perhaps rename private_data in struct luo_file to say
> serialized_data, and have a field called "private" that filesystems can
> use for their runtime state?
I am not against this, but let's make this change when it is actually
needed by a registered filesystem.
Thanks,
Pasha
Powered by blists - more mailing lists