[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CA+CK2bDWVu137cPdbu7yOBNGm_ixoeJkQucZW_gPXV3FzTPMKQ@mail.gmail.com>
Date: Fri, 30 May 2025 01:00:28 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: pratyush@...nel.org, jasonmiu@...gle.com, graf@...zon.com,
changyuanl@...gle.com, dmatlack@...gle.com, rientjes@...gle.com,
corbet@....net, rdunlap@...radead.org, ilpo.jarvinen@...ux.intel.com,
kanie@...ux.alibaba.com, ojeda@...nel.org, aliceryhl@...gle.com,
masahiroy@...nel.org, akpm@...ux-foundation.org, tj@...nel.org,
yoann.congal@...le.fr, mmaurer@...gle.com, roman.gushchin@...ux.dev,
chenridong@...wei.com, axboe@...nel.dk, mark.rutland@....com,
jannh@...gle.com, vincent.guittot@...aro.org, hannes@...xchg.org,
dan.j.williams@...el.com, david@...hat.com, joel.granados@...nel.org,
rostedt@...dmis.org, anna.schumaker@...cle.com, song@...nel.org,
zhangguopeng@...inos.cn, linux@...ssschuh.net, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org, gregkh@...uxfoundation.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, rafael@...nel.org,
dakr@...nel.org, bartosz.golaszewski@...aro.org, cw00.choi@...sung.com,
myungjoo.ham@...sung.com, yesanishhere@...il.com, Jonathan.Cameron@...wei.com,
quic_zijuhu@...cinc.com, aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com, ptyadav@...zon.de
Subject: Re: [RFC v2 04/16] luo: luo_core: Live Update Orchestrator
> > +config LIVEUPDATE
> > + bool "Live Update Orchestrator"
> > + depends on KEXEC_HANDOVER
> > + help
> > + Enable the Live Update Orchestrator. Live Update is a mechanism,
> > + typically based on kexec, that allows the kernel to be updated
> > + while keeping selected devices operational across the transition.
> > + These devices are intended to be reclaimed by the new kernel and
> > + re-attached to their original workload without requiring a device
> > + reset.
> > +
> > + This functionality depends on specific support within device drivers
> > + and related kernel subsystems.
>
> This is not clear if the ability to reattach a device to the new kernel or
> the entire live update functionality depends on specific support with
> drivers.
>
> Probably better phrase it as
>
> Ability to handover a device from old to new kernel depends ...
Updated
>
> > +
> > + This feature is primarily used in cloud environments to quickly
> > + update the kernel hypervisor with minimal disruption to the
> > + running virtual machines.
>
> I wouldn't put it into Kconfig. If anything I'd make it
>
> This feature primarily targets virtual machine hosts to quickly ...
Ok
> > + * The core of LUO is a state machine that tracks the progress of a live update,
> > + * along with a callback API that allows other kernel subsystems to participate
> > + * in the process. Example subsystems that can hook into LUO include: kvm,
> > + * iommu, interrupts, vfio, participating filesystems, and mm.
>
> Please spell out memory management.
Done.
>
> > + * LUO uses KHO to transfer memory state from the current Kernel to the next
>
> A link to KHO docs would have been nice, but I'm not sure kernel-doc can do
> that nicely.
Added a link, a simple path to rst, is apparently correctly converted
to a link by sphinx.
>
> > + * Kernel.
>
> Why capital 'K'? :)
Fixed.
>
> > + * The LUO state machine ensures that operations are performed in the correct
> > + * sequence and provides a mechanism to track and recover from potential
> > + * failures, and select devices and subsystems that should participate in
> > + * live update sequence.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/err.h>
> > +#include <linux/kobject.h>
> > +#include <linux/liveupdate.h>
> > +#include <linux/rwsem.h>
> > +#include <linux/string.h>
> > +#include "luo_internal.h"
> > +
> > +static DECLARE_RWSEM(luo_state_rwsem);
> > +
> > +enum liveupdate_state luo_state;
>
> static?
Fixed
> Hmm, luo_state is initialized to 0 (NORMAL) which means we always start
> from NORMAL, although the second kernel is not in the normal state until
> the handover is complete. Maybe we need an initial "unknown" state until
> some of luo code starts running and would set an actual known state?
Added: LIVEUPDATE_STATE_UNDEFINED that exists only before LUO is
initialized during boot.
> > +const char *const luo_state_str[] = {
> > + [LIVEUPDATE_STATE_NORMAL] = "normal",
> > + [LIVEUPDATE_STATE_PREPARED] = "prepared",
> > + [LIVEUPDATE_STATE_FROZEN] = "frozen",
> > + [LIVEUPDATE_STATE_UPDATED] = "updated",
> > +};
> > +
> > +bool luo_enabled;
>
> static?
Fixed.
>
> > +static int __init early_liveupdate_param(char *buf)
> > +{
> > + return kstrtobool(buf, &luo_enabled);
> > +}
> > +early_param("liveupdate", early_liveupdate_param);
> > +
> > +/* Return true if the current state is equal to the provided state */
> > +static inline bool is_current_luo_state(enum liveupdate_state expected_state)
> > +{
> > + return READ_ONCE(luo_state) == expected_state;
> > +}
> > +
> > +static void __luo_set_state(enum liveupdate_state state)
> > +{
> > + WRITE_ONCE(luo_state, state);
> > +}
> > +
> > +static inline void luo_set_state(enum liveupdate_state state)
> > +{
> > + pr_info("Switched from [%s] to [%s] state\n",
> > + LUO_STATE_STR, luo_state_str[state]);
>
> Maybe LUO_CURRENT_STATE_STR?
Done
> > + __luo_set_state(state);
> > +}
> > +
> > +static int luo_do_freeze_calls(void)
> > +{
> > + return 0;
> > +}
> > +
> > +static void luo_do_finish_calls(void)
> > +{
> > +}
> > +
> > +int luo_prepare(void)
> > +{
> > + return 0;
> > +}
> > +
> > +/**
> > + * luo_freeze() - Initiate the final freeze notification phase for live update.
> > + *
> > + * Attempts to transition the live update orchestrator state from
> > + * %LIVEUPDATE_STATE_PREPARED to %LIVEUPDATE_STATE_FROZEN. This function is
> > + * typically called just before the actual reboot system call (e.g., kexec)
> > + * is invoked, either directly by the orchestration tool or potentially from
> > + * within the reboot syscall path itself.
> > + *
> > + * Based on the outcome of the notification process:
> > + * - If luo_do_freeze_calls() returns 0 (all callbacks succeeded), the state
> > + * is set to %LIVEUPDATE_STATE_FROZEN using luo_set_state(), indicating
> > + * readiness for the imminent kexec.
> > + * - If luo_do_freeze_calls() returns a negative error code (a callback
> > + * failed), the state is reverted to %LIVEUPDATE_STATE_NORMAL using
> > + * luo_set_state() to cancel the live update attempt.
>
> The kernel-doc comments are mostly for users of a function and describe how
> it should be used rather how it is implemented.
SGTM, cleaned-up.
> I don't think it's important to mention return values of
> luo_do_freeze_calls() here. The important things are whether registered
> subsystems succeeded to freeze or not and the state changes.
> I'd also mention that if a subsystem fails to freeze, everything is
> canceled.
Added
> > +/**
> > + * luo_finish - Finalize the live update process in the new kernel.
> > + *
> > + * This function is called after a successful live update reboot into a new
> > + * kernel, once the new kernel is ready to transition to the normal operational
> > + * state. It signals the completion of the live update sequence to subsystems.
> > + *
> > + * It first attempts to acquire the write lock for the orchestrator state.
> > + *
> > + * Then, it checks if the system is in the ``LIVEUPDATE_STATE_UPDATED`` state.
> > + * If not, it logs a warning and returns ``-EINVAL``.
> > + *
> > + * If the state is correct, it triggers the ``LIVEUPDATE_FINISH`` notifier
>
> Here too, you describe what the function does rather how it should be used
Fixed
>
> > + * chain. Note that the return value of the notifier is intentionally ignored as
> > + * finish callbacks must not fail. Finally, the orchestrator state is
>
> And what should happen if there was an error in a finish callback?
Scream, warn, panic, we cannot allow running a system past liveupdate,
if some state was not properly passed from the previous kernel to the
current kernel. This may result in catastrophic memory leaks.
> > +static int __init luo_startup(void)
> > +{
> > + __luo_set_state(LIVEUPDATE_STATE_NORMAL);
> > +
> > + return 0;
> > +}
> > +early_initcall(luo_startup);
>
> This means that the second kernel starts with luo_state == NORMAL, then
> at early_initcall transitions to NORMAL again and later is set to UPDATED,
> doesn't it?
In the next patch, in this function we transition to UPDATED. So,
technically, we go from NORMAL to UPDATED. However, I added UNDEFINED
state so, in this function we either go from UNDEFINED to UPDATED or
UNDEFINED to NORMAL.
> > + * @return true if the system is in the ``LIVEUPDATE_STATE_NORMAL`` state,
> > + * false otherwise.
> > + */
> > +bool liveupdate_state_normal(void)
> > +{
> > + return is_current_luo_state(LIVEUPDATE_STATE_NORMAL);
> > +}
> > +EXPORT_SYMBOL_GPL(liveupdate_state_normal);
>
> Won't liveupdate_get_state() do?
Yeah, we can simply return state, and let caller to compare. However,
I think, caller is only interested if this is normal state or if live
update is in progress. I will keep them, and also added
liveupdate_get_state().
> > +
> > +/**
> > + * liveupdate_enabled - Check if the live update feature is enabled.
> > + *
> > + * This function returns the state of the live update feature flag, which
> > + * can be controlled via the ``liveupdate`` kernel command-line parameter.
> > + *
> > + * @return true if live update is enabled, false otherwise.
> > + */
> > +bool liveupdate_enabled(void)
> > +{
> > + return luo_enabled;
> > +}
> > +EXPORT_SYMBOL_GPL(liveupdate_enabled);
> > diff --git a/drivers/misc/liveupdate/luo_internal.h b/drivers/misc/liveupdate/luo_internal.h
> > new file mode 100644
> > index 000000000000..34e73fb0318c
> > --- /dev/null
> > +++ b/drivers/misc/liveupdate/luo_internal.h
> > @@ -0,0 +1,26 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@...een.com>
> > + */
> > +
> > +#ifndef _LINUX_LUO_INTERNAL_H
> > +#define _LINUX_LUO_INTERNAL_H
> > +
> > +int luo_cancel(void);
> > +int luo_prepare(void);
> > +int luo_freeze(void);
> > +int luo_finish(void);
> > +
> > +void luo_state_read_enter(void);
> > +void luo_state_read_exit(void);
> > +
> > +extern const char *const luo_state_str[];
> > +
> > +/* Get the current state as a string */
> > +#define LUO_STATE_STR luo_state_str[READ_ONCE(luo_state)]
>
> IIUC you need the macro to have LUO_STATE_STR available in all files in
> liveupdate/ but without exposing luo_state.
>
> I think that we can do a function call to get that string, will make things
> nicer IMHO.
Done.
>
> > +
> > +extern enum liveupdate_state luo_state;
> > +
> > +#endif /* _LINUX_LUO_INTERNAL_H */
> > diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> > new file mode 100644
> > index 000000000000..c2740da70958
> > --- /dev/null
> > +++ b/include/linux/liveupdate.h
> > @@ -0,0 +1,131 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@...een.com>
> > + */
> > +#ifndef _LINUX_LIVEUPDATE_H
> > +#define _LINUX_LIVEUPDATE_H
> > +
> > +#include <linux/bug.h>
> > +#include <linux/types.h>
> > +#include <linux/list.h>
> > +
> > +/**
> > + * enum liveupdate_event - Events that trigger live update callbacks.
> > + * @LIVEUPDATE_PREPARE: PREPARE should happens *before* the blackout window.
>
> should happen or happens ;-)
Done
>
> > + * Subsystems should prepare for an upcoming reboot by
> > + * serializing their states. However, it must be considered
>
> It's not only about state serialization, it's also about adjusting
> operational mode so that state that was serialized won't be changed or at
> least the changes from PREPARE to FREEZE would be accounted somehow.
By serialization, I mean is to save their state, but I agree, the
devices and resources are also should be in a limited state where the
serialized data should not be altered between prepare and freeze (i.e.
no memfd resizing, no new DMA mappings, etc).
Thank you for your comments.
Pasha
Powered by blists - more mailing lists