[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bDu2FdzyotSwBpGwQtiisv=3f6gC7DzOpebPCxmmpwMYw@mail.gmail.com>
Date: Sun, 16 Nov 2025 09:55:30 -0500
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: pratyush@...nel.org, jasonmiu@...gle.com, graf@...zon.com,
dmatlack@...gle.com, rientjes@...gle.com, corbet@....net,
rdunlap@...radead.org, ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com,
ojeda@...nel.org, aliceryhl@...gle.com, masahiroy@...nel.org,
akpm@...ux-foundation.org, tj@...nel.org, yoann.congal@...le.fr,
mmaurer@...gle.com, roman.gushchin@...ux.dev, chenridong@...wei.com,
axboe@...nel.dk, mark.rutland@....com, jannh@...gle.com,
vincent.guittot@...aro.org, hannes@...xchg.org, dan.j.williams@...el.com,
david@...hat.com, joel.granados@...nel.org, rostedt@...dmis.org,
anna.schumaker@...cle.com, song@...nel.org, linux@...ssschuh.net,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org,
gregkh@...uxfoundation.org, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
rafael@...nel.org, dakr@...nel.org, bartosz.golaszewski@...aro.org,
cw00.choi@...sung.com, myungjoo.ham@...sung.com, yesanishhere@...il.com,
Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com,
aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com, ptyadav@...zon.de, lennart@...ttering.net,
brauner@...nel.org, linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org,
saeedm@...dia.com, ajayachandra@...dia.com, jgg@...dia.com, parav@...dia.com,
leonro@...dia.com, witu@...dia.com, hughd@...gle.com, skhawaja@...gle.com,
chrisl@...nel.org
Subject: Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@...nel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:48PM -0500, Pasha Tatashin wrote:
> > Integrate the LUO with the KHO framework to enable passing LUO state
> > across a kexec reboot.
> >
> > When LUO is transitioned to a "prepared" state, it tells KHO to
> > finalize, so all memory segments that were added to KHO preservation
> > list are getting preserved. After "Prepared" state no new segments
> > can be preserved. If LUO is canceled, it also tells KHO to cancel the
> > serialization, and therefore, later LUO can go back into the prepared
> > state.
> >
> > This patch introduces the following changes:
> > - During the KHO finalization phase allocate FDT blob.
>
> This happens much earlier, isn't it?
It is, this commit log needs to be updated, it still talks about
prepare/cancel, where they are since v5 replaced with
preserve/unfreeze.
>
> > - Populate this FDT with a LUO compatibility string ("luo-v1").
> >
> > LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> > logic (`luo_do_*_calls`) remains unimplemented in this patch.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> > ---
> > include/linux/liveupdate/abi/luo.h | 54 ++++++++++
> > kernel/liveupdate/luo_core.c | 153 ++++++++++++++++++++++++++++-
> > 2 files changed, 206 insertions(+), 1 deletion(-)
> > create mode 100644 include/linux/liveupdate/abi/luo.h
> >
> > diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> > new file mode 100644
> > index 000000000000..9483a294287f
> > --- /dev/null
> > +++ b/include/linux/liveupdate/abi/luo.h
> > @@ -0,0 +1,54 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@...een.com>
> > + */
> > +
> > +/**
> > + * DOC: Live Update Orchestrator ABI
> > + *
> > + * This header defines the stable Application Binary Interface used by the
> > + * Live Update Orchestrator to pass state from a pre-update kernel to a
> > + * post-update kernel. The ABI is built upon the Kexec HandOver framework
> > + * and uses a Flattened Device Tree to describe the preserved data.
> > + *
> > + * This interface is a contract. Any modification to the FDT structure, node
> > + * properties, compatible strings, or the layout of the `__packed` serialization
> > + * structures defined here constitutes a breaking change. Such changes require
> > + * incrementing the version number in the relevant `_COMPATIBLE` string to
> > + * prevent a new kernel from misinterpreting data from an old kernel.
>
> I'd add a sentence that stresses that ABI changes are possible as long they
> include changes to the FDT version.
> This is indeed implied by the last paragraph, but I think it's worth
> spelling it explicitly.
>
> Another thing that I think this should mention is that compatibility is
> only guaranteed for the kernels that use the same ABI version.
Sure, I will add both.
> > + *
> > + * FDT Structure Overview:
> > + * The entire LUO state is encapsulated within a single KHO entry named "LUO".
> > + * This entry contains an FDT with the following layout:
> > + *
> > + * .. code-block:: none
> > + *
> > + * / {
> > + * compatible = "luo-v1";
> > + * liveupdate-number = <...>;
> > + * };
> > + *
> > + * Main LUO Node (/):
> > + *
> > + * - compatible: "luo-v1"
> > + * Identifies the overall LUO ABI version.
> > + * - liveupdate-number: u64
> > + * A counter tracking the number of successful live updates performed.
> > + */
> ...
>
> > +static int __init liveupdate_early_init(void)
> > +{
> > + int err;
> > +
> > + err = luo_early_startup();
> > + if (err) {
> > + pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > + ERR_PTR(err));
>
> How do we report this to the userspace?
> I think the decision what to do in this case belongs there. Even if it's
> down to choosing between plain kexec and full reboot, it's still a policy
> that should be implemented in userspace.
I agree that policy belongs in userspace, and that is how we designed
it. In this specific failure case (ABI mismatch or corrupt FDT), the
preserved state is unrecoverable by the kernel. We cannot parse the
incoming data, so we cannot offer it to userspace.
We report this state by not registering the /dev/liveupdate device.
When the userspace agent attempts to initialize, it receives ENOENT.
At that point, the agent exercises its policy:
- Check dmesg for the specific error and report the failure to the
fleet control plane.
- Trigger a fresh (kexec or cold) reboot to reset unreclaimable resources.
Pasha
Powered by blists - more mailing lists