Message-ID: <aSMXUKMhroThYrlU@kernel.org>
Date: Sun, 23 Nov 2025 16:16:48 +0200
From: Mike Rapoport <rppt@...nel.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: pratyush@...nel.org, jasonmiu@...gle.com, graf@...zon.com,
dmatlack@...gle.com, rientjes@...gle.com, corbet@....net,
rdunlap@...radead.org, ilpo.jarvinen@...ux.intel.com,
kanie@...ux.alibaba.com, ojeda@...nel.org, aliceryhl@...gle.com,
masahiroy@...nel.org, akpm@...ux-foundation.org, tj@...nel.org,
yoann.congal@...le.fr, mmaurer@...gle.com, roman.gushchin@...ux.dev,
chenridong@...wei.com, axboe@...nel.dk, mark.rutland@....com,
jannh@...gle.com, vincent.guittot@...aro.org, hannes@...xchg.org,
dan.j.williams@...el.com, david@...hat.com,
joel.granados@...nel.org, rostedt@...dmis.org,
anna.schumaker@...cle.com, song@...nel.org, linux@...ssschuh.net,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, gregkh@...uxfoundation.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, rafael@...nel.org, dakr@...nel.org,
bartosz.golaszewski@...aro.org, cw00.choi@...sung.com,
myungjoo.ham@...sung.com, yesanishhere@...il.com,
Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com,
aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com, ptyadav@...zon.de, lennart@...ttering.net,
brauner@...nel.org, linux-api@...r.kernel.org,
linux-fsdevel@...r.kernel.org, saeedm@...dia.com,
ajayachandra@...dia.com, jgg@...dia.com, parav@...dia.com,
leonro@...dia.com, witu@...dia.com, hughd@...gle.com,
skhawaja@...gle.com, chrisl@...nel.org
Subject: Re: [PATCH v7 02/22] liveupdate: luo_core: integrate with KHO
On Sun, Nov 23, 2025 at 07:03:19AM -0500, Pasha Tatashin wrote:
> On Sun, Nov 23, 2025 at 6:27 AM Mike Rapoport <rppt@...nel.org> wrote:
> >
> > On Sat, Nov 22, 2025 at 05:23:29PM -0500, Pasha Tatashin wrote:
> > > Integrate the LUO with the KHO framework to enable passing LUO state
> > > across a kexec reboot.
> > >
> > > This patch implements the lifecycle integration with KHO:
> > >
> > > 1. Incoming State: During early boot (`early_initcall`), LUO checks if
> > > KHO is active. If so, it retrieves the "LUO" subtree, verifies the
> > > "luo-v1" compatibility string, and reads the `liveupdate-number` to
> > > track the update count.
> > >
> > > 2. Outgoing State: During late initialization (`late_initcall`), LUO
> > > allocates a new FDT for the next kernel, populates it with the basic
> > > header (compatible string and incremented update number), and
> > > registers it with KHO (`kho_add_subtree`).
> > >
> > > 3. Finalization: The `liveupdate_reboot()` notifier is updated to invoke
> > > `kho_finalize()`. This ensures that all memory segments marked for
> > > preservation are properly serialized before the kexec jump.
> > >
> > > LUO now depends on `CONFIG_KEXEC_HANDOVER`.
> > >
> > > Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> > > ---
> > > include/linux/kho/abi/luo.h | 54 +++++++++++
> > > kernel/liveupdate/luo_core.c | 154 ++++++++++++++++++++++++++++++-
> > > kernel/liveupdate/luo_internal.h | 22 +++++
> > > 3 files changed, 229 insertions(+), 1 deletion(-)
> > > create mode 100644 include/linux/kho/abi/luo.h
> > > create mode 100644 kernel/liveupdate/luo_internal.h
> > >
> > > diff --git a/include/linux/kho/abi/luo.h b/include/linux/kho/abi/luo.h
> > > new file mode 100644
> > > index 000000000000..8523b3ff82d1
> > > --- /dev/null
> > > +++ b/include/linux/kho/abi/luo.h
> > > @@ -0,0 +1,54 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +
> > > +/*
> > > + * Copyright (c) 2025, Google LLC.
> > > + * Pasha Tatashin <pasha.tatashin@...een.com>
> > > + */
> > > +
> > > +/**
> > > + * DOC: Live Update Orchestrator ABI
> > > + *
> > > + * This header defines the stable Application Binary Interface used by the
> > > + * Live Update Orchestrator to pass state from a pre-update kernel to a
> > > + * post-update kernel. The ABI is built upon the Kexec HandOver framework
> > > + * and uses a Flattened Device Tree to describe the preserved data.
> > > + *
> > > + * This interface is a contract. Any modification to the FDT structure, node
> > > + * properties, compatible strings, or the layout of the `__packed` serialization
> > > + * structures defined here constitutes a breaking change. Such changes require
> > > + * incrementing the version number in the relevant `_COMPATIBLE` string to
> > > + * prevent a new kernel from misinterpreting data from an old kernel.
> >
> > From v6 thread:
> >
> > > > > I'd add a sentence that stresses that ABI changes are possible as long as
> > > > > they include changes to the FDT version.
> > > > This is indeed implied by the last paragraph, but I think it's worth
> > > > spelling it explicitly.
> > > >
> > > > Another thing that I think this should mention is that compatibility is
> > > > only guaranteed for the kernels that use the same ABI version.
> > >
> > > Sure, I will add both.
> >
> > Looks like it fell between the cracks :/
>
> Hm, when I was updating the patches, I included the first part, and
> then re-read the content, and I think it covers all points:
>
> 1. Changes are possible
> This interface is a contract. Any modification to the FDT structure, node
> * properties, compatible strings, or the layout of the `__packed` serialization
> * structures defined here constitutes a breaking change. Such changes require
> * incrementing the version number in the relevant `_COMPATIBLE` string
>
> So, changes are allowed as long as you update the version number
>
> 2. Breaking if version is different:
> to prevent a new kernel from misinterpreting data from an old kernel.
>
> So, the next kernel can interpret only if the version is the same.
>
> Which point do you think is not covered?
As I said, it's covered, but it's implied. I'd prefer these stated
explicitly.
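To make the second point concrete: "compatibility is only guaranteed for kernels that use the same ABI version" boils down to an exact match on the compatible string, with no attempt to interpret any other version. A minimal userspace sketch of that rule follows. LUO_COMPAT_SKETCH and luo_compat_matches() are made-up names for illustration and are not taken from the patch; only the "luo-v1" string comes from the commit message.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the compatible string this kernel understands. */
#define LUO_COMPAT_SKETCH "luo-v1"

/*
 * Returns true only when the incoming tree carries exactly the same
 * compatible string. Any other version (older or newer) is rejected
 * rather than parsed or translated.
 */
static bool luo_compat_matches(const char *incoming)
{
	return incoming != NULL && strcmp(incoming, LUO_COMPAT_SKETCH) == 0;
}
```

With this rule, a layout change is allowed as long as the string is bumped, and a kernel that sees a different string simply declines to restore.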
> > > +static int __init liveupdate_early_init(void)
> > > +{
> > > + int err;
> > > +
> > > + err = luo_early_startup();
> > > + if (err) {
> > > + luo_global.enabled = false;
> > > + luo_restore_fail("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > > + ERR_PTR(err));
> >
> > What's wrong with a plain panic()?
>
> Jason suggested using the luo_restore_fail() function instead of
> inserting panic() right in the code, somewhere in LUOv3 or earlier. It
> helps avoid sprinkling panics in different places, and if we add the
> maintenance mode that we discussed in LUOv6, we could update this
> function to be the place where that mode is switched on.
I'd agree if we were to have a bunch of panic()s sprinkled in the code.
With a single one it's easier to parse panic() than to look up what
luo_restore_fail() means.
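The trade-off can be seen in a small userspace sketch. The wrapper has the same shape as the macro in the patch, but everything here is a stand-in: fake_panic() only records the message instead of halting, and restore_fail_sketch(), via_wrapper() and direct() are hypothetical names used for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

static char last_msg[128];

/* Userspace stand-in for panic(): records the message instead of halting. */
static void fake_panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
	va_end(ap);
}

/*
 * Same shape as the macro in the patch, pointed at the stand-in.
 * Note that ##__VA_ARGS__ is a GNU extension (accepted by gcc and clang).
 */
#define restore_fail_sketch(fmt, ...) fake_panic(fmt, ##__VA_ARGS__)

/* Call site that goes through the wrapper. */
static const char *via_wrapper(int err)
{
	restore_fail_sketch("incoming tree failed [%d]", err);
	return last_msg;
}

/* Call site that names the underlying function directly. */
static const char *direct(int err)
{
	fake_panic("incoming tree failed [%d]", err);
	return last_msg;
}
```

Both call sites behave identically; the wrapper only pays off once several callers share it or its behaviour needs to change in one place.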
> > > + }
> > > +
> > > + return err;
> > > +}
> > > +early_initcall(liveupdate_early_init);
> > > +
> >
> > ...
> >
> > > int liveupdate_reboot(void)
> > > {
> > > - return 0;
> > > + int err;
> > > +
> > > + if (!liveupdate_enabled())
> > > + return 0;
> > > +
> > > + err = kho_finalize();
> > > + if (err) {
> > > + pr_err("kho_finalize failed %d\n", err);
> >
> > Nit: why not %pe?
>
> I believe that, before my last clean-up of KHO, it could return an FDT
> error in addition to a standard errno; but anyway, this code is going to
> be removed soon with stateless KHO, so keeping %d instead of %pe is fine
> (I can change this if I update this patch).
Nah, %d is ok.
> > > + /*
> > > > + * kho_finalize() may return libfdt errors; to avoid passing
> > > > + * unknown errors to userspace, change this to EAGAIN.
> > > + */
> > > + err = -EAGAIN;
> > > + }
> > > +
> > > + return err;
> > > }
> > >
> > > /**
> > > diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> > > new file mode 100644
> > > index 000000000000..8612687b2000
> > > --- /dev/null
> > > +++ b/kernel/liveupdate/luo_internal.h
> > > @@ -0,0 +1,22 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +
> > > +/*
> > > + * Copyright (c) 2025, Google LLC.
> > > + * Pasha Tatashin <pasha.tatashin@...een.com>
> > > + */
> > > +
> > > +#ifndef _LINUX_LUO_INTERNAL_H
> > > +#define _LINUX_LUO_INTERNAL_H
> > > +
> > > +#include <linux/liveupdate.h>
> > > +
> > > +/*
> > > > + * Handles a deserialization failure: devices and memory are in an
> > > > + * unpredictable state.
> > > + *
> > > + * Continuing the boot process after a failure is dangerous because it could
> > > + * lead to leaks of private data.
> > > + */
> > > +#define luo_restore_fail(__fmt, ...) panic(__fmt, ##__VA_ARGS__)
> >
> > Let's add this when we have more than a single callsite.
> > Just use panic() in liveupdate_early_init() and add the comment there.
>
> https://lore.kernel.org/all/CA+CK2bBEX6C6v63DrK-Fx2sE7fvLTZM=HX0y_j4aVDYcfrCXOg@mail.gmail.com/
>
> This is the reason I added this function. I like the current approach.
v2 had way more than a single panic(), so it made sense then.
> Pasha
--
Sincerely yours,
Mike.