lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <mafs0sekfts2i.fsf@kernel.org>
Date: Wed, 04 Jun 2025 18:00:37 +0200
From: Pratyush Yadav <pratyush@...nel.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: pratyush@...nel.org,  jasonmiu@...gle.com,  graf@...zon.com,
  changyuanl@...gle.com,  rppt@...nel.org,  dmatlack@...gle.com,
  rientjes@...gle.com,  corbet@....net,  rdunlap@...radead.org,
  ilpo.jarvinen@...ux.intel.com,  kanie@...ux.alibaba.com,
  ojeda@...nel.org,  aliceryhl@...gle.com,  masahiroy@...nel.org,
  akpm@...ux-foundation.org,  tj@...nel.org,  yoann.congal@...le.fr,
  mmaurer@...gle.com,  roman.gushchin@...ux.dev,  chenridong@...wei.com,
  axboe@...nel.dk,  mark.rutland@....com,  jannh@...gle.com,
  vincent.guittot@...aro.org,  hannes@...xchg.org,
  dan.j.williams@...el.com,  david@...hat.com,  joel.granados@...nel.org,
  rostedt@...dmis.org,  anna.schumaker@...cle.com,  song@...nel.org,
  zhangguopeng@...inos.cn,  linux@...ssschuh.net,
  linux-kernel@...r.kernel.org,  linux-doc@...r.kernel.org,
  linux-mm@...ck.org,  gregkh@...uxfoundation.org,  tglx@...utronix.de,
  mingo@...hat.com,  bp@...en8.de,  dave.hansen@...ux.intel.com,
  x86@...nel.org,  hpa@...or.com,  rafael@...nel.org,  dakr@...nel.org,
  bartosz.golaszewski@...aro.org,  cw00.choi@...sung.com,
  myungjoo.ham@...sung.com,  yesanishhere@...il.com,
  Jonathan.Cameron@...wei.com,  quic_zijuhu@...cinc.com,
  aleksander.lobakin@...el.com,  ira.weiny@...el.com,
  andriy.shevchenko@...ux.intel.com,  leon@...nel.org,  lukas@...ner.de,
  bhelgaas@...gle.com,  wagi@...nel.org,  djeffery@...hat.com,
  stuart.w.hayes@...il.com
Subject: Re: [RFC v2 05/16] luo: luo_core: integrate with KHO

On Thu, May 15 2025, Pasha Tatashin wrote:

> Integrate the LUO with the KHO framework to enable passing LUO state
> across a kexec reboot.
>
> This patch introduces the following changes:
> - During the KHO finalization phase allocate FDT blob.
> - Populate this FDT with a LUO compatibility string ("luo-v1") and the
>   current LUO state (`luo_state`).
> - Implement a KHO notifier
>
> LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> logic (`luo_do_*_calls`) remains unimplemented in this patch.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> ---
>  drivers/misc/liveupdate/luo_core.c | 222 ++++++++++++++++++++++++++++-
>  1 file changed, 219 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/misc/liveupdate/luo_core.c b/drivers/misc/liveupdate/luo_core.c
> index 919c37b0b4d1..a76e886bc3b1 100644
> --- a/drivers/misc/liveupdate/luo_core.c
> +++ b/drivers/misc/liveupdate/luo_core.c
> @@ -36,9 +36,12 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/err.h>
> +#include <linux/kexec_handover.h>
>  #include <linux/kobject.h>
> +#include <linux/libfdt.h>
>  #include <linux/liveupdate.h>
>  #include <linux/rwsem.h>
> +#include <linux/sizes.h>
>  #include <linux/string.h>
>  #include "luo_internal.h"
>  
> @@ -55,6 +58,12 @@ const char *const luo_state_str[] = {
>  
>  bool luo_enabled;
>  
> +static void *luo_fdt_out;
> +static void *luo_fdt_in;
> +#define LUO_FDT_SIZE		SZ_1M
> +#define LUO_KHO_ENTRY_NAME	"LUO"
> +#define LUO_COMPATIBLE		"luo-v1"
> +
>  static int __init early_liveupdate_param(char *buf)
>  {
>  	return kstrtobool(buf, &luo_enabled);
> @@ -79,6 +88,60 @@ static inline void luo_set_state(enum liveupdate_state state)
>  	__luo_set_state(state);
>  }
>  
> +/* Called during the prepare phase, to create LUO fdt tree */
> +static int luo_fdt_setup(struct kho_serialization *ser)
> +{
> +	void *fdt_out;
> +	int ret;
> +
> +	fdt_out = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +					   get_order(LUO_FDT_SIZE));

Why not alloc_folio()? KHO already deals with folios so it seems
simpler. The kho_{un,}preserve_folio() functions exist exactly for these
kinds of allocations, so LUO also ends up being a first user. You also
won't end up needing kho_unpreserve_phys() and all the __pa() calls.

> +	if (!fdt_out) {
> +		pr_err("failed to allocate FDT memory\n");
> +		return -ENOMEM;
> +	}
> +
> +	ret = fdt_create_empty_tree(fdt_out, LUO_FDT_SIZE);

You are using FDT read/write functions throughout the series to create
new FDTs. The sequential write functions are generally more efficient
since they are meant for creating new FDT blobs. The read/write
functions are better for modifying an existing FDT blob.

Is there a particular reason you do this?

When using FDT SW functions, the creation of the tree would be something
like:

        fdt_create()
        fdt_finish_reservemap()
        fdt_begin_node()

        // Add stuff to FDT

        fdt_end_node()
        fdt_finish()

In this patch, the FDT does not change much after creation so it doesn't
look like it matters much, but in later patches, the FDT is passed to
luo_files_fdt_setup() and luo_subsystems_fdt_setup() which probably
modify the FDT a fair bit.

> +	if (ret)
> +		goto exit_free;
> +
> +	ret = fdt_setprop(fdt_out, 0, "compatible", LUO_COMPATIBLE,
> +			  strlen(LUO_COMPATIBLE) + 1);

fdt_setprop_string() instead? Or if you change to FDT SW,
fdt_property_string().

> +	if (ret)
> +		goto exit_free;
> +
> +	ret = kho_preserve_phys(__pa(fdt_out), LUO_FDT_SIZE);
> +	if (ret)
> +		goto exit_free;
> +
> +	ret = kho_add_subtree(ser, LUO_KHO_ENTRY_NAME, fdt_out);
> +	if (ret)
> +		goto exit_unpreserve;
> +	luo_fdt_out = fdt_out;
> +
> +	return 0;
> +
> +exit_unpreserve:
> +	kho_unpreserve_phys(__pa(fdt_out), LUO_FDT_SIZE);
> +exit_free:
> +	free_pages((unsigned long)fdt_out, get_order(LUO_FDT_SIZE));
> +	pr_err("failed to prepare LUO FDT: %d\n", ret);
> +
> +	return ret;
> +}
> +
> +static void luo_fdt_destroy(void)
> +{
> +	kho_unpreserve_phys(__pa(luo_fdt_out), LUO_FDT_SIZE);
> +	free_pages((unsigned long)luo_fdt_out, get_order(LUO_FDT_SIZE));
> +	luo_fdt_out = NULL;
> +}
> +
> +static int luo_do_prepare_calls(void)
> +{
> +	return 0;
> +}
> +
>  static int luo_do_freeze_calls(void)
>  {
>  	return 0;
> @@ -88,11 +151,111 @@ static void luo_do_finish_calls(void)
>  {
>  }
>  
> -int luo_prepare(void)
> +static void luo_do_cancel_calls(void)
> +{
> +}
> +
> +static int __luo_prepare(struct kho_serialization *ser)
>  {
> +	int ret;
> +
> +	if (down_write_killable(&luo_state_rwsem)) {
> +		pr_warn("[prepare] event canceled by user\n");
> +		return -EAGAIN;
> +	}
> +
> +	if (!is_current_luo_state(LIVEUPDATE_STATE_NORMAL)) {
> +		pr_warn("Can't switch to [%s] from [%s] state\n",
> +			luo_state_str[LIVEUPDATE_STATE_PREPARED],
> +			LUO_STATE_STR);
> +		ret = -EINVAL;
> +		goto exit_unlock;
> +	}
> +
> +	ret = luo_fdt_setup(ser);
> +	if (ret)
> +		goto exit_unlock;
> +
> +	ret = luo_do_prepare_calls();
> +	if (ret)
> +		goto exit_unlock;

With subsystems/filesystems support in place, this can fail. But since
luo_fdt_setup() called kho_add_subtree(), the debugfs file stays around,
and later calls to __luo_prepare() fail because the next
kho_add_subtree() tries to create a debugfs file that already exists. So
you would see an error like below:

    [  767.339920] debugfs: File 'LUO' in directory 'sub_fdts' already present!
    [  767.340613] luo_core: failed to prepare LUO FDT: -17
    [  767.341071] KHO: Failed to convert KHO state tree: -17
    [  767.341593] luo_core: Can't switch to [normal] from [normal] state
    [  767.342175] KHO: Failed to abort KHO finalization: -22

You probably need a kho_remove_subtree() that can be called from the
error paths.

Note that __luo_cancel() is called because failure in a KHO finalize
notifier calls the abort notifiers.

This is also something to fix, since if prepare fails, all other KHO
users who are already serialized won't even get to abort.

This weirdness happens because luo_prepare() and luo_cancel() control
the KHO state machine, but then also get controlled by it via the
notifier callbacks. So the relationship between then is not clear.
__luo_prepare() at least needs access to struct kho_serialization, so it
needs to come from the callback. So I don't have a clear way to clean
this all up off the top of my head.

I suppose one way to fix this would be to move the state check to
luo_cancel() instead. It would probably fix this problem but won't
actually do anything about the murky hierarchy between KHO and LUO.

> +
> +	luo_set_state(LIVEUPDATE_STATE_PREPARED);
> +
> +exit_unlock:
> +	up_write(&luo_state_rwsem);
> +
> +	return ret;
> +}
> +
> +static int __luo_cancel(void)
> +{
> +	if (down_write_killable(&luo_state_rwsem)) {
> +		pr_warn("[cancel] event canceled by user\n");
> +		return -EAGAIN;
> +	}
> +
> +	if (!is_current_luo_state(LIVEUPDATE_STATE_PREPARED) &&
> +	    !is_current_luo_state(LIVEUPDATE_STATE_FROZEN)) {
> +		pr_warn("Can't switch to [%s] from [%s] state\n",
> +			luo_state_str[LIVEUPDATE_STATE_NORMAL],
> +			LUO_STATE_STR);
> +		up_write(&luo_state_rwsem);
> +
> +		return -EINVAL;
> +	}
> +
> +	luo_do_cancel_calls();
> +	luo_fdt_destroy();
> +	luo_set_state(LIVEUPDATE_STATE_NORMAL);
> +
> +	up_write(&luo_state_rwsem);
> +
>  	return 0;
>  }
>  
> +static int luo_kho_notifier(struct notifier_block *self,
> +			    unsigned long cmd, void *v)
> +{
> +	int ret;
> +
> +	switch (cmd) {
> +	case KEXEC_KHO_FINALIZE:
> +		ret = __luo_prepare((struct kho_serialization *)v);
> +		break;
> +	case KEXEC_KHO_ABORT:
> +		ret = __luo_cancel();
> +		break;
> +	default:
> +		return NOTIFY_BAD;
> +	}
> +
> +	return notifier_from_errno(ret);
> +}
> +
> +static struct notifier_block luo_kho_notifier_nb = {
> +	.notifier_call = luo_kho_notifier,
> +};
> +
> +/**
> + * luo_prepare - Initiate the live update preparation phase.
> + *
> + * This function is called to begin the live update process. It attempts to
> + * transition the luo to the ``LIVEUPDATE_STATE_PREPARED`` state.
> + *
> + * If the calls complete successfully, the orchestrator state is set
> + * to ``LIVEUPDATE_STATE_PREPARED``. If any  call fails a
> + * ``LIVEUPDATE_CANCEL`` is sent to roll back any actions.
> + *
> + * @return 0 on success, ``-EAGAIN`` if the state change was cancelled by the
> + * user while waiting for the lock, ``-EINVAL`` if the orchestrator is not in
> + * the normal state, or a negative error code returned by the calls.
> + */
> +int luo_prepare(void)
> +{
> +	return kho_finalize();
> +}
> +
>  /**
>   * luo_freeze() - Initiate the final freeze notification phase for live update.
>   *
> @@ -188,9 +351,23 @@ int luo_finish(void)
>  	return 0;
>  }
>  
> +/**
> + * luo_cancel - Cancel the ongoing live update from prepared or frozen states.
> + *
> + * This function is called to abort a live update that is currently in the
> + * ``LIVEUPDATE_STATE_PREPARED`` state.
> + *
> + * If the state is correct, it triggers the ``LIVEUPDATE_CANCEL`` notifier chain
> + * to allow subsystems to undo any actions performed during the prepare or
> + * freeze events. Finally, the orchestrator state is transitioned back to
> + * ``LIVEUPDATE_STATE_NORMAL``.
> + *
> + * @return 0 on success, or ``-EAGAIN`` if the state change was cancelled by the
> + * user while waiting for the lock.
> + */
>  int luo_cancel(void)
>  {
> -	return 0;
> +	return kho_abort();
>  }
>  
>  void luo_state_read_enter(void)
> @@ -205,7 +382,46 @@ void luo_state_read_exit(void)
>  
>  static int __init luo_startup(void)
>  {
> -	__luo_set_state(LIVEUPDATE_STATE_NORMAL);
> +	phys_addr_t fdt_phys;
> +	int ret;
> +
> +	if (!kho_is_enabled()) {
> +		if (luo_enabled)
> +			pr_warn("Disabling liveupdate because KHO is disabled\n");
> +		luo_enabled = false;
> +		return 0;
> +	}
> +
> +	ret = register_kho_notifier(&luo_kho_notifier_nb);
> +	if (ret) {
> +		luo_enabled = false;

You set luo_enabled to false here, but none of the LUO entry points like
luo_prepare() or luo_freeze() actually check it. So LUO will appear work
just fine even when it hasn't initialized properly.

> +		pr_warn("Failed to register with KHO [%d]\n", ret);

I guess you don't return here so a previous liveupdate can still be
recovered, even though we won't be able to make the next one. If so, a
comment would be nice to point this out.

> +	}
> +
> +	/*
> +	 * Retrieve LUO subtree, and verify its format.  Panic in case of
> +	 * exceptions, since machine devices and memory is in unpredictable
> +	 * state.
> +	 */
> +	ret = kho_retrieve_subtree(LUO_KHO_ENTRY_NAME, &fdt_phys);
> +	if (ret) {
> +		if (ret != -ENOENT) {
> +			panic("failed to retrieve FDT '%s' from KHO: %d\n",
> +			      LUO_KHO_ENTRY_NAME, ret);
> +		}
> +		__luo_set_state(LIVEUPDATE_STATE_NORMAL);
> +
> +		return 0;
> +	}
> +
> +	luo_fdt_in = __va(fdt_phys);
> +	ret = fdt_node_check_compatible(luo_fdt_in, 0, LUO_COMPATIBLE);
> +	if (ret) {
> +		panic("FDT '%s' is incompatible with '%s' [%d]\n",
> +		      LUO_KHO_ENTRY_NAME, LUO_COMPATIBLE, ret);
> +	}
> +
> +	__luo_set_state(LIVEUPDATE_STATE_UPDATED);
>  
>  	return 0;
>  }

-- 
Regards,
Pratyush Yadav

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ