Open Source and information security mailing list archives
 
Message-ID: <CA+CK2bDqO4SkUpiFahfUx2MUiE8oae9HmuaghPAnCwaJZpoTwQ@mail.gmail.com>
Date: Thu, 19 Jun 2025 10:22:52 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Pratyush Yadav <pratyush@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>, jasonmiu@...gle.com, 
	graf@...zon.com, changyuanl@...gle.com, dmatlack@...gle.com, 
	rientjes@...gle.com, corbet@....net, rdunlap@...radead.org, 
	ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com, ojeda@...nel.org, 
	aliceryhl@...gle.com, masahiroy@...nel.org, akpm@...ux-foundation.org, 
	tj@...nel.org, yoann.congal@...le.fr, mmaurer@...gle.com, 
	roman.gushchin@...ux.dev, chenridong@...wei.com, axboe@...nel.dk, 
	mark.rutland@....com, jannh@...gle.com, vincent.guittot@...aro.org, 
	hannes@...xchg.org, dan.j.williams@...el.com, david@...hat.com, 
	joel.granados@...nel.org, rostedt@...dmis.org, anna.schumaker@...cle.com, 
	song@...nel.org, zhangguopeng@...inos.cn, linux@...ssschuh.net, 
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org, 
	gregkh@...uxfoundation.org, tglx@...utronix.de, mingo@...hat.com, 
	bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	rafael@...nel.org, dakr@...nel.org, bartosz.golaszewski@...aro.org, 
	cw00.choi@...sung.com, myungjoo.ham@...sung.com, yesanishhere@...il.com, 
	Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com, 
	aleksander.lobakin@...el.com, ira.weiny@...el.com, 
	andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de, 
	bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com, 
	stuart.w.hayes@...il.com
Subject: Re: [RFC v2 05/16] luo: luo_core: integrate with KHO

> > > I disagree, LUO is for liveupdate flows, and is designed specifically
> > > around the live update flows: brownout/blackout/post-liveupdate, it
> > > should not be generalized to anticipate some other random states, and
> > > it should only support participants that are related to live update:
> > > iommufd/vfiofd/kvmfd/memfd/eventfd, and be controlled via "liveupdated", the
> > > userspace agent.
>
> But it's not how the things work. Once there's an API anyone can use it,
> right?
>
> How do you intend to restrict this API usage to subsystems that are related
> to the live update flow? Or userspace driving ioctls outside "liveupdated"
> user agent?

Hi Mike,

LUO provides both kernel and user APIs specifically for live update
scenarios. Live update is the ability to reboot the kernel while keeping
some device operations and FDs intact. Preserving FDs across such a
reboot is the only thing LUO's uAPI does: it enables users to preserve
resources via FDs, such as memfd, vfiofd, guestmemfd, kvmfd, eventfd,
and any other supported FD type. It also provides a well-defined state
machine for userspace to add and retrieve the resources, and for the
kernel to properly serialize these resources. Since this is the only
uAPI that LUO provides, I do not see how it can be used for other
scenarios.
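The state machine described above can be pictured with a small userspace model. This is a hypothetical sketch, not the LUO uAPI: the state names reuse the brownout/blackout/post-liveupdate phases quoted earlier in the thread, FDs are plain integers, and every `toy_lu_*` function is invented for illustration. It shows the two guarantees the state machine gives: resources can only be added before the update starts, and can only be retrieved after the reboot.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy model of a live update state machine.
 * State names reuse the phases mentioned in this thread; not the LUO uAPI.
 */
enum toy_lu_state {
	TOY_LU_NORMAL,      /* normal operation: userspace registers FDs */
	TOY_LU_BROWNOUT,    /* registered FDs serialized, workloads still run */
	TOY_LU_BLACKOUT,    /* workloads paused, kexec reboot imminent */
	TOY_LU_POST_UPDATE, /* new kernel: userspace retrieves FDs */
};

#define TOY_LU_MAX_FDS 8

struct toy_lu {
	enum toy_lu_state state;
	int fds[TOY_LU_MAX_FDS];  /* registered resources (plain FD numbers here) */
	size_t nr_fds;
	size_t nr_serialized;     /* entries "serialized" for the next kernel */
};

/* Userspace adds a resource; only allowed before the update starts. */
int toy_lu_add(struct toy_lu *lu, int fd)
{
	if (lu->state != TOY_LU_NORMAL || lu->nr_fds == TOY_LU_MAX_FDS)
		return -1;
	lu->fds[lu->nr_fds++] = fd;
	return 0;
}

/* Brownout: the kernel serializes every registered resource. */
int toy_lu_prepare(struct toy_lu *lu)
{
	if (lu->state != TOY_LU_NORMAL)
		return -1;
	lu->nr_serialized = lu->nr_fds;
	lu->state = TOY_LU_BROWNOUT;
	return 0;
}

/* Blackout: last step before the kexec reboot. */
int toy_lu_blackout(struct toy_lu *lu)
{
	if (lu->state != TOY_LU_BROWNOUT)
		return -1;
	lu->state = TOY_LU_BLACKOUT;
	return 0;
}

/* Stands in for booting the new kernel, which sees the serialized data. */
int toy_lu_reboot(struct toy_lu *lu)
{
	if (lu->state != TOY_LU_BLACKOUT)
		return -1;
	lu->state = TOY_LU_POST_UPDATE;
	return 0;
}

/* Userspace retrieves a preserved resource; only allowed after the reboot. */
int toy_lu_retrieve(const struct toy_lu *lu, size_t idx, int *fd)
{
	if (lu->state != TOY_LU_POST_UPDATE || idx >= lu->nr_serialized)
		return -1;
	assert(lu->nr_serialized <= lu->nr_fds);
	*fd = lu->fds[idx];
	return 0;
}

/* Post-liveupdate work is done: return to normal operation. */
int toy_lu_finish(struct toy_lu *lu)
{
	if (lu->state != TOY_LU_POST_UPDATE)
		return -1;
	lu->nr_fds = 0;
	lu->nr_serialized = 0;
	lu->state = TOY_LU_NORMAL;
	return 0;
}
```

In this model, calling `toy_lu_retrieve()` before the reboot or `toy_lu_add()` after `toy_lu_prepare()` simply fails, so neither side can observe a half-serialized set of resources.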

> There are a lot of examples of kernel subsystems that were designed for a
> particular thing and later were extended to support additional use cases.

If that ever becomes necessary, either the core part would need to be
moved out into a separate component, or a separate state machine on
top of KHO would need to be developed for that use case.

Currently, I don't see an immediate need for this, especially if KHO
itself is updated so that its state machine is removed and finalization
is therefore no longer required.

> I'm not saying LUO should "anticipate some other random states", what I'm
> saying is that use cases other than liveupdate may appear and use the APIs
> LUO provides for something else.
>
> > > KHO is for preserving memory, LUO uses KHO as a backbone for Live Update.
>
> If we make LUO the only uABI to drive KHO it becomes misnamed from the
> start.
> As you mentioned yourself, reserve_mem and potentially IMA and kexec

Kernel-internal components like pstore/reserve_mem or IMA do not
require a uAPI to drive their KHO interactions. They can, and should,
directly use KHO's kernel-level APIs, such as kho_preserve_folio() and
kho_restore_folio().

KHO itself must offer these preservation primitives, rather than
embedding a state machine that dictates a single "finalize" point for
all users.

> pstore can use reserve_mem already.

That's good to know; I'll look into how pstore currently uses
reserve_mem. My current approach reserves the memmap for pstore via
kernel parameters.

> > So currently, KHO provides the following two types of  internal API:
> >
> > Preserve memory and metadata
> > =========================
> > kho_preserve_folio() / kho_preserve_phys()
> > kho_unpreserve_folio() / kho_unpreserve_phys()
> > kho_restore_folio()
> >
> > kho_add_subtree() kho_retrieve_subtree()
> >
> > State machine
> > ===========
> > register_kho_notifier() / unregister_kho_notifier()
> >
> > kho_finalize() / kho_abort()
> >
> > We should remove the "State machine", and only keep the "Preserve
> > Memory" API functions. At the time these functions are called, KHO
> > should do the magic of making sure that the memory gets preserved
> > across the reboot.
> >
> > This way, reserve_mem_init() would call: kho_preserve_folio() and
> > kho_add_subtree() during boot, and be done with it.
>
> Right, but we still need something to drive kho_mem_serialize().

My view is that an explicit, global kho_mem_serialize() call driven
externally (like by LUO or debugfs) is not necessary for KHO
operations.

When kho_preserve_folio() or kho_add_subtree() is called, KHO itself
should immediately do whatever is required to stage that folio or
subtree metadata for preservation across a kexec. Similarly,
kho_unpreserve_folio() or kho_remove_subtree() (which is currently
missing from the KHO API) should immediately update KHO's state to
reflect that the item is no longer preserved.
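The immediate-staging model proposed above can be illustrated with a userspace toy. This is a hypothetical sketch, not KHO code: `struct toy_kho` and the `toy_*` functions are invented stand-ins for kho_preserve_folio(), kho_unpreserve_folio(), kho_add_subtree() and the not-yet-existing kho_remove_subtree(), and physical addresses are plain integers. Each call updates the staged set as soon as it returns, so the handover step only reads that set and needs no finalize call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; not the KHO implementation or API. */
#define TOY_MAX_FOLIOS   16
#define TOY_MAX_SUBTREES 4

struct toy_subtree {
	const char *name;  /* key the next kernel would look the subtree up by */
	const void *blob;  /* stands in for the metadata a component hands over */
};

struct toy_kho {
	uint64_t folios[TOY_MAX_FOLIOS];  /* physical addresses staged for handover */
	size_t nr_folios;
	struct toy_subtree subtrees[TOY_MAX_SUBTREES];
	size_t nr_subtrees;
};

/* Toy counterpart of kho_preserve_folio(): staged as soon as this returns. */
int toy_preserve_folio(struct toy_kho *k, uint64_t phys)
{
	if (k->nr_folios == TOY_MAX_FOLIOS)
		return -1;
	k->folios[k->nr_folios++] = phys;
	return 0;
}

/* Toy counterpart of kho_unpreserve_folio(): dropped from the staged set at once. */
int toy_unpreserve_folio(struct toy_kho *k, uint64_t phys)
{
	for (size_t i = 0; i < k->nr_folios; i++) {
		if (k->folios[i] == phys) {
			/* Swap-remove; the order of staged folios does not matter here. */
			k->folios[i] = k->folios[--k->nr_folios];
			return 0;
		}
	}
	return -1;
}

/* Toy counterpart of kho_add_subtree(). */
int toy_add_subtree(struct toy_kho *k, const char *name, const void *blob)
{
	if (k->nr_subtrees == TOY_MAX_SUBTREES)
		return -1;
	k->subtrees[k->nr_subtrees].name = name;
	k->subtrees[k->nr_subtrees].blob = blob;
	k->nr_subtrees++;
	return 0;
}

/* Hypothetical counterpart of a kho_remove_subtree(), which KHO currently lacks. */
int toy_remove_subtree(struct toy_kho *k, const char *name)
{
	for (size_t i = 0; i < k->nr_subtrees; i++) {
		if (strcmp(k->subtrees[i].name, name) == 0) {
			k->subtrees[i] = k->subtrees[--k->nr_subtrees];
			return 0;
		}
	}
	return -1;
}

bool toy_is_preserved(const struct toy_kho *k, uint64_t phys)
{
	for (size_t i = 0; i < k->nr_folios; i++)
		if (k->folios[i] == phys)
			return true;
	return false;
}

/* At kexec time the staged set is only read: no finalize step is needed. */
size_t toy_kho_handover(const struct toy_kho *k, uint64_t *out, size_t max)
{
	size_t n = k->nr_folios < max ? k->nr_folios : max;

	assert(k->nr_folios <= TOY_MAX_FOLIOS);
	memcpy(out, k->folios, n * sizeof(*out));
	return n;
}
```

A component such as reserve_mem would, in this model, call `toy_preserve_folio()` and `toy_add_subtree()` once during boot and never need to hear about the kexec again.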

> And it has to be done before kexec load, at least until we resolve this.

The "before kexec load" constraint has been fixed. The only
"finalization" constraint we have is that it must happen before
reboot(LINUX_REBOOT_CMD_KEXEC), and only because memory allocations
during kernel shutdown are undesirable. Once KHO moves away from a
monolithic state machine, this constraint disappears: kernel components
can preserve their resources at whatever time suits them, not
necessarily at shutdown. For live update scenarios, LUO already
orchestrates this timing.
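To make the shutdown-time constraint concrete, here is a hypothetical userspace sketch; all `toy_*` names are invented, and `realloc()` stands in for kernel memory allocation. Any allocation happens when a resource is staged, while the system runs normally. The function standing in for the reboot(LINUX_REBOOT_CMD_KEXEC) path only reads the staged records and never allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model: counts every allocation the handover code makes. */
static unsigned long toy_allocs;

struct toy_handover {
	uint64_t *entries;  /* serialized records for the next kernel */
	size_t len;
	size_t cap;
};

/*
 * Called when a component preserves a resource, during normal operation.
 * Any memory the handover needs is allocated here, so failures are
 * reported to the caller long before reboot.
 */
int toy_stage(struct toy_handover *h, uint64_t phys)
{
	if (h->len == h->cap) {
		size_t ncap = h->cap ? h->cap * 2 : 4;
		uint64_t *p = realloc(h->entries, ncap * sizeof(*p));

		if (!p)
			return -1;
		toy_allocs++;
		h->entries = p;
		h->cap = ncap;
	}
	assert(h->len < h->cap);
	h->entries[h->len++] = phys;
	return 0;
}

/*
 * Stands in for the reboot(LINUX_REBOOT_CMD_KEXEC) path: it only reads
 * what was staged earlier and performs no allocation.
 */
const uint64_t *toy_shutdown_handover(const struct toy_handover *h, size_t *len)
{
	*len = h->len;
	return h->entries;
}

/* Frees the staged records (e.g. if the update is abandoned). */
void toy_release(struct toy_handover *h)
{
	free(h->entries);
	h->entries = NULL;
	h->len = 0;
	h->cap = 0;
}
```

In this model a failed `toy_stage()` is reported to whichever component called it, while the shutdown path itself cannot fail for lack of memory.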

> Currently this is triggered either by KHO debugfs or by LUO ioctls. If we
> completely drop KHO debugfs and notifiers, we still need something that
> would trigger the magic.

An external "magic trigger" for KHO (like the current finalize
notifier or debugfs command) is only necessary for scenarios like live
update, where userspace resources are preserved in a coordinated
fashion just before kexec.

For kernel-internal resources that are unrelated to such a
userspace-driven live update flow, the respective kernel components
should directly use KHO's primitive preservation APIs
(kho_preserve_folio(), etc.) when they need to mark their resources for
handover. No separate state machine or external trigger should be
required for these individual, self-contained preservation acts.

> I'm not saying we should keep KHO debugfs and notifiers, I'm saying that if
> we make LUO the only thing driving KHO, liveupdate is not an appropriate
> name.

LUO drives KHO specifically for the purpose of live updates. If a
different userspace use case emerges with a distinct purpose (e.g.,
something other than preserving an FD or a device across a kernel
reboot, i.e. something for which LUO does not provide a uAPI), it would
probably need its own uAPI, separate from LUO, rather than an extension
of the LUO uAPI.

Pasha
