Message-ID: <CA+CK2bCA_ggpvbY+MQPaAHsN7MOzV7D3=MYvfAP4cFwhThJpPw@mail.gmail.com>
Date: Fri, 20 Jun 2025 12:03:59 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: Mike Rapoport <rppt@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>, jasonmiu@...gle.com, graf@...zon.com,
changyuanl@...gle.com, dmatlack@...gle.com, rientjes@...gle.com,
corbet@....net, rdunlap@...radead.org, ilpo.jarvinen@...ux.intel.com,
kanie@...ux.alibaba.com, ojeda@...nel.org, aliceryhl@...gle.com,
masahiroy@...nel.org, akpm@...ux-foundation.org, tj@...nel.org,
yoann.congal@...le.fr, mmaurer@...gle.com, roman.gushchin@...ux.dev,
chenridong@...wei.com, axboe@...nel.dk, mark.rutland@....com,
jannh@...gle.com, vincent.guittot@...aro.org, hannes@...xchg.org,
dan.j.williams@...el.com, david@...hat.com, joel.granados@...nel.org,
rostedt@...dmis.org, anna.schumaker@...cle.com, song@...nel.org,
zhangguopeng@...inos.cn, linux@...ssschuh.net, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org, gregkh@...uxfoundation.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, rafael@...nel.org,
dakr@...nel.org, bartosz.golaszewski@...aro.org, cw00.choi@...sung.com,
myungjoo.ham@...sung.com, yesanishhere@...il.com, Jonathan.Cameron@...wei.com,
quic_zijuhu@...cinc.com, aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com
Subject: Re: [RFC v2 05/16] luo: luo_core: integrate with KHO
Hi Pratyush,

On Fri, Jun 20, 2025 at 11:28 AM Pratyush Yadav <pratyush@...nel.org> wrote:
>
> Hi Pasha,
>
> On Thu, Jun 19 2025, Pasha Tatashin wrote:
>
> [...]
> >> And it has to be done before kexec load, at least until we resolve this.
> >
> > The before-kexec-load constraint has been fixed. The only
> > "finalization" constraint we have left is that it must happen before
> > reboot(LINUX_REBOOT_CMD_KEXEC), and only because memory allocations
> > during kernel shutdown are undesirable. Once KHO moves away from a
> > monolithic state machine this constraint disappears: kernel components
> > could preserve their resources at appropriate times, not necessarily
> > tied to shutdown time. For live update scenarios, LUO already
> > orchestrates this timing.
> >
> >> Currently this is triggered either by KHO debugfs or by LUO ioctls. If we
> >> completely drop KHO debugfs and notifiers, we still need something that
> >> would trigger the magic.
> >
> > An external "magic trigger" for KHO (like the current finalize
> > notifier or debugfs command) is necessary for scenarios like live
> > update, where userspace resources are being preserved in a coordinated
> > fashion just before kexec.
> >
> > For kernel-internal resources that are unrelated to such a
> > userspace-driven live update flow, the respective kernel components
> > should directly use KHO's primitive preservation APIs
> > (kho_preserve_folio, etc.) when they need to mark their resources for
> > handover. No separate state machine or external trigger should be
> > required for these individual, self-contained preservation acts.
>
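To make that concrete, a rough sketch of such a self-contained
preservation act, using the primitives as they appear in the KHO
patches (exact signatures may differ between revisions):

#include <linux/gfp.h>
#include <linux/kexec_handover.h>

static struct folio *my_data;

/* The component marks its own folio for handover whenever that is
 * appropriate for it; no state machine, no external trigger. */
static int my_component_preserve(void)
{
	int err;

	my_data = folio_alloc(GFP_KERNEL, 0);
	if (!my_data)
		return -ENOMEM;

	err = kho_preserve_folio(my_data);
	if (err)
		folio_put(my_data);
	return err;
}

After kexec, the successor kernel would get the folio back with
kho_restore_folio() from the physical address recorded for it.
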
> For kernel-internal components, I think this makes a lot of sense,
> especially now that we don't need to get everything done by kexec load
> time. I suppose the liveupdate_reboot() call at reboot time to prepare
> final things can be useful, but subsystems can just as well register
> reboot notifiers to get the same notification.
Correct. If subsystems unrelated to the userspace live update flow,
such as pstore, tracing, telemetry, debugging, or IMA, need to be
notified about a reboot, they can simply register their own reboot
notifier.
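For example, a minimal sketch of the plain notifier-chain API, with
nothing LUO-specific about it:

#include <linux/init.h>
#include <linux/notifier.h>
#include <linux/reboot.h>

static int my_reboot_cb(struct notifier_block *nb, unsigned long action,
			void *data)
{
	/* Flush or finalize whatever must go out before shutdown. */
	return NOTIFY_DONE;
}

static struct notifier_block my_reboot_nb = {
	.notifier_call = my_reboot_cb,
};

static int __init my_subsys_init(void)
{
	return register_reboot_notifier(&my_reboot_nb);
}
late_initcall(my_subsys_init);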
> >> I'm not saying we should keep KHO debugfs and notifiers, I'm saying that if
> >> we make LUO the only thing driving KHO, liveupdate is not an appropriate
> >> name.
> >
> > LUO drives KHO specifically for the purpose of live updates. If a
> > different userspace use case emerges with another distinct purpose
> > (e.g., something other than preserving an FD or a device across a
> > kernel reboot, i.e. something for which LUO does not provide a uAPI),
> > then that would probably need its own uAPI separate from LUO instead
> > of extending the LUO uAPI.
>
> Outside of hypervisor live update, I have a very clear use case in mind:
> userspace memory handover (on guest side). Say a guest running an
> in-memory cache like memcached with many gigabytes of cache wants to
> reboot. It can just shove the cache into a memfd, give it to LUO, and
> restore it after reboot. Some services that suffer from long reboots are
> looking into using this to reduce downtime. Since it pretty much
> overlaps with the hypervisor work for now, I haven't been talking about
> it as much.
>
> Would you also call this use case "live update"? Does it also fit with
> your vision of where LUO should go?
Yes, absolutely. The use case you described (preserving a memcached
instance via memfd) is a perfect fit for LUO's vision.
While the primary use case driving this work is supporting the
preservation of virtual machines on a hypervisor, the framework itself
is not restricted to that scenario. We define "live update" as the
process of updating the kernel from one version to another while
preserving FD-based resources and keeping selected devices
operational. The machine itself can be running storage, databases,
networking, containers, or anything else.
A good parallel is Kernel Live Patching: we don't distinguish what
workload is running on a machine when applying a security patch; we
simply patch the running kernel. In the same way, Live Update is
designed to be workload-agnostic. Whether the system is running an
in-memory database, containers, or VMs, the primary goal is to enable
a full kernel update while preserving the userspace-requested state.
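From userspace, the flow you describe would look roughly like the
sketch below. Note that the ioctl name and structure here are made up
for illustration; the real request codes and structs come from the LUO
uAPI headers in this series:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical placeholders standing in for the real LUO uAPI. */
struct luo_fd_req {
	int fd;
	uint64_t token;
};
#define LUO_PRESERVE_FD _IOWR('l', 0x01, struct luo_fd_req)

static int preserve_cache(size_t cache_size)
{
	int memfd = memfd_create("memcached-cache", 0);

	if (memfd < 0)
		return -1;
	if (ftruncate(memfd, cache_size) < 0)
		return -1;
	/* ... the cache writes its gigabytes of state into the memfd ... */

	int luo = open("/dev/liveupdate", O_RDWR);

	if (luo < 0)
		return -1;

	struct luo_fd_req req = { .fd = memfd };

	if (ioctl(luo, LUO_PRESERVE_FD, &req) < 0)
		return -1;
	/* req.token would identify the memfd for retrieval after kexec. */
	return 0;
}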
Thanks,
Pasha