linux-kernel - Re: [RFC v2 05/16] luo: luo

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bAnCRu+k=Q78eA4kcAebxA9NgOorhwRqu-WxC913YBsBw@mail.gmail.com>
Date: Wed, 18 Jun 2025 13:00:52 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Pratyush Yadav <pratyush@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>, jasonmiu@...gle.com, 
	graf@...zon.com, changyuanl@...gle.com, dmatlack@...gle.com, 
	rientjes@...gle.com, corbet@....net, rdunlap@...radead.org, 
	ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com, ojeda@...nel.org, 
	aliceryhl@...gle.com, masahiroy@...nel.org, akpm@...ux-foundation.org, 
	tj@...nel.org, yoann.congal@...le.fr, mmaurer@...gle.com, 
	roman.gushchin@...ux.dev, chenridong@...wei.com, axboe@...nel.dk, 
	mark.rutland@....com, jannh@...gle.com, vincent.guittot@...aro.org, 
	hannes@...xchg.org, dan.j.williams@...el.com, david@...hat.com, 
	joel.granados@...nel.org, rostedt@...dmis.org, anna.schumaker@...cle.com, 
	song@...nel.org, zhangguopeng@...inos.cn, linux@...ssschuh.net, 
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org, 
	gregkh@...uxfoundation.org, tglx@...utronix.de, mingo@...hat.com, 
	bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	rafael@...nel.org, dakr@...nel.org, bartosz.golaszewski@...aro.org, 
	cw00.choi@...sung.com, myungjoo.ham@...sung.com, yesanishhere@...il.com, 
	Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com, 
	aleksander.lobakin@...el.com, ira.weiny@...el.com, 
	andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de, 
	bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com, 
	stuart.w.hayes@...il.com
Subject: Re: [RFC v2 05/16] luo: luo_core: integrate with KHO

On Wed, Jun 18, 2025 at 12:40 PM Mike Rapoport <rppt@...nel.org> wrote:
>
> On Wed, Jun 18, 2025 at 10:48:09AM -0400, Pasha Tatashin wrote:
> > On Wed, Jun 18, 2025 at 9:12 AM Pratyush Yadav <pratyush@...nel.org> wrote:
> > >
> > > On Tue, Jun 17 2025, Pasha Tatashin wrote:
> > >
> > > > On Tue, Jun 17, 2025 at 11:24 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > > >>
> > > >> On Fri, Jun 13, 2025 at 04:58:27PM +0200, Pratyush Yadav wrote:
> > > >> > On Sat, Jun 07 2025, Pasha Tatashin wrote:
> > > >> > [...]
> > > >> > >>
> > > >> > >> This weirdness happens because luo_prepare() and luo_cancel() control
> > > >> > >> the KHO state machine, but then also get controlled by it via the
> > > >> > >> notifier callbacks. So the relationship between then is not clear.
> > > >> > >> __luo_prepare() at least needs access to struct kho_serialization, so it
> > > >> > >> needs to come from the callback. So I don't have a clear way to clean
> > > >> > >> this all up off the top of my head.
> > > >> > >
> > > >> > > On production machine, without KHO_DEBUGFS, only LUO can control KHO
> > > >> > > state, but if debugfs is enabled, KHO can be finalized manually, and
> > > >> > > in this case LUO transitions to prepared state. In both cases, the
> > > >> > > path is identical. The KHO debugfs path is only for
> > > >> > > developers/debugging purposes.
> > > >> >
> > > >> > What I meant is that even without KHO_DEBUGFS, LUO drives KHO, but then
> > > >> > KHO calls into LUO from the notifier, which makes the control flow
> > > >> > somewhat convoluted. If LUO is supposed to be the only thing that
> > > >> > interacts directly with KHO, maybe we should get rid of the notifier and
> > > >> > only let LUO drive things.
> > > >>
> > > >> Yes, we should. I think we should consider the KHO notifiers and self
> > > >> orchestration as obsoleted by LUO. That's why it was in debugfs
> > > >> because we were not ready to commit to it.
> > > >
> > > > We could do that, however, there is one example KHO user
> > > > `reserve_mem`, that is also not liveupdate related. So, it should
> > > > either be removed or modified to be handled by LUO.
> > >
> > > It still depends on kho_finalize() being called, so it still needs
> > > something to trigger its serialization. It is not automatic. And with
> > > your proposed patch to make debugfs interface optional, it can't even be
> > > used with the config disabled.
> >
> > At least for now, it can still be used via LUO going into prepare
> > state, since LUO changes KHO into finalized state and reserve_mem is
> > registered to be called back from KHO.
> >
> > > So if it must be explicitly triggered to be preserved, why not let the
> > > trigger point be LUO instead of KHO? You can make reservemem a LUO
> > > subsystem instead.
> >
> > Yes, LUO can do that, the only concern I raised is that  `reserve_mem`
> > is not really live update related.
>
> I only now realized what bothered me about "liveupdate". It's the name of
> the driving usecase rather then the name of the technology it implements.
> In the end what LUO does is a (more) sophisticated control for KHO.
>
> But essentially it's not that it actually implements live update, it
> provides kexec handover control plane that enables live update.
>
> And since the same machinery can be used regardless of live update, and I'm
> sure other usecases will appear as soon as the technology will become more
> mature, it makes me think that we probably should just
> s/liveupdate_/kho_control/g or something along those lines.

I disagree, LUO is for liveupdate flows, and is designed specifically
around the live update flows: brownout/blackout/post-liveupdate, it
should not be generalized to anticipate some other random states, and
it should only support participants that are related to live update:
iommufd/vfiofd/kvmfd/memfd/eventfd and controled via "liveupdated" the
userspace agent.

KHO is for preserving memory, LUO uses KHO as a backbone for Live Update.

> > > Although to be honest, things like reservemem (or IMA perhaps?) don't
> > > really fit well with the explicit trigger mechanism. They can be carried
> >
> > Agreed. Another example I was thinking about is "kexec telemetry":
> > precise time information about kexec, including shutdown, purgatory,
> > boot. We are planning to propose kexec telemetry, and it could be LUO
> > subsystem. On the other hand, it could be useful even without live
> > update, just to measure precise kexec reboot time.
> >
> > > across kexec without needing userspace explicitly driving it. Maybe we
> > > allow LUO subsystems to mark themselves as auto-preservable and LUO will
> > > preserve them regardless of state being prepared? Something to think
> > > about later down the line I suppose.
> >
> > We can start with adding `reserve_mem` as regular subsystem, and make
> > this auto-preserve option a future expansion, when if needed.
> > Presumably, `luoctl prepare` would work for whoever plans to use just
> > `reserve_mem`.
>
> I think it would be nice to support auto-preserve sooner than later.

Makes sense.

> reserve_mem can already be useful for ftrace and pstore folks and if it
> would survive a kexec without any userspace intervention it would be great.

The pstore use case is only potential, correct? Or can it already use
reserve_mem?

Pasha