[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250805123103.GH184255@nvidia.com>
Date: Tue, 5 Aug 2025 09:31:03 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: pratyush@...nel.org, jasonmiu@...gle.com, graf@...zon.com,
changyuanl@...gle.com, rppt@...nel.org, dmatlack@...gle.com,
rientjes@...gle.com, corbet@....net, rdunlap@...radead.org,
ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com,
ojeda@...nel.org, aliceryhl@...gle.com, masahiroy@...nel.org,
akpm@...ux-foundation.org, tj@...nel.org, yoann.congal@...le.fr,
mmaurer@...gle.com, roman.gushchin@...ux.dev, chenridong@...wei.com,
axboe@...nel.dk, mark.rutland@....com, jannh@...gle.com,
vincent.guittot@...aro.org, hannes@...xchg.org,
dan.j.williams@...el.com, david@...hat.com,
joel.granados@...nel.org, rostedt@...dmis.org,
anna.schumaker@...cle.com, song@...nel.org, zhangguopeng@...inos.cn,
linux@...ssschuh.net, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
gregkh@...uxfoundation.org, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
hpa@...or.com, rafael@...nel.org, dakr@...nel.org,
bartosz.golaszewski@...aro.org, cw00.choi@...sung.com,
myungjoo.ham@...sung.com, yesanishhere@...il.com,
Jonathan.Cameron@...wei.com, quic_zijuhu@...cinc.com,
aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com, ptyadav@...zon.de, lennart@...ttering.net,
brauner@...nel.org, linux-api@...r.kernel.org,
linux-fsdevel@...r.kernel.org, saeedm@...dia.com,
ajayachandra@...dia.com, parav@...dia.com, leonro@...dia.com,
witu@...dia.com
Subject: Re: [PATCH v2 10/32] liveupdate: luo_core: Live Update Orchestrator
On Sun, Aug 03, 2025 at 09:11:20PM -0400, Pasha Tatashin wrote:
> Having a global state is necessary for performance optimizations. This
> is similar to why we export the state to userspace via sysfs: it
> allows other subsystems to behave differently during a
> performance-optimized live update versus a normal boot.
> For example, in our code base we have a driver that doesn't
> participate in the live update itself (it has no state to preserve).
> However, during boot, it checks this global state. If it's a live
> update boot, the driver skips certain steps, like loading firmware, to
> accelerate the overall boot time.
TBH, I'm against this. Give the driver a 0 byte state if it wants to
behave differently during live update. We should not be making
implicit things like this.
Plus the usual complaining about building core kernel infrastructure
around weird out of tree drivers.
If userspace wants a device to participate in live update, even just
"optimizations", then it has to opt in.
Frankly, this driver has no idea what the prior kernel did, and by
"optimizing" I think you are actually assuming that the prior kernel
had it bound to a normal kernel driver that left it in some
predictable configuration.
Vs say bound to VFIO and completely messed up.
So this should be represented by a LUO serialization that says "the
prior kernel left this device in well defined state X" even if it
takes 0 bytes to describe that state.
So no globals, there should be a way for a driver to tell if it is
participating in LUO, but not some global 'is luo' boot fla.g
> > + ret = liveupdate_register_subsystem(&luo_file_subsys);
> > + if (ret) {
> > + pr_warn("Failed to register luo_file subsystem [%d]\n", ret);
> > + return ret;
> > + }
> > +
> > + if (liveupdate_state_updated()) {
> >
> > Thats going to be a standard pattern - I would expect that
> > liveupdate_register_subsystem() would do the check for updated and
> > then arrange to call back something like
> > liveupdate_subsystem.ops.post_update()
> >
> > And then post_update() would get the info that is currently under
> > liveupdate_get_subsystem_data() as arguments instead of having to make
> > more functions calls.
> >
> > Maybe even the fdt_node_check_compatible() can be hoisted.
> >
> > That would remove a bunch more liveupdate_state_updated() calls.
>
> That's a good suggestion for a potential refactor. For now, the
> state-check call is inexpensive and is not in a performance-critical
> path. We can certainly implement this optimization later if it becomes
> necessary.
It is not an optimization, it is having a proper logical code
structure that doesn't rely on globals. I'm strongly against
sprinkling globals everywhere in the code when there are simple
logical APIs that entirely avoid it.
When I looked at where there were globals I didn't find any good
justifications, just thinks like this that are poor API design.
Jason
Powered by blists - more mailing lists