[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bB6F634HCw_N5z9E5r_LpbGJrucuFb_5fL4da5_W99e4Q@mail.gmail.com>
Date: Thu, 9 Oct 2025 19:50:12 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: jasonmiu@...gle.com, graf@...zon.com, changyuanl@...gle.com,
rppt@...nel.org, dmatlack@...gle.com, rientjes@...gle.com, corbet@....net,
rdunlap@...radead.org, ilpo.jarvinen@...ux.intel.com, kanie@...ux.alibaba.com,
ojeda@...nel.org, aliceryhl@...gle.com, masahiroy@...nel.org,
akpm@...ux-foundation.org, tj@...nel.org, yoann.congal@...le.fr,
mmaurer@...gle.com, roman.gushchin@...ux.dev, chenridong@...wei.com,
axboe@...nel.dk, mark.rutland@....com, jannh@...gle.com,
vincent.guittot@...aro.org, hannes@...xchg.org, dan.j.williams@...el.com,
david@...hat.com, joel.granados@...nel.org, rostedt@...dmis.org,
anna.schumaker@...cle.com, song@...nel.org, zhangguopeng@...inos.cn,
linux@...ssschuh.net, linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, gregkh@...uxfoundation.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
hpa@...or.com, rafael@...nel.org, dakr@...nel.org,
bartosz.golaszewski@...aro.org, cw00.choi@...sung.com,
myungjoo.ham@...sung.com, yesanishhere@...il.com, Jonathan.Cameron@...wei.com,
quic_zijuhu@...cinc.com, aleksander.lobakin@...el.com, ira.weiny@...el.com,
andriy.shevchenko@...ux.intel.com, leon@...nel.org, lukas@...ner.de,
bhelgaas@...gle.com, wagi@...nel.org, djeffery@...hat.com,
stuart.w.hayes@...il.com, lennart@...ttering.net, brauner@...nel.org,
linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org, saeedm@...dia.com,
ajayachandra@...dia.com, jgg@...dia.com, parav@...dia.com, leonro@...dia.com,
witu@...dia.com, hughd@...gle.com, skhawaja@...gle.com, chrisl@...nel.org,
steven.sistare@...cle.com
Subject: Re: [PATCH v4 00/30] Live Update Orchestrator
On Thu, Oct 9, 2025 at 6:58 PM Pratyush Yadav <pratyush@...nel.org> wrote:
>
> On Tue, Oct 07 2025, Pasha Tatashin wrote:
>
> > On Sun, Sep 28, 2025 at 9:03 PM Pasha Tatashin
> > <pasha.tatashin@...een.com> wrote:
> >>
> [...]
> > 4. New File-Lifecycle-Bound Global State
> > ----------------------------------------
> > A new mechanism for managing global state was proposed, designed to be
> > tied to the lifecycle of the preserved files themselves. This would
> > allow a file owner (e.g., the IOMMU subsystem) to save and retrieve
> > global state that is only relevant when one or more of its FDs are
> > being managed by LUO.
>
> Is this going to replace LUO subsystems? If yes, then why? The global
> state will likely need to have its own lifecycle just like the FDs, and
> subsystems are a simple and clean abstraction to control that. I get the
> idea of only "activating" a subsystem when one or more of its FDs are
> participating in LUO, but we can do that while keeping subsystems
> around.
Thanks for the feedback. The FLB Global State is not replacing the LUO
subsystems. On the contrary, it's a higher-level abstraction that is
itself implemented as a LUO subsystem. The goal is to provide a
solution for a pattern that emerged during the PCI and IOMMU
discussions.
You can see the WIP implementation here, which shows it registering as
a subsystem named "luo-fh-states-v1-struct":
https://github.com/soleen/linux/commit/94e191aab6b355d83633718bc4a1d27dda390001
The existing subsystem API is a low-level tool that provides for the
preservation of a raw 8-byte handle. It doesn't provide locking, nor
is it explicitly tied to the lifecycle of any higher-level object like
a file handler. The new API is designed to solve a more specific
problem: allowing global components (like IOMMU or PCI) to
automatically track when resources relevant to them are added to or
removed from preservation. If HugeTLB requires a subsystem, it can
still use it, but I suspect it might benefit from FLB Global State as
well.
> Here is how I imagine the proposed API would compare against subsystems
> with hugetlb as an example (hugetlb support is still WIP, so I'm still
> not clear on specifics, but this is how I imagine it will work):
>
> - Hugetlb subsystem needs to track its huge page pools and which pages
> are allocated and free. This is its global state. The pools get
> reconstructed after kexec. Post-kexec, the free pages are ready for
> allocation from other "regular" files and the pages used in LUO files
> are reserved.
>
> - Pre-kexec, when a hugetlb FD is preserved, it marks that as preserved
> in hugetlb's global data structure tracking this. This is runtime data
> (say xarray), and _not_ serialized data. Reason being, there are
> likely more FDs to come so no point in wasting time serializing just
> yet.
>
> This can look something like:
>
> hugetlb_luo_preserve_folio(folio, ...);
>
> Nice and simple.
>
> Compare this with the new proposed API:
>
> liveupdate_fh_global_state_get(h, &hugetlb_data);
> // This will have update serialized state now.
> hugetlb_luo_preserve_folio(hugetlb_data, folio, ...);
> liveupdate_fh_global_state_put(h);
>
> We do the same thing but in a very complicated way.
>
> - When the system-wide preserve happens, the hugetlb subsystem gets a
> callback to serialize. It converts its runtime global state to
> serialized state since now it knows no more FDs will be added.
>
> With the new API, this doesn't need to be done since each FD prepare
> already updates serialized state.
>
> - If there are no hugetlb FDs, then the hugetlb subsystem doesn't put
> anything in LUO. This is same as new API.
>
> - If some hugetlb FDs are not restored after liveupdate and the finish
> event is triggered, the subsystem gets its finish() handler called and
> it can free things up.
>
> I don't get how that would work with the new API.
The new API isn't more complicated; It codifies the common pattern of
"create on first use, destroy on last use" into a reusable helper,
saving each file handler from having to reinvent the same reference
counting and locking scheme. But, as you point out, subsystems provide
more control, specifically they handle full creation/free instead of
relying on file-handlers for that.
> My point is, I see subsystems working perfectly fine here and I don't
> get how the proposed API is any better.
>
> Am I missing something?
No, I don't think you are. Your analysis is correct that this is
achievable with subsystems. The goal of the new API is to make that
specific, common use case simpler.
Pasha
Powered by blists - more mailing lists