Message-ID: <2025100225-abridge-shifty-3d50@gregkh>
Date: Thu, 2 Oct 2025 08:09:11 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Chris Li <chrisl@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>, Len Brown <lenb@...nel.org>,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
linux-acpi@...r.kernel.org, David Matlack <dmatlack@...gle.com>,
Pasha Tatashin <tatashin@...gle.com>,
Jason Miu <jasonmiu@...gle.com>, Vipin Sharma <vipinsh@...gle.com>,
Saeed Mahameed <saeedm@...dia.com>,
Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, William Tu <witu@...dia.com>,
Mike Rapoport <rppt@...nel.org>, Leon Romanovsky <leon@...nel.org>
Subject: Re: [PATCH v2 06/10] PCI/LUO: Save and restore driver name
On Wed, Oct 01, 2025 at 05:03:19PM -0400, Pasha Tatashin wrote:
> On Wed, Oct 1, 2025 at 1:06 AM Greg Kroah-Hartman
> > On Tue, Sep 30, 2025 at 11:56:58AM -0400, Pasha Tatashin wrote:
> > > > > A driver that preserves state across a reboot already has an implicit
> > > > > contract with its future self about that data's format. The GUID
> > > > > simply makes that contract explicit and machine-checkable. It does not
> > > > > have to be GUID, but nevertheless there has to be a specific contract.
> > > >
> > > > So how are you going to "version" these GUIDs? I see you use "schema Vx"
> > >
> > > Driver developer who changes a driver to support live-update.
> >
> > I do not understand this response, sorry.
>
> Sorry for the confusion, I misunderstood your question. I thought you
> were asking who would add a new field to a driver. My answer was that
> it would be the developer who is adding support for the Live Update
> feature to that specific driver.
> I now realize you were asking about how the GUID would be versioned.
> Using a GUID was just one of several ideas. My main point is that we
> need some form of versioned compatibility identifier, whether it's a
> string or a number. This would allow the system to verify that the new
> driver can understand the preserved data for this device from the
> previous kernel before it binds to the device.
Again, "versioned" identifiers will not work over time as you can never
drop old versions, AND a driver author does not know if the underlying
structures that are outside of the driver have changed or not, nor if
the compiler settings have changed, or whether anything else like that
which could affect it has changed.
> > > > And when can you delete an old "schema"? This feels like you are
> > > > forcing future developers to maintain things "for forever"...
> > >
> > > This won't be an issue because of how live update support is planned.
> > > The support model will be phased and limited:
> > >
> > > Initially, and for a while there will be no stability guarantees
> > > between different kernel versions.
> > > Eventually, we will support specific, narrow upgrade paths (e.g.,
> > > minor-to-minor, or stable-A to stable-A+1).
> > > Downgrades and arbitrary version jumps ("any-to-any") will not be
> > > supported upstream. Since we only ever need to handle a well-defined
> > > forward path, the code for old, irrelevant schemas can always be
> > > removed. There is no "forever".
> >
> > This is kernel code, it is always "forever", sorry.
>
> I'm sorry, but I don't quite understand what you mean. There is no
> stable internal kernel API; the upstream tree is constantly evolving
> with features being added, improved, and removed.
Yes, that is very true, but you can not remove user-visible
functionality, which is what you are saying you are going to do here.
> > If you want "minor to minor" update, how is that going to work given
> > that you do not add changes only to "minor" releases (that being the
> > 6.12.y the "y" number).
>
> You are correct. Initially, our plan is to allow live updates to break
> between any kernel version.
Then there is no such thing as live updates :)
> However, it is my hope that we will
> eventually stabilize this process and only allow breakages between,
> for example, versions 6.n and 6.n+2, and eventually from one stable
> release to stable+2. This would create a well-defined window for
> safely removing deprecated data formats and the code that handles them
> from the kernel.
How are you going to define this? We can not break old users when they
upgrade, and so you are going to have to support this "upgrade path" for
forever.
> > Remember, Linux does not use "semantic versioning" as its release
> > numbering is older than that scheme. It just does "this version is
> > newer than that version" and that's it. You can't really take anything
> > else from the number.
>
> Understood. If that's the case, we could use stable releases as the
> basis for defining when a live update can break.
So every single release?
> It would take longer
> to achieve, but it is a possibility. These are the kinds of questions
> that will be discussed at the LPC Liveupdate MC. If you are attending
> LPC, I encourage you to join the discussion, as your thoughts on how
> we can frame long-term live update support would be very valuable.
I will be at LPC, but can't guarantee I can make it to that MC, it all
depends on scheduling.
> > And if this isn't for "upstream" at all, then why have it? We can't add
> > new features and support it if we can't actually use it and it's only
> > for out-of-tree vendor kernels.
>
> Our goal is to have full support in the upstream kernel. Downstream
> users will then need to adapt live updates to their specific needs.
> For example, if a live update from version A to version C is broken, a
> downstream user would either have to update incrementally from A to B
> and then to C, or they would have to internally fix whatever is
> causing the breakage before performing the live update.
What does "internally fix" mean exactly here?
> > And how will you document properly a "well defined forward path"? That
> > should be done first, before you have any code here that we are
> > reviewing.
>
> Currently, and for the near future, live updates will only be
> supported within the same kernel version.
Ok, then no need for any GUID at all. Just update and pray! :)
> > Please do that, get people to agree on the idea and how it will work
> > before asking us to review code.
>
> This is an industry-wide effort. We have engineers from Amazon,
> Google, Microsoft, Nvidia, and other companies meeting bi-weekly to
> discuss Live Update support, and sending and landing patches upstream.
> We are also organizing an LPC Live Update Micro Conference where the
> versioning strategy will be a topic.
>
> For now, we have agreed that the live update can break between any
> kernel versions or with any commit while the feature is under active
> development. This approach allows us the flexibility to build the core
> functionality while we collaboratively define the long-term versioning
> and stability model.
Just keeping a device "alive" while rebooting into the same exact kernel
image seems odd to me given that this is almost never what people
actually do. They update their kernel with the weekly stable release to
get the new bugfixes (remember we fix 13 CVEs a day), and away you go.
You are saying that this workload would not actually be supported, so
why do you want live update at all? Who needs this?
thanks,
greg k-h