Message-ID: <2025093052-resupply-unmixable-e9bb@gregkh>
Date: Tue, 30 Sep 2025 17:08:37 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Chris Li <chrisl@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>, Len Brown <lenb@...nel.org>,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
linux-acpi@...r.kernel.org, David Matlack <dmatlack@...gle.com>,
Pasha Tatashin <tatashin@...gle.com>,
Jason Miu <jasonmiu@...gle.com>, Vipin Sharma <vipinsh@...gle.com>,
Saeed Mahameed <saeedm@...dia.com>,
Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, William Tu <witu@...dia.com>,
Mike Rapoport <rppt@...nel.org>, Leon Romanovsky <leon@...nel.org>
Subject: Re: [PATCH v2 06/10] PCI/LUO: Save and restore driver name

On Tue, Sep 30, 2025 at 10:53:50AM -0400, Pasha Tatashin wrote:
> On Tue, Sep 30, 2025 at 9:41 AM Greg Kroah-Hartman
> <gregkh@...uxfoundation.org> wrote:
> >
> > On Tue, Sep 30, 2025 at 09:02:44AM -0400, Pasha Tatashin wrote:
> > > On Mon, Sep 29, 2025 at 10:10 PM Chris Li <chrisl@...nel.org> wrote:
> > > >
> > > > On Mon, Sep 29, 2025 at 10:57 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > > > >
> > > > > On Tue, Sep 16, 2025 at 12:45:14AM -0700, Chris Li wrote:
> > > > > > Save the PCI driver name into "struct pci_dev_ser" during the PCI
> > > > > > prepare callback.
> > > > > >
> > > > > > After kexec, use driver_set_override() to ensure the device is
> > > > > > bound only to the saved driver.
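A minimal userspace sketch of the mechanism this patch description outlines, for readers following the thread. Before kexec, save the bound driver's name into the preserved record. After kexec, let only a driver with that exact name bind. `struct pci_dev_ser_sketch`, `ser_save_driver_name()` and `ser_may_bind()` are hypothetical names. The real patch works on `struct pci_dev_ser` and calls `driver_set_override()` inside the kernel, and this sketch does not reproduce either.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the patch's "struct pci_dev_ser"; only the
 * field relevant to driver-name matching is modelled here. */
struct pci_dev_ser_sketch {
	char driver_name[32];
};

/* "Prepare" step: record the name of the driver currently bound to the
 * device. Returns false (and leaves the record untouched) if the name
 * does not fit, including its terminating NUL. */
static bool ser_save_driver_name(struct pci_dev_ser_sketch *ser,
				 const char *name)
{
	size_t len;

	assert(ser != NULL && name != NULL);
	len = strlen(name);
	if (len >= sizeof(ser->driver_name))
		return false;
	memcpy(ser->driver_name, name, len + 1);
	return true;
}

/* After "kexec": only a driver whose name matches the saved one may bind.
 * This models the effect the patch gets from driver_set_override(). */
static bool ser_may_bind(const struct pci_dev_ser_sketch *ser,
			 const char *candidate)
{
	return strcmp(ser->driver_name, candidate) == 0;
}
```

In this sketch the name string itself is what the next kernel compares against, so renaming the driver would break the match. That is the "driver name as ABI" concern raised in the replies.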
> > > > >
> > > > > This doesn't seem like a great idea, driver name should not be made
> > > > > ABI.
> > > >
> > > > Let's break it down with baby steps.
> > > >
> > > > 1) Do you agree the liveupdated PCI device needs to bind to the exact
> > > > same driver after kexec?
> > > > To me that is a firm yes. If the device binds to another driver, we
> > > > can't expect the other driver will understand the original driver's
> > > > saved state.
> > >
> > > Hi Chris,
> > >
> > > Driver name does not have to be an ABI.
> >
> > A driver name can NEVER be an abi, please don't do that.
> >
> > > Drivers that support live
> > > updates should provide a live update-specific ABI to detect
> > > compatibility with the preserved data. We can use a preservation
> > > schema GUID for this.
> > >
> > > > 2) Assume the 1) is yes from you. Are you just not happy that the
> > > > kernel saves the driver name? You want user space to save it, is that
> > > > it?
> > > > How does it reference the driver after kexec otherwise?
> > >
> > > If we use GUID, drivers would advertise the GUIDs they support and we
> > > would modify the core device-driver matching process to use this
> > > information.
> > >
> > > Each driver that supports this mechanism would need to declare an
> > > array of GUIDs it is compatible with. This would be a new field in its
> > > struct pci_driver.
> > >
> > > static const guid_t my_driver_guids[] = {
> > > 	GUID_INIT(0x123e4567, ...), // Schema V1
> > > 	GUID_INIT(0x987a6543, ...), // Schema V2
> > > 	{},
> > > };
> >
> > That's crazy, who is going to be adding all of that to all drivers? And
>
> Only to the drivers that support live updates, that would be just a few drivers.
>
> > knowing to bump this if the internal data representation changes? And it
> > will change underneath it without the driver even knowing? This feels
> > really really wrong, unless I'm missing something.
>
> A driver that preserves state across a reboot already has an implicit
> contract with its future self about that data's format. The GUID
> simply makes that contract explicit and machine-checkable. It does not
> have to be GUID, but nevertheless there has to be a specific contract.
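One way to make such a contract machine-checkable is sketched below under three assumptions. The preserved record uses only fixed-width fields. It carries a schema identifier and its own size. Both kernels run on the same architecture, so endianness and alignment match. `struct my_state_v1`, `MY_STATE_SCHEMA_V1` and `my_state_v1_valid()` are hypothetical names. The compile-time checks use C11 `static_assert`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MY_STATE_SCHEMA_V1 0x4d590001u /* hypothetical schema id */

/* Hypothetical preserved record: fixed-width fields only, so its layout
 * does not depend on the driver's other in-memory structures. */
struct my_state_v1 {
	uint32_t schema;    /* which format follows */
	uint32_t len;       /* sizeof(struct my_state_v1) when written */
	uint64_t ring_base; /* example payload fields */
	uint32_t ring_size;
	uint32_t flags;
};

/* Make the layout part of the build: if a field is reordered or resized,
 * compilation fails instead of the next kernel misreading the data. */
static_assert(sizeof(struct my_state_v1) == 24, "my_state_v1 layout changed");
static_assert(offsetof(struct my_state_v1, ring_base) == 8,
	      "my_state_v1 layout changed");

/* Runtime half of the contract: refuse data written under another schema
 * or with a different record size. */
static bool my_state_v1_valid(const void *buf, size_t buf_len)
{
	struct my_state_v1 hdr;

	if (buf_len < sizeof(hdr))
		return false;
	memcpy(&hdr, buf, sizeof(hdr));
	return hdr.schema == MY_STATE_SCHEMA_V1 && hdr.len == sizeof(hdr);
}
```

The static assertions only pin the layout of the record itself. They do not catch changes in other kernel structures or compiler flags that the preserved state depends on indirectly, which is the gap raised in the reply that follows.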

So how are you going to "version" these GUIDs? I see you use "schema Vx"
above, but how is that really going to work in the end? Lots of data
structures change underneath the base driver that it knows nothing
about, not to mention basic things like compiler flags and the like
(think about how we have changed things for Spectre issues over the
years...).

And when can you delete an old "schema"? This feels like you are
forcing future developers to maintain things "for forever"...

thanks,

greg k-h