Message-ID: <2025093030-shrewdly-defiant-1f3e@gregkh>
Date: Tue, 30 Sep 2025 15:41:40 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Chris Li <chrisl@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Danilo Krummrich <dakr@...nel.org>, Len Brown <lenb@...nel.org>,
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
	linux-acpi@...r.kernel.org, David Matlack <dmatlack@...gle.com>,
	Pasha Tatashin <tatashin@...gle.com>,
	Jason Miu <jasonmiu@...gle.com>, Vipin Sharma <vipinsh@...gle.com>,
	Saeed Mahameed <saeedm@...dia.com>,
	Adithya Jayachandran <ajayachandra@...dia.com>,
	Parav Pandit <parav@...dia.com>, William Tu <witu@...dia.com>,
	Mike Rapoport <rppt@...nel.org>, Leon Romanovsky <leon@...nel.org>
Subject: Re: [PATCH v2 06/10] PCI/LUO: Save and restore driver name

On Tue, Sep 30, 2025 at 09:02:44AM -0400, Pasha Tatashin wrote:
> On Mon, Sep 29, 2025 at 10:10 PM Chris Li <chrisl@...nel.org> wrote:
> >
> > On Mon, Sep 29, 2025 at 10:57 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 12:45:14AM -0700, Chris Li wrote:
> > > > Save the PCI driver name into "struct pci_dev_ser" during the PCI
> > > > prepare callback.
> > > >
> > > > After kexec, use driver_set_override() to ensure the device is
> > > > bound only to the saved driver.
> > >
> > > This doesn't seem like a great idea, driver name should not be made
> > > ABI.
> >
> > Let's break it down with baby steps.
> >
> > 1) Do you agree the liveupdated PCI device needs to bind to the exact
> > same driver after kexec?
> > To me that is a firm yes. If the driver binds to another driver, we
> > can't expect the other driver will understand the original driver's
> > saved state.
>
> Hi Chris,
>
> Driver name does not have to be an ABI.

A driver name can NEVER be an ABI, please don't do that.

> Drivers that support live
> updates should provide a live update-specific ABI to detect
> compatibility with the preserved data. We can use a preservation
> schema GUID for this.
>
> > 2) Assume the 1) is yes from you. Are you just not happy that the
> > kernel saves the driver name? You want user space to save it, is that
> > it?
> > How does it reference the driver after kexec otherwise?
>
> If we use GUID, drivers would advertise the GUIDs they support and we
> would modify the core device-driver matching process to use this
> information.
>
> Each driver that supports this mechanism would need to declare an
> array of GUIDs it is compatible with. This would be a new field in its
> struct pci_driver.
>
> static const guid_t my_driver_guids[] = {
>         GUID_INIT(0x123e4567, ...), // Schema V1
>         GUID_INIT(0x987a6543, ...), // Schema V2
>         {},
> };

That's crazy, who is going to be adding all of that to all drivers? And
knowing to bump this if the internal data representation changes? And it
will change underneath it without the driver even knowing? This feels
really really wrong, unless I'm missing something.

> static struct pci_driver my_pci_driver = {
>         .name = "my_driver",
>         .id_table = my_pci_ids,
>         .probe = my_probe,
>         .live_update_guids = my_driver_guids,
> };
>
> The kernel's PCI core would perform an extra check before falling back
> to the standard PCI ID matching.
> 1. When a PCI device is discovered, the core first asks the Live
> Update framework: "Is there a preserved GUID for this device?"
> 2. If a GUID is found, the core will only attempt to bind drivers that
> both match the device's PCI ID and have that specific GUID in their
> live_update_guids list.

What "core" is doing this? And how exactly?

And why is PCI somehow special here?

> 3. If no GUID is preserved for the device, the core proceeds with the
> normal matching logic
> 4. If no driver matches the GUID, the device is left unbound. The
> state gets removed during finish(), and the device is reset.

How do you reset a device you are not bound to? That feels ripe for
causing problems (think multi-function devices...)

And what about PCI drivers that are really just an aux-bus "root" point?
How is the sharing of all of the child devices going to work?

This feels really rough and might possibly work if you squint hard
enough and test it in a very limited way with almost no real hardware :)

good luck!

greg k-h
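
For readers following the thread, the matching order proposed in steps 1-4
of the quoted text can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not code from the patch series:
struct sketch_guid, struct sketch_driver, pick_driver() and the demo tables
are hypothetical stand-ins (the quoted proposal uses guid_t and a new
live_update_guids field in struct pci_driver), the GUID byte values are made
up, and PCI ID matching is reduced to a single vendor/device pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's guid_t: 16 raw bytes. */
struct sketch_guid {
	uint8_t b[16];
};

/* Hypothetical, heavily reduced stand-in for struct pci_driver. */
struct sketch_driver {
	const char *name;
	uint16_t vendor;
	uint16_t device;
	const struct sketch_guid *live_update_guids; /* NULL if unsupported */
	size_t nr_guids;
};

/* Made-up schema GUIDs; unlisted trailing bytes are zero. */
#define SCHEMA_V1      { { 0x12, 0x3e, 0x45, 0x67 } }
#define SCHEMA_V2      { { 0x98, 0x7a, 0x65, 0x43 } }
#define SCHEMA_UNKNOWN { { 0xde, 0xad, 0xbe, 0xef } }

static const struct sketch_guid schema_v1 = SCHEMA_V1;
static const struct sketch_guid schema_v2 = SCHEMA_V2;
static const struct sketch_guid schema_unknown = SCHEMA_UNKNOWN;

static const struct sketch_guid lu_drv_guids[] = { SCHEMA_V1, SCHEMA_V2 };

/* Two drivers claiming the same device ID; only one lists GUIDs. */
static const struct sketch_driver demo_drivers[] = {
	{ "plain_drv", 0x1234, 0x5678, NULL, 0 },
	{ "lu_drv", 0x1234, 0x5678, lu_drv_guids,
	  sizeof(lu_drv_guids) / sizeof(lu_drv_guids[0]) },
};

static int driver_has_guid(const struct sketch_driver *drv,
			   const struct sketch_guid *guid)
{
	for (size_t i = 0; i < drv->nr_guids; i++)
		if (!memcmp(drv->live_update_guids[i].b, guid->b,
			    sizeof(guid->b)))
			return 1;
	return 0;
}

/*
 * preserved == NULL means no GUID was preserved for the device, so
 * plain ID matching applies (step 3). Otherwise a driver must match
 * the ID and also list the preserved GUID (step 2). A NULL return
 * means no driver qualifies and the device stays unbound (step 4).
 */
static const struct sketch_driver *
pick_driver(const struct sketch_driver *drivers, size_t nr,
	    uint16_t vendor, uint16_t device,
	    const struct sketch_guid *preserved)
{
	assert(drivers != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		const struct sketch_driver *drv = &drivers[i];

		if (drv->vendor != vendor || drv->device != device)
			continue;
		if (preserved && !driver_has_guid(drv, preserved))
			continue;
		return drv;
	}
	return NULL;
}
```

With the demo table, a lookup with no preserved GUID returns the first
ID match (plain_drv), a lookup with schema_v2 skips plain_drv and returns
lu_drv, and a lookup with schema_unknown returns NULL. The sketch covers
only the selection order; it does not address the questions raised in the
reply above, such as which core performs the lookup, how an unbound device
would be reset, or how aux-bus child devices would be handled.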