Message-ID: <CACePvbXrbR=A43UveqPrBmQHAfvjuJGtw9XyUQvpYe941KwzuA@mail.gmail.com>
Date: Tue, 30 Sep 2025 08:41:29 -0700
From: Chris Li <chrisl@...nel.org>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, Jason Gunthorpe <jgg@...pe.ca>,
Bjorn Helgaas <bhelgaas@...gle.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>, Len Brown <lenb@...nel.org>, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, linux-acpi@...r.kernel.org,
David Matlack <dmatlack@...gle.com>, Pasha Tatashin <tatashin@...gle.com>,
Jason Miu <jasonmiu@...gle.com>, Vipin Sharma <vipinsh@...gle.com>,
Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, William Tu <witu@...dia.com>, Mike Rapoport <rppt@...nel.org>,
Leon Romanovsky <leon@...nel.org>
Subject: Re: [PATCH v2 06/10] PCI/LUO: Save and restore driver name
On Tue, Sep 30, 2025 at 6:41 AM Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
>
> On Tue, Sep 30, 2025 at 09:02:44AM -0400, Pasha Tatashin wrote:
> > On Mon, Sep 29, 2025 at 10:10 PM Chris Li <chrisl@...nel.org> wrote:
> > >
> > > On Mon, Sep 29, 2025 at 10:57 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > > >
> > > > On Tue, Sep 16, 2025 at 12:45:14AM -0700, Chris Li wrote:
> > > > > Save the PCI driver name into "struct pci_dev_ser" during the PCI
> > > > > prepare callback.
> > > > >
> > > > > After kexec, use driver_set_override() to ensure the device is
> > > > > bound only to the saved driver.
> > > >
> > > > This doesn't seem like a great idea, driver name should not be made
> > > > ABI.
> > >
> > > Let's break it down with baby steps.
> > >
> > > 1) Do you agree the liveupdated PCI device needs to bind to the exact
> > > same driver after kexec?
> > > To me that is a firm yes. If the driver binds to another driver, we
> > > can't expect the other driver will understand the original driver's
> > > saved state.
> >
> > Hi Chris,
> >
> > Driver name does not have to be an ABI.
>
> A driver name can NEVER be an abi, please don't do that.

Can you please clarify that?

For example, PCI has this sysfs control API:
"/sys/bus/pci/devices/0000:04:00.0/driver_override", which takes the
*driver name* as data to override which driver is allowed to bind to
this device.
Does driver_override count as using the driver name as part of the
ABI? If not, why not?

What live update wants is to make that driver_override persistent
across kexec. It does not introduce the "driver_override" API; that
interface already exists. PCI liveupdate just wants to use it.

I want to get some basic understanding before venturing into more
complex solutions.

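For readers unfamiliar with driver_override, here is a minimal userspace
sketch of how the existing sysfs file is normally driven. The helper names
(driver_override_path, write_driver_override) and the sysfs_root parameter
are assumptions made for illustration; only the sysfs path layout comes
from the example above. This is not the code in the patch series.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build ".../bus/pci/devices/<bdf>/driver_override" under sysfs_root.
 * Hypothetical helper: sysfs_root is a parameter only so the sketch can
 * be exercised outside a real /sys tree.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int driver_override_path(char *buf, size_t len,
                                const char *sysfs_root, const char *bdf)
{
    int n = snprintf(buf, len, "%s/bus/pci/devices/%s/driver_override",
                     sysfs_root, bdf);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a driver name into a driver_override file.
 * An empty write clears the override in the kernel, so this sketch
 * refuses empty names to avoid doing that by accident.
 * Returns 0 on success, -1 on any failure.
 */
static int write_driver_override(const char *path, const char *driver)
{
    FILE *f;
    int ok;

    if (!driver || strlen(driver) == 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fputs(driver, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

With "/sys" as the root and "0000:04:00.0" as the device address, the first
helper yields the path quoted above. Writing a name such as "vfio-pci"
there restricts binding to that driver until the override is cleared or
the system reboots, which is the part live update would like to carry
across kexec.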
> > Drivers that support live
> > updates should provide a live update-specific ABI to detect
> > compatibility with the preserved data. We can use a preservation
> > schema GUID for this.
> >
> > > 2) Assume the 1) is yes from you. Are you just not happy that the
> > > kernel saves the driver name? You want user space to save it, is that
> > > it?
> > > How does it reference the driver after kexec otherwise?
> >
> > If we use GUID, drivers would advertise the GUIDs they support and we
> > would modify the core device-driver matching process to use this
> > information.
> >
> > Each driver that supports this mechanism would need to declare an
> > array of GUIDs it is compatible with. This would be a new field in its
> > struct pci_driver.
> >
> > static const guid_t my_driver_guids[] = {
> > GUID_INIT(0x123e4567, ...), // Schema V1
> > GUID_INIT(0x987a6543, ...), // Schema V2
> > {},
> > };
>
> That's crazy, who is going to be adding all of that to all drivers? And
> knowing to bump this if the internal data representation changes? And it
> will change underneath it without the driver even knowing? This feels
> really really wrong, unless I'm missing something.

The GUID is more complex than a driver name. I am fine with not using
a GUID if you are so strongly opposed to it.

You are saying don't do A (driver name) and don't do B (GUID). I am
waiting for the part where you say "please do C instead".

Do you have any other suggestion for how to prevent the live-updated
PCI device from binding to a different driver after kexec? I am happy
to work in the direction you point out and turn that into a patch for
discussion.

Thanks
Chris
> > static struct pci_driver my_pci_driver = {
> > .name = "my_driver",
> > .id_table = my_pci_ids,
> > .probe = my_probe,
> > .live_update_guids = my_driver_guids,
> > };
> >
> > The kernel's PCI core would perform an extra check before falling back
> > to the standard PCI ID matching.
> > 1. When a PCI device is discovered, the core first asks the Live
> > Update framework: "Is there a preserved GUID for this device?"
> > 2. If a GUID is found, the core will only attempt to bind drivers that
> > both match the device's PCI ID and have that specific GUID in their
> > live_update_guids list.
>
> What "core" is doing this? And how exactly?
>
> And why is PCI somehow special here?
>
> > 3. If no GUID is preserved for the device, the core proceeds with the
> > normal matching logic
> > 4. If no driver matches the GUID, the device is left unbound. The
> > state gets removed during finish(), and the device is reset.
>
> How do you reset a device you are not bound to? That feels ripe for
> causing problems (think multi-function devices...)
>
> And what about PCI drivers that are really just an aux-bus "root" point?
> How is the sharing of all of the child devices going to work?
>
> This feels really rough and might possibly work if you squint hard
> enough and test it in a very limited way with almost no real hardware :)
>
> good luck!
>
> greg k-h
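
The four matching steps quoted above can be modelled in plain userspace C
to make the proposed decision order concrete. Everything here is a
hypothetical stand-in: struct guid, struct driver_model, struct
device_model and pick_driver() are illustration-only names, not kernel
types or functions, and the sketch says nothing about where such a check
would live in the driver core.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's guid_t. */
struct guid { unsigned char b[16]; };

/* Hypothetical model of a driver: one PCI ID plus the GUIDs it accepts. */
struct driver_model {
    const char *name;
    unsigned short vendor, device;
    const struct guid *guids;
    size_t nguids;
};

/* Hypothetical model of a device: its PCI ID plus an optional preserved GUID. */
struct device_model {
    unsigned short vendor, device;
    const struct guid *preserved;   /* NULL when nothing was preserved */
};

static bool id_matches(const struct device_model *dev,
                       const struct driver_model *drv)
{
    return dev->vendor == drv->vendor && dev->device == drv->device;
}

static bool guid_listed(const struct driver_model *drv, const struct guid *g)
{
    for (size_t i = 0; i < drv->nguids; i++)
        if (memcmp(&drv->guids[i], g, sizeof(*g)) == 0)
            return true;
    return false;
}

/*
 * Return the first acceptable driver, or NULL to leave the device unbound.
 * Steps 1-2: with a preserved GUID, require both an ID match and the GUID.
 * Step 3:    without a preserved GUID, an ID match alone is enough.
 * Step 4:    with a preserved GUID and no driver listing it, return NULL.
 */
static const struct driver_model *
pick_driver(const struct device_model *dev,
            const struct driver_model *drivers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!id_matches(dev, &drivers[i]))
            continue;
        if (!dev->preserved || guid_listed(&drivers[i], dev->preserved))
            return &drivers[i];
    }
    return NULL;
}
```

The model covers only the matching decision. It does not address the
questions raised in the quoted reply, such as how an unbound device gets
reset or how aux-bus child devices would share preserved state.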