Message-ID: <CACePvbVCXGn-c3dVZfLTq+GbcFfjWchN0OwEHDNs_-EV6TJfyg@mail.gmail.com>
Date: Thu, 2 Oct 2025 15:05:25 -0700
From: Chris Li <chrisl@...nel.org>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, Jason Gunthorpe <jgg@...pe.ca>,
Bjorn Helgaas <bhelgaas@...gle.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>, Len Brown <lenb@...nel.org>, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, linux-acpi@...r.kernel.org,
David Matlack <dmatlack@...gle.com>, Pasha Tatashin <tatashin@...gle.com>,
Jason Miu <jasonmiu@...gle.com>, Vipin Sharma <vipinsh@...gle.com>,
Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, William Tu <witu@...dia.com>, Mike Rapoport <rppt@...nel.org>,
Leon Romanovsky <leon@...nel.org>
Subject: Re: [PATCH v2 06/10] PCI/LUO: Save and restore driver name
On Tue, Sep 30, 2025 at 10:13 PM Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
> >
> > For example, PCI has this sysfs control API:
> >
> > "/sys/bus/pci/devices/0000:04:00.0/driver_override" which takes the
> > *driver name* as data to override what driver is allowed to bind to
> > this device.
> > Does driver_override count as using the driver name as part of the
> > ABI? If not, why not?
>
> Because the bind/unbind/override was created as a debug facility for
> doing kernel development and then people have turned it into a "let's
> operate our massive cloud systems with this fragile feature".
Frankly, I did not know that it was a debug API, or that it should be
treated as one. Let's say we want to get it right, now and in the
future: do you have any suggestions or guidelines for the new API?
> We have never said that driver names will remain the same across
> releases, and they have changed over time. Device ids have also moved
That is fine. LUO PCI only records that, in the old kernel A1 that the
live update starts from, the driver's name is "foo1". The new kernel A2
that gets booted will know about the old kernel A1, at least in a
typical data center, because the live update from A1 to A2 is tested
and validated before the live update kernel is officially rolled out.
So A2 can know that its driver "foo2" used to be called "foo1" in A1,
and let the PCI core bind that device to "foo2" instead. Later, when
A2 live-updates to A3, A3 can drop its knowledge of "foo1" once we are
sure the A1 kernel is no longer supported.
> from one driver to another as well, making the "control" of the device
> seem to have changed names.
The name can change; the new kernel just needs to know about the change
and handle it. That is extra complexity, but not impossible.
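A rough user-space model of that idea, for discussion only: the table, struct and function names below are hypothetical and are not part of the LUO PCI patches or any kernel API. The new kernel carries a small table mapping driver names used by older kernels to their current names, and resolves the name saved by the old kernel before asking the PCI core to bind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical rename table carried by the new kernel (A2).
 * Each entry says: the driver called old_name in an older kernel
 * is called new_name in this kernel.
 */
struct luo_driver_alias {
	const char *old_name;
	const char *new_name;
};

static const struct luo_driver_alias luo_driver_aliases[] = {
	{ "foo1", "foo2" },	/* renamed between A1 and A2 */
};

/*
 * Map the driver name saved by the previous kernel to the name of the
 * driver to bind in this kernel. Names without an entry are assumed
 * to be unchanged.
 */
static const char *luo_resolve_driver_name(const char *saved)
{
	size_t i;

	assert(saved != NULL);
	for (i = 0; i < sizeof(luo_driver_aliases) / sizeof(luo_driver_aliases[0]); i++) {
		if (strcmp(saved, luo_driver_aliases[i].old_name) == 0)
			return luo_driver_aliases[i].new_name;
	}
	return saved;
}
```

Once A1 is no longer a supported source kernel, A3 would simply ship without the "foo1" entry.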
>
> > What live update wants is to make that driver_override persistent over
> > kexec. It does not introduce the "driver_override" API. That is a
> > pre-existing condition. PCI live update just wants to use it.
>
> That does not mean that this is the correct api to use at all. Again,
> this was a debugging aid, to help with users who wanted to add a device
> id to a driver without having to rebuild it. Don't make it something
> that it was never intended to be.
>
> Why not just make a new api as you are doing something new here? That
> way you get to define it to work exactly the way you need?
Sure, I can invent a new API. I am just a bit wary of introducing a
new API and carrying the burden of supporting it forever.
Another idea is that we don't remember the driver name at all. The
kernel just enforces that a live-updated device is never auto-probed,
and pushes the responsibility to user space to load the driver and
manually bind the device to the right driver. User space will still
need to know the previous driver name, or some other way to identify
the right driver for this live update. Somebody will need to know
something like a driver name and pass it to the new kernel to restore
the binding, but it does not have to be the kernel.
The drawback is extra latency in the blackout window: after the PCI
core scans the bus, a user space program has to run to bind the device
and probe the driver.
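For the user space variant, the binding step itself could be done with existing sysfs files: write the driver name to the device's driver_override, then write the device address to /sys/bus/pci/drivers_probe. A minimal sketch, assuming the driver module is already loaded and the kernel has left the device unbound; the helper names are hypothetical, and the sysfs root is a parameter only so the logic can be tried outside /sys. Note that this still leans on the same override interface discussed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a string to a sysfs-style attribute file; 0 on success, -1 on error. */
static int luo_write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	/* The actual write() happens on flush, so a failed store shows up in fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical user space restore step for one live-updated device.
 * root is "/sys" on a real system.
 */
static int luo_user_bind(const char *root, const char *bdf, const char *driver)
{
	char path[256];
	int n;

	/* 1. Only allow the driver recorded before kexec to bind. */
	n = snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/driver_override", root, bdf);
	if (n < 0 || (size_t)n >= sizeof(path) || luo_write_attr(path, driver))
		return -1;

	/* 2. Ask the PCI core to probe the device now. */
	n = snprintf(path, sizeof(path), "%s/bus/pci/drivers_probe", root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return luo_write_attr(path, bdf);
}
```

On a real system this would be called as luo_user_bind("/sys", "0000:04:00.0", "vfio-pci"), and the time it takes is exactly the extra blackout latency mentioned above.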
>
> > I want to get some basic understanding before venturing into the more
> > complex solutions.
>
> You mean "real" solutions :)
I mean solutions that are more acceptable upstream.
> It's not my requirement to say "here is C", but rather I am saying "B is
> not going to scale over time as GUIDs are a pain to manage".
I can agree with that.
> > Do you have any other suggestions on how to prevent a live-updated PCI
> > device from binding to a different driver after kexec? I am happy to
> > work in the direction you point out and turn that into a patch for
> > discussion purposes.
>
> Why prevent it? Why not just have a special api just for drivers that
> want to use this new feature?
A GPU is typically bound to the VFIO driver while a VM is using it.
If we don't prevent auto probe, after the kexec the PCI device will be
auto-probed by its native driver. Naturally, the native driver has no
way to decode the data saved by the previous VFIO driver.
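The rule the kernel would need to enforce can be modelled in a few lines. This is a user-space sketch only; struct luo_pci_dev, luo_may_bind() and the driver name "native_gpu" are hypothetical placeholders, not the actual PCI core matching code. A device that came through the live update may only be bound by the driver recorded before kexec, so the native GPU driver is rejected while vfio-pci is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-device state restored from the previous kernel. */
struct luo_pci_dev {
	const char *bdf;		/* e.g. "0000:04:00.0" */
	bool liveupdated;		/* device was preserved across kexec */
	const char *saved_driver;	/* driver bound in the old kernel, may be NULL */
};

/* Decide whether driver drv may auto probe dev. */
static bool luo_may_bind(const struct luo_pci_dev *dev, const char *drv)
{
	assert(dev != NULL && drv != NULL);
	/* Devices that were not preserved keep normal matching. */
	if (!dev->liveupdated)
		return true;
	/* Preserved device without a recorded driver: no auto probe at all. */
	if (!dev->saved_driver)
		return false;
	/* Only the driver that saved the state can decode it. */
	return strcmp(dev->saved_driver, drv) == 0;
}
```

The NULL saved_driver case corresponds to the "no auto probe, let user space bind" alternative, so one check could serve both approaches.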
Chris