[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aHSfeNhpocI4nmQk@wunner.de>
Date: Mon, 14 Jul 2025 08:11:04 +0200
From: Lukas Wunner <lukas@...ner.de>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: linux-kernel@...r.kernel.org, Keith Busch <kbusch@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Parav Pandit <parav@...dia.com>, virtualization@...ts.linux.dev,
stefanha@...hat.com, alok.a.tiwari@...cle.com,
linux-pci@...r.kernel.org
Subject: Re: [PATCH RFC v5 1/5] pci: report surprise removal event
On Wed, Jul 09, 2025 at 04:55:26PM -0400, Michael S. Tsirkin wrote:
> At the moment, in case of a surprise removal, the regular remove
> callback is invoked, exclusively. This works well, because mostly, the
> cleanup would be the same.
>
> However, there's a race: imagine device removal was initiated by a user
> action, such as driver unbind, and it in turn initiated some cleanup and
> is now waiting for an interrupt from the device. If the device is now
> surprise-removed, that never arrives and the remove callback hangs
> forever.
For PCI devices in a hotplug slot, user space can initiate "safe removal"
by writing "0" to the hotplug slot's "power" file in sysfs.
If the PCI device is yanked from the slot while safe removal is ongoing,
there is likewise no way for the driver to know that the device is
suddenly gone. That's because pciehp_unconfigure_device() only calls
pci_dev_set_disconnected() in the surprise removal case, not for
safe removal.
The solution proposed here is thus not a complete one: It may work
if user space initiated *driver* removal, but not if it initiated *safe*
removal of the entire device. For virtio, that may be sufficient.
> +++ b/drivers/pci/pci.h
> @@ -553,6 +553,12 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
> pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
> pci_doe_disconnected(dev);
>
> + if (READ_ONCE(dev->disconnect_work_enable)) {
> + /* Make sure work is up to date. */
> + smp_rmb();
> + schedule_work(&dev->disconnect_work);
> + }
> +
> return 0;
> }
Going through all the callers of pci_dev_set_disconnected(),
I suppose the (only) one you're interested in is
pciehp_unconfigure_device().
The other callers are related to runtime resume, resume from
system sleep and ACPI slots.
Instead of amending pci_dev_set_disconnected(), I'd prefer
an approach where pciehp_unconfigure_device() first marks
all devices disconnected, then wakes up some global waitqueue, e.g.:
- if (!presence)
+ if (!presence) {
pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
+ wake_up_all(&pci_disconnected_wq);
+ }
The benefit is that there's no delay when marking devices disconnected.
(Granted, the delay is small for smp_rmb() + schedule_work().)
And just having a global waitqueue is simpler and may be useful
for other use cases.
So instead of adding timeouts when waiting for interrupts, drivers would
be woken via the waitqueue.
But again, it's not a complete solution as it doesn't cover the
"surprise removal during safe removal" case.
I also agree with Bjorn's and Keith's comments that the driver should
use timeouts for robustness, but still wanted to provide additional
(hopefully constructive) thoughts.
Thanks!
Lukas
Powered by blists - more mailing lists