linux-kernel - Re: [PATCH RFC v5 1/5] pci: report surprise removal event

Message-ID: <aHlZE18kPuHuDtTT@wunner.de>
Date: Thu, 17 Jul 2025 22:12:03 +0200
From: Lukas Wunner <lukas@...ner.de>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: linux-kernel@...r.kernel.org, Keith Busch <kbusch@...nel.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Parav Pandit <parav@...dia.com>, virtualization@...ts.linux.dev,
	stefanha@...hat.com, alok.a.tiwari@...cle.com,
	linux-pci@...r.kernel.org
Subject: Re: [PATCH RFC v5 1/5] pci: report surprise removal event

On Thu, Jul 17, 2025 at 11:11:44AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 14, 2025 at 08:11:04AM +0200, Lukas Wunner wrote:
> > On Wed, Jul 09, 2025 at 04:55:26PM -0400, Michael S. Tsirkin wrote:
> > > At the moment, in case of a surprise removal, the regular remove
> > > callback is invoked, exclusively.  This works well, because mostly, the
> > > cleanup would be the same.
> > > 
> > > However, there's a race: imagine device removal was initiated by a user
> > > action, such as driver unbind, and it in turn initiated some cleanup and
> > > is now waiting for an interrupt from the device. If the device is now
> > > surprise-removed, that never arrives and the remove callback hangs
> > > forever.
> > 
> > For PCI devices in a hotplug slot, user space can initiate "safe removal"
> > by writing "0" to the hotplug slot's "power" file in sysfs.
> > 
> > If the PCI device is yanked from the slot while safe removal is ongoing,
> > there is likewise no way for the driver to know that the device is
> > suddenly gone.  That's because pciehp_unconfigure_device() only calls
> > pci_dev_set_disconnected() in the surprise removal case, not for
> > safe removal.
> > 
> > The solution proposed here is thus not a complete one:  It may work
> > if user space initiated *driver* removal, but not if it initiated *safe*
> > removal of the entire device.  For virtio, that may be sufficient.
> 
> So just as an idea, something like this can work I guess?  I'm yet to
> test this - wrote this on the go -

Don't bother, it won't work:

pciehp_handle_presence_or_link_change() is called from pciehp_ist(),
the IRQ thread.  During safe removal the IRQ thread is busy in
pciehp_unconfigure_device() and waiting for the driver to unbind
from devices being safe-removed.

An IRQ thread is always single-threaded.  There's no second instance
of the IRQ thread being run when another interrupt is signaled.
Rather, the IRQ thread is re-run when it has finished.

In *theory* what would be possible is to plumb this into pciehp_isr().
That's the hardirq handler.  This one will indeed be run when an
interrupt comes in while the IRQ thread is running.  Normally the
hardirq handler would just collect the events for later consumption
by the IRQ thread.  The hardirq handler could *theoretically* mark
devices gone while they're being safe-removed.
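
To illustrate the split described above, here is a minimal userspace sketch
in C11.  It is a model only, not pciehp code: the names model_hardirq(),
model_irq_thread_collect(), pending_events and the EV_* bits are hypothetical
and chosen for this example.  The "hardirq" side merely records event bits
and returns, while each run of the "IRQ thread" side fetches and clears
whatever has accumulated since its previous run.

```c
#include <stdatomic.h>

/* Hypothetical event bits, made up for this model. */
#define EV_PRESENCE_CHANGE 0x1u
#define EV_LINK_CHANGE     0x2u

/* Events recorded by the "hardirq" side and not yet consumed. */
static atomic_uint pending_events;

/*
 * Models the hardirq handler: it may run while the thread is busy,
 * so it only records the events and returns without waiting.
 */
static void model_hardirq(unsigned int events)
{
	atomic_fetch_or(&pending_events, events);
}

/*
 * Models the start of one IRQ thread run: atomically fetch and clear
 * everything collected so far.  There is only ever one such run at a
 * time; events arriving during a run are seen by the next run.
 */
static unsigned int model_irq_thread_collect(void)
{
	return atomic_exchange(&pending_events, 0);
}
```

In this model, two interrupts signaled while the thread is busy are merged
into one set of bits and handled together on the next run.  Nothing in the
model can react to them any earlier than that.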

I'm saying "theoretically" because in reality I don't think this is
a viable approach either:  pciehp_ist() contains code to *ignore*
link or presence changes if they were caused by a Secondary Bus Reset
or Downstream Port Containment.  In that case we do *not* want to mark
devices disconnected because they're only *temporarily* inaccessible.
This requires waiting for the SBR or DPC to conclude, which can take
several seconds.  We can't wait in the hardirq handler.

So this cannot be solved with the current architecture of pciehp,
at least not easily or in an elegant way.  Sorry!

Thanks,

Lukas