Message-ID: <20250717193122-mutt-send-email-mst@kernel.org>
Date: Thu, 17 Jul 2025 19:31:57 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: linux-kernel@...r.kernel.org, Keith Busch <kbusch@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Parav Pandit <parav@...dia.com>, virtualization@...ts.linux.dev,
stefanha@...hat.com, alok.a.tiwari@...cle.com,
linux-pci@...r.kernel.org
Subject: Re: [PATCH RFC v5 1/5] pci: report surprise removal event

On Thu, Jul 17, 2025 at 10:12:03PM +0200, Lukas Wunner wrote:
> On Thu, Jul 17, 2025 at 11:11:44AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 14, 2025 at 08:11:04AM +0200, Lukas Wunner wrote:
> > > On Wed, Jul 09, 2025 at 04:55:26PM -0400, Michael S. Tsirkin wrote:
> > > > At the moment, in case of a surprise removal, the regular remove
> > > > callback is invoked, exclusively. This works well, because mostly, the
> > > > cleanup would be the same.
> > > >
> > > > However, there's a race: imagine device removal was initiated by a user
> > > > action, such as driver unbind, and it in turn initiated some cleanup and
> > > > is now waiting for an interrupt from the device. If the device is now
> > > > surprise-removed, that never arrives and the remove callback hangs
> > > > forever.
> > >
> > > For PCI devices in a hotplug slot, user space can initiate "safe removal"
> > > by writing "0" to the hotplug slot's "power" file in sysfs.
> > >
> > > If the PCI device is yanked from the slot while safe removal is ongoing,
> > > there is likewise no way for the driver to know that the device is
> > > suddenly gone. That's because pciehp_unconfigure_device() only calls
> > > pci_dev_set_disconnected() in the surprise removal case, not for
> > > safe removal.
> > >
> > > The solution proposed here is thus not a complete one: It may work
> > > if user space initiated *driver* removal, but not if it initiated *safe*
> > > removal of the entire device. For virtio, that may be sufficient.
> >
> > So just as an idea, something like this can work I guess? I'm yet to
> > test this - wrote this on the go -
>
> Don't bother, it won't work:
>
> pciehp_handle_presence_or_link_change() is called from pciehp_ist(),
> the IRQ thread. During safe removal the IRQ thread is busy in
> pciehp_unconfigure_device() and waiting for the driver to unbind
> from devices being safe-removed.

I'm confused: I thought safe removal happens in the userspace thread
that wrote to sysfs?

> An IRQ thread is always single-threaded. There's no second instance
> of the IRQ thread being run when another interrupt is signaled.
> Rather, the IRQ thread is re-run when it has finished.
>
> In *theory* what would be possible is to plumb this into pciehp_isr().
> That's the hardirq handler. This one will indeed be run when an
> interrupt comes in while the IRQ thread is running. Normally the
> hardirq handler would just collect the events for later consumption
> by the IRQ thread. The hardirq handler could *theoretically* mark
> devices gone while they're being safe-removed.
>
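
To make the hardirq/IRQ-thread split described above concrete, here is a
minimal user-space sketch. It is not pciehp code: sim_hardirq(),
sim_irq_thread_pass(), pending_events and the EV_* values are hypothetical
names made up for illustration, and C11 atomics stand in for whatever the
kernel actually uses. It only shows the general pattern of a handler that
records events without sleeping and a single consumer that picks them up on
its next pass.

```c
#include <stdatomic.h>

/* Hypothetical event bits, for illustration only. */
#define EV_PRESENCE_CHANGE 0x1u
#define EV_LINK_CHANGE     0x2u

/* Events recorded by the simulated hardirq handler, not yet consumed. */
static atomic_uint pending_events;

/*
 * Models the hardirq handler: it can run while the IRQ thread is busy,
 * must not sleep, and only records what happened for later consumption.
 */
static void sim_hardirq(unsigned int events)
{
	atomic_fetch_or(&pending_events, events);
}

/*
 * Models one pass of the single IRQ thread: it atomically takes every
 * event recorded since the previous pass and returns them to the caller.
 * There is never a second concurrent pass; the thread is simply re-run.
 */
static unsigned int sim_irq_thread_pass(void)
{
	return atomic_exchange(&pending_events, 0);
}
```

If sim_hardirq() is called twice while a pass is still in progress, the
next call to sim_irq_thread_pass() returns both bits at once, and the call
after that returns 0. Nothing in this sketch can wait for a reset to
conclude inside sim_hardirq(), which is the limitation pointed out below.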
> I'm saying "theoretically" because in reality I don't think this is
> a viable approach either: pciehp_ist() contains code to *ignore*
> link or presence changes if they were caused by a Secondary Bus Reset
> or Downstream Port Containment. In that case we do *not* want to mark
> devices disconnected because they're only *temporarily* inaccessible.
> This requires waiting for the SBR or DPC to conclude, which can take
> several seconds. We can't wait in the hardirq handler.
>
> So this cannot be solved with the current architecture of pciehp,
> at least not easily or in an elegant way. Sorry!
>
> Thanks,
>
> Lukas