Message-ID: <aGHOzj3_MQ3x7hAD@kbusch-mbp>
Date: Sun, 29 Jun 2025 17:39:58 -0600
From: Keith Busch <kbusch@...nel.org>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Lukas Wunner <lukas@...ner.de>, linux-kernel@...r.kernel.org,
Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
Parav Pandit <parav@...dia.com>, virtualization@...ts.linux.dev,
stefanha@...hat.com, alok.a.tiwari@...cle.com
Subject: Re: [PATCH RFC] pci: report surprise removal events
On Sun, Jun 29, 2025 at 01:28:08PM -0400, Michael S. Tsirkin wrote:
> On Sun, Jun 29, 2025 at 03:36:27PM +0200, Lukas Wunner wrote:
> > On Sat, Jun 28, 2025 at 02:58:49PM -0400, Michael S. Tsirkin wrote:
> >
> > 1/ The device_lock() will reintroduce the issues solved by 74ff8864cc84.
>
> I see. What other way is there to prevent dev->driver from going away,
> though? I guess I can add a new spinlock and take it both here and when
> dev->driver changes? Acceptable?
You're already holding the pci_bus_sem here, so the final device 'put'
can't have been called yet; the device is still valid and safe to use
in this context. I think maintaining the desired lifetime of the
instantiated driver is just a matter of reference counting within your
driver.
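
As a rough sketch of what I mean (untested, and the 'my_ctrl'
structure, release function, and work item are made up for
illustration), the driver can pin its private data across the async
teardown it kicks off from the removal notification:

	#include <linux/kref.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct my_ctrl {
		struct kref ref;
		struct pci_dev *pdev;
		struct work_struct removal_work;
	};

	static void my_ctrl_free(struct kref *ref)
	{
		kfree(container_of(ref, struct my_ctrl, ref));
	}

	/* called from the surprise removal notification */
	static void my_ctrl_notify_removal(struct my_ctrl *ctrl)
	{
		/* hold a reference for the duration of the teardown work */
		kref_get(&ctrl->ref);
		if (!schedule_work(&ctrl->removal_work))
			kref_put(&ctrl->ref, my_ctrl_free);
	}

	static void my_ctrl_removal_work(struct work_struct *work)
	{
		struct my_ctrl *ctrl = container_of(work, struct my_ctrl,
						    removal_work);

		/* ... fail outstanding I/O, tear down queues ... */
		kref_put(&ctrl->ref, my_ctrl_free);
	}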
Just a thought on your patch: instead of introducing a new callback,
you could call the existing '->error_detected()' callback with the
previously set 'pci_channel_io_perm_failure' status. That would totally
work for nvme to kick off its cleanup much sooner than the blk_mq
timeout handling we currently rely on for this scenario.
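
Something along these lines, maybe (untested sketch; I'm assuming
'pdev' is the disconnected device and its driver is still bound at
this point):

	struct pci_driver *drv;

	if (!pdev->dev.driver)
		return;

	drv = to_pci_driver(pdev->dev.driver);
	if (drv->err_handler && drv->err_handler->error_detected)
		drv->err_handler->error_detected(pdev,
						 pci_channel_io_perm_failure);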