linux-kernel - Re: [PATCH RFC] pci: report surprise removal events

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aGKUqsudjfk8wCHI@kbusch-mbp>
Date: Mon, 30 Jun 2025 07:44:10 -0600
From: Keith Busch <kbusch@...nel.org>
To: Parav Pandit <parav@...dia.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>, Lukas Wunner <lukas@...ner.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>,
	"stefanha@...hat.com" <stefanha@...hat.com>,
	"alok.a.tiwari@...cle.com" <alok.a.tiwari@...cle.com>
Subject: Re: [PATCH RFC] pci: report surprise removal events

On Mon, Jun 30, 2025 at 04:07:55AM +0000, Parav Pandit wrote:
> 
> > From: Keith Busch <kbusch@...nel.org>
> > Sent: 30 June 2025 05:10 AM
> > 
> > On Sun, Jun 29, 2025 at 01:28:08PM -0400, Michael S. Tsirkin wrote:
> > > On Sun, Jun 29, 2025 at 03:36:27PM +0200, Lukas Wunner wrote:
> > > > On Sat, Jun 28, 2025 at 02:58:49PM -0400, Michael S. Tsirkin wrote:
> > > >
> > > > 1/ The device_lock() will reintroduce the issues solved by 74ff8864cc84.
> > >
> > > I see. What other way is there to prevent dev->driver from going away,
> > > though? I guess I can add a new spinlock and take it both here and
> > > when
> > > dev->driver changes? Acceptable?
> > 
> > You're already holding the pci_bus_sem here, so the final device 'put'
> > can't have been called yet, so the device is valid and thread safe in this
> > context. I think maintaining the desired lifetime of the instantiated driver is
> > just a matter of reference counting within your driver.
> > 
> > Just a thought on your patch, instead of introducing a new callback, you could
> > call the existing '->error_detected()' callback with the previously set
> > 'pci_channel_io_perm_failure' status. That would totally work for nvme to
> > kick its cleanup much quicker than the blk_mq timeout handling we currently
> > rely on for this scenario.
> 
> error_detected() callback is also called while holding the device_lock() by report_error_detected().
> So when remove() callback is ongoing for graceful removal and driver is waiting for the request completions,
> 
> If the error_detected() will be stuck on device lock.

But I didn't suggest calling error_detected from report_error_detected.
Just call it directly without device_lock. It's not very feasible to
enforce a non-blocking callback, though, if speed is really a concern
here.