lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <CY8PR12MB7195F2F2900BAEA69F5431E9DC46A@CY8PR12MB7195.namprd12.prod.outlook.com>
Date: Mon, 30 Jun 2025 04:07:55 +0000
From: Parav Pandit <parav@...dia.com>
To: Keith Busch <kbusch@...nel.org>, "Michael S. Tsirkin" <mst@...hat.com>
CC: Lukas Wunner <lukas@...ner.de>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>,
	"stefanha@...hat.com" <stefanha@...hat.com>, "alok.a.tiwari@...cle.com"
	<alok.a.tiwari@...cle.com>
Subject: RE: [PATCH RFC] pci: report surprise removal events


> From: Keith Busch <kbusch@...nel.org>
> Sent: 30 June 2025 05:10 AM
> 
> On Sun, Jun 29, 2025 at 01:28:08PM -0400, Michael S. Tsirkin wrote:
> > On Sun, Jun 29, 2025 at 03:36:27PM +0200, Lukas Wunner wrote:
> > > On Sat, Jun 28, 2025 at 02:58:49PM -0400, Michael S. Tsirkin wrote:
> > >
> > > 1/ The device_lock() will reintroduce the issues solved by 74ff8864cc84.
> >
> > I see. What other way is there to prevent dev->driver from going away,
> > though? I guess I can add a new spinlock and take it both here and
> > when
> > dev->driver changes? Acceptable?
> 
> You're already holding the pci_bus_sem here, so the final device 'put'
> can't have been called yet, so the device is valid and thread safe in this
> context. I think maintaining the desired lifetime of the instantiated driver is
> just a matter of reference counting within your driver.
> 
> Just a thought on your patch, instead of introducing a new callback, you could
> call the existing '->error_detected()' callback with the previously set
> 'pci_channel_io_perm_failure' status. That would totally work for nvme to
> kick its cleanup much quicker than the blk_mq timeout handling we currently
> rely on for this scenario.

error_detected() callback is also called while holding the device_lock() by report_error_detected().
So when remove() callback is ongoing for graceful removal and driver is waiting for the request completions,

If the error_detected() will be stuck on device lock.

We likely need async notification from driver kernel core without holding the device lock, so driver can initiate cleanup in this corner case.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ