[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<CY8PR12MB7195F2F2900BAEA69F5431E9DC46A@CY8PR12MB7195.namprd12.prod.outlook.com>
Date: Mon, 30 Jun 2025 04:07:55 +0000
From: Parav Pandit <parav@...dia.com>
To: Keith Busch <kbusch@...nel.org>, "Michael S. Tsirkin" <mst@...hat.com>
CC: Lukas Wunner <lukas@...ner.de>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>,
"stefanha@...hat.com" <stefanha@...hat.com>, "alok.a.tiwari@...cle.com"
<alok.a.tiwari@...cle.com>
Subject: RE: [PATCH RFC] pci: report surprise removal events
> From: Keith Busch <kbusch@...nel.org>
> Sent: 30 June 2025 05:10 AM
>
> On Sun, Jun 29, 2025 at 01:28:08PM -0400, Michael S. Tsirkin wrote:
> > On Sun, Jun 29, 2025 at 03:36:27PM +0200, Lukas Wunner wrote:
> > > On Sat, Jun 28, 2025 at 02:58:49PM -0400, Michael S. Tsirkin wrote:
> > >
> > > 1/ The device_lock() will reintroduce the issues solved by 74ff8864cc84.
> >
> > I see. What other way is there to prevent dev->driver from going away,
> > though? I guess I can add a new spinlock and take it both here and
> > when
> > dev->driver changes? Acceptable?
>
> You're already holding the pci_bus_sem here, so the final device 'put'
> can't have been called yet, so the device is valid and thread safe in this
> context. I think maintaining the desired lifetime of the instantiated driver is
> just a matter of reference counting within your driver.
>
> Just a thought on your patch, instead of introducing a new callback, you could
> call the existing '->error_detected()' callback with the previously set
> 'pci_channel_io_perm_failure' status. That would totally work for nvme to
> kick its cleanup much quicker than the blk_mq timeout handling we currently
> rely on for this scenario.
error_detected() callback is also called while holding the device_lock() by report_error_detected().
So when remove() callback is ongoing for graceful removal and driver is waiting for the request completions,
If the error_detected() will be stuck on device lock.
We likely need async notification from driver kernel core without holding the device lock, so driver can initiate cleanup in this corner case.
Powered by blists - more mailing lists