Message-ID: <3347624899623b430d62d94abcf870dac7354e0a.camel@linux.ibm.com>
Date: Wed, 08 Oct 2025 18:46:45 +0200
From: Gerd Bayer <gbayer@...ux.ibm.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Tariq Toukan <tariqt@...dia.com>, Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>, Shay Drori <shayd@...dia.com>,
Mark Bloch <mbloch@...dia.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Alex Vesker <valex@...lanox.com>,
Feras Daoud <ferasda@...lanox.com>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
Niklas Schnelle <schnelle@...ux.ibm.com>, linux-s390@...r.kernel.org
Subject: Re: [PATCH net v2] net/mlx5: Avoid deadlock between PCI error recovery and health reporter
On Tue, 2025-10-07 at 13:21 -0300, Jason Gunthorpe wrote:
> On Tue, Oct 07, 2025 at 04:48:26PM +0200, Gerd Bayer wrote:
> > - task: kmcheck
> >   mlx5_unload_one() tries to acquire devlink lock while the PCI error
> >   recovery code has set pdev->block_cfg_access by way of
> >   pci_cfg_access_lock()
>
> This seems wrong, arch code shouldn't invoke the driver's error
> handler while holding pci_dev_lock().
Since powerpc's EEH also only acquires the device_lock while executing
the PCI error recovery callbacks, I'll investigate that route by
"demoting" pci_dev_lock() to device_lock() (i.e. no longer blocking PCI
config space accesses).
Initial tests look promising, but I need to experiment more and also
want to check the AER path along the way.
> Or at least if we do want to do this the locking should be documented
> and some lockdep map should be added to pci_cfg_access_lock() and the
> normal AER path..
This change of contract sounds a lot more intrusive for device drivers,
so I'm not pursuing it.
>
> Jason
Thanks, Gerd