[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250108124913.k2p6h4pja2k5vazf@thinkpad>
Date: Wed, 8 Jan 2025 18:19:13 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Johan Hovold <johan@...nel.org>
Cc: mhi@...ts.linux.dev, linux-arm-msm@...r.kernel.org,
linux-kernel@...r.kernel.org,
Loic Poulain <loic.poulain@...aro.org>
Subject: Re: mhi resume failure on reboot with 6.13-rc2
On Thu, Dec 19, 2024 at 09:36:32AM +0100, Johan Hovold wrote:
> On Thu, Dec 19, 2024 at 12:05:55AM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Dec 18, 2024 at 03:26:38PM +0100, Johan Hovold wrote:
> > > On Wed, Dec 18, 2024 at 07:39:10PM +0530, Manivannan Sadhasivam wrote:
> > > > On Wed, Dec 18, 2024 at 02:55:02PM +0100, Johan Hovold wrote:
>
> > > > > But that's not going to happen as that reset is what is currently
> > > > > causing the deadlock and which would simply be skipped if you switch to
> > > > > pci_try_reset_function().
> > > > >
> > > >
> > > > mhi_pci_runtime_resume() will queue the recovery_work() and return. So I was
> > > > hoping that by the time pci_try_reset_function() is called, the lock would be
> > > > available.
> > >
> > > We can't rely on luck with timings, and this is the very reason for the
> > > deadlock I'm currently seeing (i.e. the recovery thread is still running
> > > when another thread grabs the lock and waits for the recovery thread to
> > > finish).
> > >
> > > Perhaps the recovery work should be done synchronously in the resume
> > > handler to avoid such issues.
> >
> > Synchronously? How can that help when the recovery_work() cannot acquire the
> > lock?
>
> During system suspend, pm core waits for any on-going runtime resume
> operations to complete before taking the device lock and suspending the
> device.
>
Right, but mhi_pci_runtime_resume() is also called from mhi_pci_resume(). So we
cannot safely carry out the recovery_work() synchronously without the
pci_try_reset_function() change.
> Unfortunately, that's currently not the case during shutdown() where
> those operations are reversed, so that would indeed need to be addressed
> first.
>
> But what the driver is currently doing looks highly questionable as it
> returns success when it failed to resume the device (after scheduling
> the asynchronous recovery work).
>
I completely agree and this goes against what PM core expects. IMO we need
two fixes, one uses pci_try_reset_function() and another recovers the device
synchronously from mhi_pci_runtime_resume() and passes the return value to PM
core.
Will post the patches.
- Mani
--
மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists