[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMZdPi-2O_aNWHnRKkMGUKsrTdfTTuNM70y_7X5xvADiyk-+VQ@mail.gmail.com>
Date: Wed, 8 Jan 2025 15:46:26 +0100
From: Loic Poulain <loic.poulain@...aro.org>
To: manivannan.sadhasivam@...aro.org
Cc: mhi@...ts.linux.dev, Johan Hovold <johan@...nel.org>, linux-arm-msm@...r.kernel.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/2] bus: mhi: host: pci_generic: Use pci_try_reset_function()
to avoid deadlock
On Wed, 8 Jan 2025 at 14:39, Manivannan Sadhasivam via B4 Relay
<devnull+manivannan.sadhasivam.linaro.org@...nel.org> wrote:
>
> From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
>
> There are multiple places from where the recovery work gets scheduled
> asynchronously. Also, there are multiple places where the caller waits
> synchronously for the recovery to be completed. One such place is during
> the PM shutdown() callback.
>
> If the device is not alive during recovery_work, it will try to reset the
> device using pci_reset_function(). This function internally will take the
> device_lock() first before resetting the device. By this time, if the lock
> has already been acquired, then recovery_work will get stalled while
> waiting for the lock. And if the lock was already acquired by the caller
> which waits for the recovery_work to be completed, it will lead to
> deadlock.
>
> This is what happened on the X1E80100 CRD device when the device died
> before shutdown() callback. Driver core calls the driver's shutdown()
> callback while holding the device_lock() leading to deadlock.
>
> And this deadlock scenario can occur on other paths as well, like during
> the PM suspend() callback, where the driver core would hold the
> device_lock() before calling driver's suspend() callback. And if the
> recovery_work was already started, it could lead to deadlock. This is also
> observed on the X1E80100 CRD.
>
> So to fix both issues, use pci_try_reset_function() in recovery_work. This
> function first checks for the availability of the device_lock() before
> trying to reset the device. If the lock is available, it will acquire it
> and reset the device. Otherwise, it will return -EAGAIN. If that happens,
> recovery_work will fail with the error message "Recovery failed" as not
> much could be done.
>
> Cc: stable@...r.kernel.org # 5.12
> Reported-by: Johan Hovold <johan@...nel.org>
> Closes: https://lore.kernel.org/mhi/Z1me8iaK7cwgjL92@hovoldconsulting.com
> Fixes: 7389337f0a78 ("mhi: pci_generic: Add suspend/resume/recovery procedure")
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Reviewed-by: Loic Poulain <loic.poulain@...aro.org>
Powered by blists - more mailing lists