linux-kernel - [PATCH 2/2] bus: mhi: host: pci_generic: Recover the device synchronously from mhi_pci_runtime_resume()

Message-Id: <20250108-mhi_recovery_fix-v1-2-a0a00a17da46@linaro.org>
Date: Wed, 08 Jan 2025 19:09:28 +0530
From: Manivannan Sadhasivam via B4 Relay <devnull+manivannan.sadhasivam.linaro.org@...nel.org>
To: mhi@...ts.linux.dev, Loic Poulain <loic.poulain@...aro.org>
Cc: Johan Hovold <johan@...nel.org>, linux-arm-msm@...r.kernel.org, 
 linux-kernel@...r.kernel.org, 
 Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, 
 stable@...r.kernel.org
Subject: [PATCH 2/2] bus: mhi: host: pci_generic: Recover the device
 synchronously from mhi_pci_runtime_resume()

From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>

Currently, in mhi_pci_runtime_resume(), if the resume fails, recovery_work
is started asynchronously and success is returned. But this doesn't align
with what the PM core expects, as documented in
Documentation/power/runtime_pm.rst:

"Once the subsystem-level resume callback (or the driver resume callback,
if invoked directly) has completed successfully, the PM core regards the
device as fully operational, which means that the device _must_ be able to
complete I/O operations as needed.  The runtime PM status of the device is
then 'active'."

So the PM core ends up marking the runtime PM status of the device as
'active', even though the device is not able to handle I/O operations.
A similar condition applies to system resume as well.

To avoid this ambiguity, try to recover the device synchronously in
mhi_pci_runtime_resume() and return the actual error code if recovery
fails.

To do so, move the recovery code into a __mhi_pci_recovery_work() helper
and call it from both mhi_pci_recovery_work() and
mhi_pci_runtime_resume(). The former still ignores the return value, while
the latter passes it to the PM core.

Cc: stable@...r.kernel.org # 5.13
Reported-by: Johan Hovold <johan@...nel.org>
Closes: https://lore.kernel.org/mhi/Z2PbEPYpqFfrLSJi@hovoldconsulting.com
Fixes: d3800c1dce24 ("bus: mhi: pci_generic: Add support for runtime PM")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
---
 drivers/bus/mhi/host/pci_generic.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/bus/mhi/host/pci_generic.c b/drivers/bus/mhi/host/pci_generic.c
index e92df380c785..f6de407e077e 100644
--- a/drivers/bus/mhi/host/pci_generic.c
+++ b/drivers/bus/mhi/host/pci_generic.c
@@ -997,10 +997,8 @@ static void mhi_pci_runtime_put(struct mhi_controller *mhi_cntrl)
 	pm_runtime_put(mhi_cntrl->cntrl_dev);
 }
 
-static void mhi_pci_recovery_work(struct work_struct *work)
+static int __mhi_pci_recovery_work(struct mhi_pci_device *mhi_pdev)
 {
-	struct mhi_pci_device *mhi_pdev = container_of(work, struct mhi_pci_device,
-						       recovery_work);
 	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
 	struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
 	int err;
@@ -1035,13 +1033,25 @@ static void mhi_pci_recovery_work(struct work_struct *work)
 
 	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
 	mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
-	return;
+
+	return 0;
 
 err_unprepare:
 	mhi_unprepare_after_power_down(mhi_cntrl);
 err_try_reset:
-	if (pci_try_reset_function(pdev))
+	err = pci_try_reset_function(pdev);
+	if (err)
 		dev_err(&pdev->dev, "Recovery failed\n");
+
+	return err;
+}
+
+static void mhi_pci_recovery_work(struct work_struct *work)
+{
+	struct mhi_pci_device *mhi_pdev = container_of(work, struct mhi_pci_device,
+						       recovery_work);
+
+	__mhi_pci_recovery_work(mhi_pdev);
 }
 
 static void health_check(struct timer_list *t)
@@ -1400,15 +1410,10 @@ static int __maybe_unused mhi_pci_runtime_resume(struct device *dev)
 	return 0;
 
 err_recovery:
-	/* Do not fail to not mess up our PCI device state, the device likely
-	 * lost power (d3cold) and we simply need to reset it from the recovery
-	 * procedure, trigger the recovery asynchronously to prevent system
-	 * suspend exit delaying.
-	 */
-	queue_work(system_long_wq, &mhi_pdev->recovery_work);
+	err = __mhi_pci_recovery_work(mhi_pdev);
 	pm_runtime_mark_last_busy(dev);
 
-	return 0;
+	return err;
 }
 
 static int  __maybe_unused mhi_pci_suspend(struct device *dev)

-- 
2.25.1