[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251128103240.1723386-1-mukesh.ojha@oss.qualcomm.com>
Date: Fri, 28 Nov 2025 16:02:40 +0530
From: Mukesh Ojha <mukesh.ojha@....qualcomm.com>
To: Bjorn Andersson <andersson@...nel.org>,
Mathieu Poirier <mathieu.poirier@...aro.org>
Cc: linux-arm-msm@...r.kernel.org, linux-remoteproc@...r.kernel.org,
linux-kernel@...r.kernel.org,
Mukesh Ojha <mukesh.ojha@....qualcomm.com>
Subject: [PATCH v4] remoteproc: qcom: Fix NULL pointer issue
There is a scenario, when fatal interrupt triggers rproc crash handling
while a user-space recovery is initiated in parallel. The overlapping
recovery/stop sequences race on rproc state and subdevice teardown,
resulting in a NULL pointer dereference in the GLINK SMEM unregister
path.
Process-A Process-B
fatal error interrupt happens
rproc_crash_handler_work()
mutex_lock_interruptible(&rproc->lock);
...
rproc->state = RPROC_CRASHED;
...
mutex_unlock(&rproc->lock);
rproc_trigger_recovery()
mutex_lock_interruptible(&rproc->lock);
qcom_pas_stop()
qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
remoteproc remoteproc3: can't stop rproc: -22
mutex_unlock(&rproc->lock);
echo enabled > /sys/class/remoteproc/remoteprocX/recovery
recovery_store()
rproc_trigger_recovery()
mutex_lock_interruptible(&rproc->lock);
rproc_stop()
glink_subdev_stop()
qcom_glink_smem_unregister() ==|
|
V
Unable to handle kernel NULL pointer dereference
at virtual address 0000000000000358
It is tempting to introduce a remoteproc state that could be set from
the ->ops->stop() callback, which would have avoided the second attempt
and prevented the crash. However, making remoteproc recovery dependent
on manual intervention or a system reboot is not ideal. We should always
try to recover the remote processor if possible. A failure in the
->ops->stop() callback might be temporary or caused by a timeout, and a
recovery attempt could still succeed, as seen in similar scenarios.
Therefore, instead of adding a restrictive state, let’s add a NULL check
at the appropriate places to avoid a kernel crash and allow the system
to move forward gracefully.
Signed-off-by: Mukesh Ojha <mukesh.ojha@....qualcomm.com>
---
Changes in v4: https://lore.kernel.org/all/20241016045546.2613436-1-quic_mojha@quicinc.com/
- Brought the same change from v2.
- Added smd->edge NULL check.
- Rephrased the commit text.
Changes in v3:
- Fix kernel test reported error.
Changes in v2: https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mojha@quicinc.com/
- Removed NULL pointer check instead added a new state to signify
non-recoverable state of remoteproc.
drivers/remoteproc/qcom_common.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 8c8688f99f0a..6480293d2f61 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -209,6 +209,9 @@ static void glink_subdev_stop(struct rproc_subdev *subdev, bool crashed)
{
struct qcom_rproc_glink *glink = to_glink_subdev(subdev);
+ if (!glink->edge)
+ return;
+
qcom_glink_smem_unregister(glink->edge);
glink->edge = NULL;
}
@@ -320,6 +323,9 @@ static void smd_subdev_stop(struct rproc_subdev *subdev, bool crashed)
{
struct qcom_rproc_subdev *smd = to_smd_subdev(subdev);
+ if (!smd->edge)
+ return;
+
qcom_smd_unregister_edge(smd->edge);
smd->edge = NULL;
}
--
2.50.1
Powered by blists - more mailing lists