Message-ID: <8dd9291e-d960-4b3f-b7ea-d8181170f023@linaro.org>
Date: Fri, 28 Nov 2025 11:45:01 +0000
From: Bryan O'Donoghue <bryan.odonoghue@...aro.org>
To: Mukesh Ojha <mukesh.ojha@....qualcomm.com>,
Bjorn Andersson <andersson@...nel.org>,
Mathieu Poirier <mathieu.poirier@...aro.org>
Cc: linux-arm-msm@...r.kernel.org, linux-remoteproc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] remoteproc: qcom: Fix NULL pointer issue
On 28/11/2025 10:32, Mukesh Ojha wrote:
> There is a scenario where a fatal interrupt triggers rproc crash handling
> while a user-space recovery is initiated in parallel. The overlapping
> recovery/stop sequences race on rproc state and subdevice teardown,
> resulting in a NULL pointer dereference in the GLINK SMEM unregister
> path.
>
> Process-A                     Process-B
>
> fatal error interrupt happens
>
> rproc_crash_handler_work()
>   mutex_lock_interruptible(&rproc->lock);
>   ...
>
>   rproc->state = RPROC_CRASHED;
>   ...
>   mutex_unlock(&rproc->lock);
>
> rproc_trigger_recovery()
>   mutex_lock_interruptible(&rproc->lock);
>
>   qcom_pas_stop()
>   qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
>   remoteproc remoteproc3: can't stop rproc: -22
>   mutex_unlock(&rproc->lock);
>
>                               echo enabled > /sys/class/remoteproc/remoteprocX/recovery
>                               recovery_store()
>                                 rproc_trigger_recovery()
>                                   mutex_lock_interruptible(&rproc->lock);
>                                   rproc_stop()
>                                     glink_subdev_stop()
>                                       qcom_glink_smem_unregister() ==|
>                                                                      |
>                                                                      V
>                               Unable to handle kernel NULL pointer dereference
>                               at virtual address 0000000000000358
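To make the second-attempt failure concrete, here is a minimal user-space sketch in C. It assumes, as far as I can tell from the mainline remoteproc core, that rproc_stop() stops the subdevices before calling ->ops->stop(), so the first (failed) recovery has already cleared the edge by the time user space triggers another one. Every name in it (struct model_subdev, model_unregister(), model_rproc_stop(), model_double_recovery()) is hypothetical and not a kernel API; model_unregister() only counts NULL arguments where the real qcom_glink_smem_unregister() would dereference them.

```c
#include <assert.h> /* for assert() in the usage checks */
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the GLINK/SMD edge handle. */
struct model_edge {
	int id;
};

/* Hypothetical stand-in for struct qcom_rproc_glink / qcom_rproc_subdev. */
struct model_subdev {
	struct model_edge *edge; /* set to NULL after unregister */
	int null_unregisters;    /* times unregister was handed NULL */
};

/* Models qcom_glink_smem_unregister(): a NULL edge here is the oops. */
static void model_unregister(struct model_subdev *sd, struct model_edge *edge)
{
	if (edge == NULL)
		sd->null_unregisters++;
}

/* Unguarded stop, mirroring glink_subdev_stop() before the patch. */
static void model_subdev_stop_unguarded(struct model_subdev *sd)
{
	model_unregister(sd, sd->edge);
	sd->edge = NULL;
}

/*
 * Models rproc_stop(): subdevices are stopped first, so the edge is
 * already gone even when ->ops->stop() then fails. ops_stop_ret stands
 * in for the return value of ->ops->stop().
 */
static int model_rproc_stop(struct model_subdev *sd, int ops_stop_ret)
{
	model_subdev_stop_unguarded(sd);
	return ops_stop_ret;
}

/*
 * Two recovery attempts whose ->ops->stop() both fail with -EINVAL
 * (the "failed to shutdown: -22" case). Returns the number of NULL
 * edges passed to unregister.
 */
static int model_double_recovery(void)
{
	static struct model_edge edge = { .id = 1 };
	struct model_subdev sd = { .edge = &edge, .null_unregisters = 0 };

	model_rproc_stop(&sd, -EINVAL); /* crash handler's recovery */
	model_rproc_stop(&sd, -EINVAL); /* recovery_store()'s recovery */
	return sd.null_unregisters;
}
```

For example, assert(model_double_recovery() == 1) holds: the second stop hands a NULL edge to the unregister stand-in, which corresponds to the NULL pointer dereference in the trace above.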
>
> It is tempting to introduce a remoteproc state that could be set from
> the ->ops->stop() callback, which would have avoided the second attempt
> and prevented the crash. However, making remoteproc recovery dependent
> on manual intervention or a system reboot is not ideal. We should always
> try to recover the remote processor if possible. A failure in the
> ->ops->stop() callback might be temporary or caused by a timeout, and a
> later recovery attempt could still succeed, as seen in similar scenarios.
> Therefore, instead of adding a restrictive state, let’s add a NULL check
> on the edge in glink_subdev_stop() and smd_subdev_stop(), so that stopping
> an already torn-down subdevice becomes a no-op. This avoids the kernel
> crash and allows the system to move forward gracefully.
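The argument for a NULL check rather than a new state can be illustrated with a small user-space C sketch, assuming the first ->ops->stop() failure is transient (modelled here as -ETIMEDOUT) and the retry succeeds. The names (struct guard_subdev, guard_subdev_stop(), guard_rproc_stop(), guard_retry_recovers()) are hypothetical; guard_subdev_stop() only mirrors the shape of the guard the patch adds to glink_subdev_stop() and smd_subdev_stop(), and guard_rproc_stop() assumes subdevices are stopped before ->ops->stop().

```c
#include <assert.h> /* for assert() in the usage checks */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GLINK/SMD edge handle. */
struct guard_edge {
	int id;
};

struct guard_subdev {
	struct guard_edge *edge;
	int unregisters;      /* unregisters with a valid edge */
	int null_unregisters; /* would have been an oops in the kernel */
};

/* Models the edge unregister call, counting valid vs NULL edges. */
static void guard_unregister(struct guard_subdev *sd, struct guard_edge *edge)
{
	if (edge == NULL)
		sd->null_unregisters++;
	else
		sd->unregisters++;
}

/* Same shape as the patched stop callbacks: skip an edge already torn down. */
static void guard_subdev_stop(struct guard_subdev *sd)
{
	if (!sd->edge)
		return;

	guard_unregister(sd, sd->edge);
	sd->edge = NULL;
}

/* Subdevices first, then ->ops->stop(); ops_stop_ret models its result. */
static int guard_rproc_stop(struct guard_subdev *sd, int ops_stop_ret)
{
	guard_subdev_stop(sd);
	return ops_stop_ret;
}

/*
 * First ->ops->stop() fails transiently, the retry succeeds. Returns true
 * when the retry stops the processor without any NULL unregister.
 */
static bool guard_retry_recovers(struct guard_subdev *sd)
{
	static struct guard_edge edge = { .id = 1 };

	sd->edge = &edge;
	sd->unregisters = 0;
	sd->null_unregisters = 0;

	if (guard_rproc_stop(sd, -ETIMEDOUT) == 0)
		return false; /* not the scenario being modelled */

	return guard_rproc_stop(sd, 0) == 0 && sd->null_unregisters == 0;
}
```

With the guard in place, guard_retry_recovers() returns true, the edge is unregistered exactly once, and the retry reaches a successful stop instead of crashing; a terminal "non-recoverable" state would instead have refused the retry altogether.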
>
> Signed-off-by: Mukesh Ojha <mukesh.ojha@....qualcomm.com>
> ---
> Changes in v4: https://lore.kernel.org/all/20241016045546.2613436-1-quic_mojha@quicinc.com/
> - Brought the same change from v2.
> - Added smd->edge NULL check.
> - Rephrased the commit text.
>
> Changes in v3:
> - Fixed an error reported by the kernel test robot.
>
> Changes in v2: https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mojha@quicinc.com/
> - Removed the NULL pointer check and instead added a new state to signify
> a non-recoverable remoteproc.
>
> drivers/remoteproc/qcom_common.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
> index 8c8688f99f0a..6480293d2f61 100644
> --- a/drivers/remoteproc/qcom_common.c
> +++ b/drivers/remoteproc/qcom_common.c
> @@ -209,6 +209,9 @@ static void glink_subdev_stop(struct rproc_subdev *subdev, bool crashed)
>  {
>  	struct qcom_rproc_glink *glink = to_glink_subdev(subdev);
>  
> +	if (!glink->edge)
> +		return;
> +
>  	qcom_glink_smem_unregister(glink->edge);
>  	glink->edge = NULL;
>  }
> @@ -320,6 +323,9 @@ static void smd_subdev_stop(struct rproc_subdev *subdev, bool crashed)
>  {
>  	struct qcom_rproc_subdev *smd = to_smd_subdev(subdev);
>  
> +	if (!smd->edge)
> +		return;
> +
>  	qcom_smd_unregister_edge(smd->edge);
>  	smd->edge = NULL;
>  }
> --
> 2.50.1
>
>
Since this fixes a real bug, you need a Fixes: tag. With that added:
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@...aro.org>