linux-kernel - Re: [PATCH v4] remoteproc: qcom: Fix NULL pointer issue

Message-ID: <8dd9291e-d960-4b3f-b7ea-d8181170f023@linaro.org>
Date: Fri, 28 Nov 2025 11:45:01 +0000
From: Bryan O'Donoghue <bryan.odonoghue@...aro.org>
To: Mukesh Ojha <mukesh.ojha@....qualcomm.com>,
 Bjorn Andersson <andersson@...nel.org>,
 Mathieu Poirier <mathieu.poirier@...aro.org>
Cc: linux-arm-msm@...r.kernel.org, linux-remoteproc@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] remoteproc: qcom: Fix NULL pointer issue

On 28/11/2025 10:32, Mukesh Ojha wrote:
> There is a scenario in which a fatal interrupt triggers rproc crash
> handling while a user-space recovery is initiated in parallel. The overlapping
> recovery/stop sequences race on rproc state and subdevice teardown,
> resulting in a NULL pointer dereference in the GLINK SMEM unregister
> path.
> 
> 	Process-A                			Process-B
> 
>    fatal error interrupt happens
> 
>    rproc_crash_handler_work()
>      mutex_lock_interruptible(&rproc->lock);
>      ...
> 
>         rproc->state = RPROC_CRASHED;
>      ...
>      mutex_unlock(&rproc->lock);
> 
>      rproc_trigger_recovery()
>       mutex_lock_interruptible(&rproc->lock);
> 
>        qcom_pas_stop()
>        qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
>        remoteproc remoteproc3: can't stop rproc: -22
>       mutex_unlock(&rproc->lock);
> 
> 						echo enabled > /sys/class/remoteproc/remoteprocX/recovery
> 						recovery_store()
> 						 rproc_trigger_recovery()
> 						  mutex_lock_interruptible(&rproc->lock);
> 						   rproc_stop()
> 						    glink_subdev_stop()
> 						      qcom_glink_smem_unregister() ==|
>                                                                                       |
>                                                                                       V
> 						      Unable to handle kernel NULL pointer dereference
>                                                                  at virtual address 0000000000000358
> 
> It is tempting to introduce a remoteproc state that could be set from
> the ->ops->stop() callback, which would have avoided the second attempt
> and prevented the crash. However, making remoteproc recovery dependent
> on manual intervention or a system reboot is not ideal. We should always
> try to recover the remote processor if possible. A failure in the
> ->ops->stop() callback might be temporary or caused by a timeout, and a
> recovery attempt could still succeed, as seen in similar scenarios.
> Therefore, instead of adding a restrictive state, let’s add a NULL check
> at the appropriate places to avoid a kernel crash and allow the system
> to move forward gracefully.
> 
> Signed-off-by: Mukesh Ojha <mukesh.ojha@....qualcomm.com>
> ---
> Changes in v4: https://lore.kernel.org/all/20241016045546.2613436-1-quic_mojha@quicinc.com/
>   - Brought the same change from v2.
>   - Added smd->edge NULL check.
>   - Rephrased the commit text.
> 
> Changes in v3:
>   - Fix kernel test reported error.
> 
> Changes in v2: https://lore.kernel.org/lkml/20240925103351.1628788-1-quic_mojha@quicinc.com/
>   - Removed the NULL pointer check and instead added a new state to
>     signify a non-recoverable state of the remoteproc.
> 
>   drivers/remoteproc/qcom_common.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
> index 8c8688f99f0a..6480293d2f61 100644
> --- a/drivers/remoteproc/qcom_common.c
> +++ b/drivers/remoteproc/qcom_common.c
> @@ -209,6 +209,9 @@ static void glink_subdev_stop(struct rproc_subdev *subdev, bool crashed)
>   {
>   	struct qcom_rproc_glink *glink = to_glink_subdev(subdev);
> 
> +	if (!glink->edge)
> +		return;
> +
>   	qcom_glink_smem_unregister(glink->edge);
>   	glink->edge = NULL;
>   }
> @@ -320,6 +323,9 @@ static void smd_subdev_stop(struct rproc_subdev *subdev, bool crashed)
>   {
>   	struct qcom_rproc_subdev *smd = to_smd_subdev(subdev);
> 
> +	if (!smd->edge)
> +		return;
> +
>   	qcom_smd_unregister_edge(smd->edge);
>   	smd->edge = NULL;
>   }
> --
> 2.50.1
> 
> 
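For readers following the diff, the guard it adds can be sketched in user space as below. This is only an illustration under stated assumptions, not the kernel code: `struct fake_edge`, `struct fake_subdev`, `fake_unregister()` and `fake_subdev_stop()` are hypothetical stand-ins for the edge handle, the subdevice, the unregister call and the stop callback. The sketch is single-threaded; in the trace above both stop attempts run with rproc->lock held, so it does not model locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the GLINK/SMD edge handle. */
struct fake_edge {
	int registered;
};

/* Hypothetical stand-in for the subdevice that owns the edge. */
struct fake_subdev {
	struct fake_edge *edge;
};

/* Counts how often the unregister routine actually ran. */
static int unregister_calls;

/*
 * Models an unregister routine that dereferences its argument.
 * The assert stands in for the NULL pointer dereference in the trace.
 */
static void fake_unregister(struct fake_edge *edge)
{
	assert(edge != NULL);
	edge->registered = 0;
	unregister_calls++;
}

/*
 * Same shape as the patched stop callbacks: return early if the edge
 * was already torn down, otherwise unregister it and clear the pointer.
 */
void fake_subdev_stop(struct fake_subdev *sub)
{
	if (!sub->edge)
		return;

	fake_unregister(sub->edge);
	sub->edge = NULL;
}
```

Calling `fake_subdev_stop()` twice on the same subdevice unregisters the edge once and turns the second call into a no-op, which is the behaviour the second recovery attempt relies on.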

Since this fixes a real bug, it needs a Fixes: tag.

Once that is added:

Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@...aro.org>