Message-ID: <Zw2CAbMozI8vu4SL@hu-mojha-hyd.qualcomm.com>
Date: Tue, 15 Oct 2024 02:11:37 +0530
From: Mukesh Ojha <quic_mojha@...cinc.com>
To: Bjorn Andersson <andersson@...nel.org>,
Mathieu Poirier <mathieu.poirier@...aro.org>
CC: <linux-remoteproc@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] remoteproc: Add a new remoteproc state RPROC_DEFUNCT
On Tue, Oct 15, 2024 at 02:01:18AM +0530, Mukesh Ojha wrote:
> Multiple calls to glink_subdev_stop() for the same remoteproc can happen
> if rproc_stop() fails in Process-A, leaving the rproc in the
> RPROC_CRASHED state. A later write to the recovery sysfs file from user
> space in Process-B calls recovery_store(), which triggers
> rproc_trigger_recovery() on the same remoteproc and results in a NULL
> pointer dereference in qcom_glink_smem_unregister().
>
> There is another side to this issue: fixing it by adding a NULL check on
> glink->edge does not guarantee that the remoteproc will recover on the
> second call from Process-B. The first attempt, from Process-A, failed in
> the SMC shutdown call, and the second attempt may fail at the same call,
> in which case the rproc still cannot recover.
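A small user-space model makes the point of the paragraph above concrete. It is a sketch, not kernel code: `struct v1_rproc`, `v1_glink_stop()`, `v1_stop()` and `v1_recover()` are hypothetical stand-ins for the rproc, glink_subdev_stop(), rproc_stop() and rproc_trigger_recovery(), and the failing SMC shutdown call is simulated with a flag. With a NULL check on the edge, the second recovery attempt no longer crashes, but it still fails and leaves the rproc in the crashed state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the kernel's struct rproc. */
enum v1_state { V1_RUNNING, V1_CRASHED };

struct v1_edge { int id; };

struct v1_rproc {
	enum v1_state state;
	struct v1_edge *edge;	/* stands in for glink->edge */
	bool stop_fails;	/* simulates the failing SMC shutdown call */
	int stop_calls;		/* how many times ops->stop() was attempted */
};

/* Stand-in for glink_subdev_stop() with the v1-style NULL check. */
void v1_glink_stop(struct v1_rproc *r)
{
	if (!r->edge)		/* already unregistered: skip instead of crashing */
		return;
	r->edge = NULL;		/* unregistering tears down the edge */
}

/* Stand-in for rproc_stop(): stop subdevices, then call ops->stop(). */
int v1_stop(struct v1_rproc *r)
{
	v1_glink_stop(r);
	r->stop_calls++;
	return r->stop_fails ? -22 : 0;	/* -22 == -EINVAL, as in the log */
}

/* Stand-in for rproc_trigger_recovery() without a non-recoverable state. */
int v1_recover(struct v1_rproc *r)
{
	int ret;

	if (r->state != V1_CRASHED)
		return 0;

	ret = v1_stop(r);
	if (ret)
		return ret;	/* state stays CRASHED, so a retry is allowed */

	r->state = V1_RUNNING;	/* pretend the restart succeeded */
	return 0;
}
```

If Process-A and Process-B each call `v1_recover()` once, both calls return -22 and `stop_calls` ends at 2: the NULL check turns the oops into a repeated failure, not a recovery.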
>
> Add a new rproc state, RPROC_DEFUNCT, i.e. a non-recoverable state of the
> remoteproc from which the only way to recover is a system restart.
>
> Process-A                       Process-B
>
> fatal error interrupt happens
>
> rproc_crash_handler_work()
>   mutex_lock_interruptible(&rproc->lock);
>   ...
>
>   rproc->state = RPROC_CRASHED;
>   ...
>   mutex_unlock(&rproc->lock);
>
> rproc_trigger_recovery()
>  mutex_lock_interruptible(&rproc->lock);
>
>   adsp_stop()
>   qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
>   remoteproc remoteproc3: can't stop rproc: -22
>  mutex_unlock(&rproc->lock);
>
>                                 echo enabled > /sys/class/remoteproc/remoteprocX/recovery
>                                 recovery_store()
>                                  rproc_trigger_recovery()
>                                   mutex_lock_interruptible(&rproc->lock);
>                                    rproc_stop()
>                                     glink_subdev_stop()
>                                       qcom_glink_smem_unregister() ==|
>                                                                      |
>                                                                      V
>                                       Unable to handle kernel NULL pointer dereference
>                                         at virtual address 0000000000000358
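A minimal user-space sketch of the patched flow, under the assumption that the glink teardown and the failing shutdown call can be reduced to a pointer and a flag. The names (`struct v2_rproc`, `v2_glink_stop()`, `v2_stop()`, `v2_recover()`) are hypothetical stand-ins, not driver code. The edge has no NULL check; an `assert()` marks where the real code would dereference NULL. Process-A's attempt fails and marks the rproc DEFUNCT; Process-B's attempt then returns early, before glink teardown is reached a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; only the states relevant to this race. */
enum v2_state { V2_RUNNING, V2_CRASHED, V2_DEFUNCT };

struct v2_edge { int id; };

struct v2_rproc {
	enum v2_state state;
	struct v2_edge *edge;	/* stands in for glink->edge */
	bool stop_fails;	/* simulates the failing shutdown call */
	int glink_stop_calls;	/* how often glink teardown ran */
};

/* Stand-in for glink_subdev_stop(): no NULL check on the edge. */
void v2_glink_stop(struct v2_rproc *r)
{
	assert(r->edge != NULL);	/* real code would oops here */
	r->edge = NULL;
	r->glink_stop_calls++;
}

/* Stand-in for rproc_stop() with the patch: failure marks DEFUNCT. */
int v2_stop(struct v2_rproc *r)
{
	v2_glink_stop(r);
	if (r->stop_fails) {
		r->state = V2_DEFUNCT;
		return -22;		/* -EINVAL, as in the log */
	}
	return 0;
}

/* Stand-in for rproc_trigger_recovery(): only CRASHED is recovered. */
int v2_recover(struct v2_rproc *r)
{
	int ret;

	/* DEFUNCT != CRASHED, so the second test alone already rejects it. */
	if (r->state == V2_DEFUNCT || r->state != V2_CRASHED)
		return 0;

	ret = v2_stop(r);
	if (ret)
		return ret;

	r->state = V2_RUNNING;	/* pretend the restart succeeded */
	return 0;
}
```

Because DEFUNCT differs from CRASHED, the existing `state != CRASHED` test in the recovery path already skips a defunct rproc. The explicit DEFUNCT comparison documents intent; the assignment in rproc_stop() is what changes behaviour.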
>
> Signed-off-by: Mukesh Ojha <quic_mojha@...cinc.com>
> ---
> Changes in v2:
> - Removed the NULL pointer check; instead, added a new state to signify
> a non-recoverable remoteproc.
>
> drivers/remoteproc/remoteproc_core.c | 3 ++-
> drivers/remoteproc/remoteproc_sysfs.c | 1 +
> include/linux/remoteproc.h | 5 ++++-
> 3 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index f276956f2c5c..494c8fcc63ca 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
> /* power off the remote processor */
> ret = rproc->ops->stop(rproc);
> if (ret) {
> + rproc->state = RPROC_DEFUNCT;
I have put it here, but I am more inclined to add this assignment in the
qcom remoteproc (PAS) driver instead.
-Mukesh
> }
> @@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc)
> return ret;
>
> /* State could have changed before we got the mutex */
> - if (rproc->state != RPROC_CRASHED)
> + if (rproc->state == RPROC_DEFUNCT || rproc->state != RPROC_CRASHED)
> goto unlock_mutex;
>
> dev_err(dev, "recovering %s\n", rproc->name);
> diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> index 138e752c5e4e..5f722b4576b2 100644
> --- a/drivers/remoteproc/remoteproc_sysfs.c
> +++ b/drivers/remoteproc/remoteproc_sysfs.c
> @@ -171,6 +171,7 @@ static const char * const rproc_state_string[] = {
> [RPROC_DELETED] = "deleted",
> [RPROC_ATTACHED] = "attached",
> [RPROC_DETACHED] = "detached",
> + [RPROC_DEFUNCT] = "defunct",
> [RPROC_LAST] = "invalid",
> };
>
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index b4795698d8c2..3e4ba06c6a9a 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -417,6 +417,8 @@ struct rproc_ops {
> * has attached to it
> * @RPROC_DETACHED: device has been booted by another entity and waiting
> * for the core to attach to it
> + * @RPROC_DEFUNCT: device is neither in the crashed state nor responding
> + * to any requests, and can only recover on system restart.
> * @RPROC_LAST: just keep this one at the end
> *
> * Please note that the values of these states are used as indices
> @@ -433,7 +435,8 @@ enum rproc_state {
> RPROC_DELETED = 4,
> RPROC_ATTACHED = 5,
> RPROC_DETACHED = 6,
> - RPROC_LAST = 7,
> + RPROC_DEFUNCT = 7,
> + RPROC_LAST = 8,
> };
>
> /**
> --
> 2.34.1
>
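The header notes that these state values are used as indices, and the sysfs hunk shows them indexing rproc_state_string[] through designated initializers, which is why the new state needs an entry in both files. The sketch below is a standalone user-space copy of that table, not the kernel file: the enum values 4 to 8 come from the hunk above, 0 to 3 are taken from the mainline header, and the clamping of out-of-range values to "invalid" is assumed to mirror the sysfs state attribute. `rproc_state_name()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standalone copy of the state values after the patch (illustrative). */
enum rproc_state {
	RPROC_OFFLINE	= 0,
	RPROC_SUSPENDED	= 1,
	RPROC_RUNNING	= 2,
	RPROC_CRASHED	= 3,
	RPROC_DELETED	= 4,
	RPROC_ATTACHED	= 5,
	RPROC_DETACHED	= 6,
	RPROC_DEFUNCT	= 7,
	RPROC_LAST	= 8,
};

/* Designated initializers keep each string at its enum index. */
static const char * const rproc_state_string[] = {
	[RPROC_OFFLINE]		= "offline",
	[RPROC_SUSPENDED]	= "suspended",
	[RPROC_RUNNING]		= "running",
	[RPROC_CRASHED]		= "crashed",
	[RPROC_DELETED]		= "deleted",
	[RPROC_ATTACHED]	= "attached",
	[RPROC_DETACHED]	= "detached",
	[RPROC_DEFUNCT]		= "defunct",
	[RPROC_LAST]		= "invalid",
};

/*
 * The array length is the highest designated index + 1, so this checks
 * that RPROC_LAST is the last slot. A skipped entry in the middle would
 * still compile and read back as NULL.
 */
_Static_assert(sizeof(rproc_state_string) / sizeof(rproc_state_string[0]) ==
	       (size_t)RPROC_LAST + 1, "rproc_state_string[] out of sync");

/* Hypothetical helper: map a state to its name, clamping unknown values. */
const char *rproc_state_name(unsigned int state)
{
	if (state > RPROC_LAST)
		state = RPROC_LAST;
	return rproc_state_string[state];
}
```

With the patch applied, a defunct remoteproc would read back as "defunct", and any value past RPROC_LAST as "invalid".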