linux-kernel - Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220927174438.GA2883698@p14s>
Date:   Tue, 27 Sep 2022 11:44:38 -0600
From:   Mathieu Poirier <mathieu.poirier@...aro.org>
To:     Arnaud POULIQUEN <arnaud.pouliquen@...s.st.com>
Cc:     Peng Fan <peng.fan@....com>,
        "Peng Fan (OSS)" <peng.fan@....nxp.com>,
        Bjorn Andersson <andersson@...nel.org>,
        "bjorn.andersson@...aro.org" <bjorn.andersson@...aro.org>,
        "linux-remoteproc@...r.kernel.org" <linux-remoteproc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc
 crash

On Tue, Sep 27, 2022 at 10:10:31AM +0200, Arnaud POULIQUEN wrote:
> Hi,
> 
> On 9/27/22 05:03, Peng Fan wrote:
> > Hi Mathieu,
> > 
> >> Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc
> >> crash
> >>
> >> On Tue, Jul 05, 2022 at 09:15:27AM +0800, Peng Fan (OSS) wrote:
> >>> From: Peng Fan <peng.fan@....com>
> >>>
> >>> Current logic only support main processor to stop/start the remote
> >>> processor after crash. However to SoC, such as i.MX8QM/QXP, the remote
> >>> processor could do attach recovery after crash and trigger watchdog to
> >>> reboot itself. It does not need main processor to load image, or
> >>> stop/start remote processor.
> >>>
> >>> Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
> >>> for the two cases. Boot recovery is as before, let main processor to
> >>> help recovery, while attach recovery is to recover itself without help.
> >>> To attach recovery, we only do detach and attach.
> >>>
> >>> Acked-by: Arnaud Pouliquen <arnaud.pouliquen@...s.st.com>
> >>> Signed-off-by: Peng Fan <peng.fan@....com>
> >>> ---
> >>>  drivers/remoteproc/remoteproc_core.c | 62
> >>> +++++++++++++++++++---------
> >>>  1 file changed, 43 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/drivers/remoteproc/remoteproc_core.c
> >>> b/drivers/remoteproc/remoteproc_core.c
> >>> index ed374c8bf14a..ef5b9310bc83 100644
> >>> --- a/drivers/remoteproc/remoteproc_core.c
> >>> +++ b/drivers/remoteproc/remoteproc_core.c
> >>> @@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
> >>>  	return 0;
> >>>  }
> >>>
> >>> +static int rproc_attach_recovery(struct rproc *rproc) {
> >>> +	int ret;
> >>> +
> >>> +	ret = __rproc_detach(rproc);
> >>> +	if (ret)
> >>> +		return ret;
> >>
> >> I thought there was a specific reason to _not_ call rproc->ops->coredump()
> >> for processors that have been attached to but looking at the STM32 and
> >> IMX_DSP now, it would seem logical to do so.  Am I missing something?
> > 
> > ATTACH RECOVERY is to support recovery without help from Linux,
> > 
> > STM32 and IMX_DSP, both require linux to load image and start remote
> > core. So the two cases should not enable feature:
> > RPROC_FEAT_ATTACH_ON_RECOVERY
> > 
> > Also considering the recovery is out of linux control, actually when linux
> > start dump, remote core may already recovered. 
> 
> I asked myself the same question. Indeed how to manage a core dump if the
> remote processor restarts autonomously.
> The answer doesn't seem obvious because it seems to be platform specific.
> 
> For time being on STM32 we consider that the remoteproc memory can be corrupted
> so we don't plan to enable the feature by default even if the hardware allows it.
> 
> If we implement it, I would see 2 use cases:
> - no core dump, the remote processor restart autonomously (need to manage the
> VIRTIO_CONFIG_S_NEEDS_RESET in resource table vdev for resynchronization)
> - core dump and the Linux stm32 driver handle the reset of the remote
> processor core to be able to perform the core dump (no firmware loading)
> 
> What about dealing with the coredump in a separate thread, based on a concrete
> use case/need?

Definitely, we can deal with that later.

Peng - please send me a rebase as quickly as possible.

> 
> Regards,
> Arnaud
>  
> > 
> >>
> >> And this set will need a rebase.
> > 
> > I'll do the rebase and send V8 if the upper explanation could eliminate
> > your concern.
> > 
> > Thanks,
> > Peng.
> > 
> >>
> >> Thanks,
> >> Mathieu
> >>
> >>> +
> >>> +	return __rproc_attach(rproc);
> >>> +}
> >>> +
> >>> +static int rproc_boot_recovery(struct rproc *rproc) {
> >>> +	const struct firmware *firmware_p;
> >>> +	struct device *dev = &rproc->dev;
> >>> +	int ret;
> >>> +
> >>> +	ret = rproc_stop(rproc, true);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	/* generate coredump */
> >>> +	rproc->ops->coredump(rproc);
> >>> +
> >>> +	/* load firmware */
> >>> +	ret = request_firmware(&firmware_p, rproc->firmware, dev);
> >>> +	if (ret < 0) {
> >>> +		dev_err(dev, "request_firmware failed: %d\n", ret);
> >>> +		return ret;
> >>> +	}
> >>> +
> >>> +	/* boot the remote processor up again */
> >>> +	ret = rproc_start(rproc, firmware_p);
> >>> +
> >>> +	release_firmware(firmware_p);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>  /**
> >>>   * rproc_trigger_recovery() - recover a remoteproc
> >>>   * @rproc: the remote processor
> >>> @@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
> >>>   */
> >>>  int rproc_trigger_recovery(struct rproc *rproc)  {
> >>> -	const struct firmware *firmware_p;
> >>>  	struct device *dev = &rproc->dev;
> >>>  	int ret;
> >>>
> >>> @@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc
> >>> *rproc)
> >>>
> >>>  	dev_err(dev, "recovering %s\n", rproc->name);
> >>>
> >>> -	ret = rproc_stop(rproc, true);
> >>> -	if (ret)
> >>> -		goto unlock_mutex;
> >>> -
> >>> -	/* generate coredump */
> >>> -	rproc->ops->coredump(rproc);
> >>> -
> >>> -	/* load firmware */
> >>> -	ret = request_firmware(&firmware_p, rproc->firmware, dev);
> >>> -	if (ret < 0) {
> >>> -		dev_err(dev, "request_firmware failed: %d\n", ret);
> >>> -		goto unlock_mutex;
> >>> -	}
> >>> -
> >>> -	/* boot the remote processor up again */
> >>> -	ret = rproc_start(rproc, firmware_p);
> >>> -
> >>> -	release_firmware(firmware_p);
> >>> +	if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
> >>> +		ret = rproc_attach_recovery(rproc);
> >>> +	else
> >>> +		ret = rproc_boot_recovery(rproc);
> >>>
> >>>  unlock_mutex:
> >>>  	mutex_unlock(&rproc->lock);
> >>> --
> >>> 2.25.1
> >>>