lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 8 May 2020 15:47:56 -0600
From:   Mathieu Poirier <mathieu.poirier@...aro.org>
To:     Bjorn Andersson <bjorn.andersson@...aro.org>
Cc:     ohad@...ery.com, loic.pallardy@...com, arnaud.pouliquen@...com,
        s-anna@...com, linux-remoteproc@...r.kernel.org, corbet@....net,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 09/14] remoteproc: Deal with synchronisation when
 crashing

On Tue, May 05, 2020 at 06:01:56PM -0700, Bjorn Andersson wrote:
> On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> 
> > Refactor function rproc_trigger_recovery() in order to avoid
> > reloading the firmware image when synchronising with a remote
> > processor rather than booting it.  Also part of the process,
> > properly set the synchronisation flag in order to properly
> > recover the system.
> > 
> > Signed-off-by: Mathieu Poirier <mathieu.poirier@...aro.org>
> > ---
> >  drivers/remoteproc/remoteproc_core.c     | 23 ++++++++++++++------
> >  drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
> >  2 files changed, 43 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index ef88d3e84bfb..3a84a38ba37b 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
> >   */
> >  int rproc_trigger_recovery(struct rproc *rproc)
> >  {
> > -	const struct firmware *firmware_p;
> > +	const struct firmware *firmware_p = NULL;
> >  	struct device *dev = &rproc->dev;
> >  	int ret;
> >  
> > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> >  	/* generate coredump */
> >  	rproc_coredump(rproc);
> >  
> > -	/* load firmware */
> > -	ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > -	if (ret < 0) {
> > -		dev_err(dev, "request_firmware failed: %d\n", ret);
> > -		goto unlock_mutex;
> > +	/* load firmware if need be */
> > +	if (!rproc_needs_syncing(rproc)) {
> > +		ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > +		if (ret < 0) {
> > +			dev_err(dev, "request_firmware failed: %d\n", ret);
> > +			goto unlock_mutex;
> > +		}
> >  	}
> >  
> > -	/* boot the remote processor up again */
> > +	/* boot up or synchronise with the remote processor again */
> >  	ret = rproc_start(rproc, firmware_p);
> >  
> >  	release_firmware(firmware_p);
> > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
> >  	dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
> >  		rproc->name);
> >  
> > +	/*
> > +	 * The remote processor has crashed - tell the core what operation
> > +	 * to use from hereon, i.e whether an external entity will reboot
> > +	 * the MCU or it is now the remoteproc core's responsability.
> > +	 */
> > +	rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);
> 
> If I follow the logic correctly, you're essentially using
> rproc->sync_with_rproc to pass an additional parameter down through
> rproc_trigger_recovery() to tell everyone below to "load firmware and
> boot the core or not".

I am using the value of rproc::sync_flags::after_crash to set
rproc->sync_with_rproc.  That way the core can know whether it should boot the
remote processor or synchronise with it.

> 
> And given that the comment alludes to some unknown logic determining the
> continuation I think it would be much preferable to essentially just
> pass rproc->sync_flags.after_crash down through these functions.
>

The only thing we need to do is set the value of rproc->sync_with_rproc
properly, which rproc_set_sync_flag() does.  I have decided to use a wrapper
function to allow us to change how the rproc->sync_with_rproc is handled without
touching anything else in the code.
 
> 
> And per my comment on a previous patch, is there any synchronization
> with the remote controller when this happens?

I can't seem to find that comment - can you indicate which patch that was?  As
it is today the core doesn't provide synchronisation, it is up to the platform
driver to probe the remote processor to make sure it is up.  I suppose
sync_ops::start() would be a perfect candidate for that.   

> 
> Regards,
> Bjorn
> 
> > +
> >  	mutex_unlock(&rproc->lock);
> >  
> >  	if (!rproc->recovery_disabled)
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index 3985c084b184..61500981155c 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -24,6 +24,33 @@ struct rproc_debug_trace {
> >  	struct rproc_mem_entry trace_mem;
> >  };
> >  
> > +/*
> > + * enum rproc_sync_states - remote processsor sync states
> > + *
> > + * @RPROC_SYNC_STATE_CRASHED	state to use after the remote processor
> > + *				has crashed but has not been recovered by
> > + *				the remoteproc core yet.
> > + *
> > + * Keeping these separate from the enum rproc_state in order to avoid
> > + * introducing coupling between the state of the MCU and the synchronisation
> > + * operation to use.
> > + */
> > +enum rproc_sync_states {
> > +	RPROC_SYNC_STATE_CRASHED,
> > +};
> > +
> > +static inline void rproc_set_sync_flag(struct rproc *rproc,
> > +				       enum rproc_sync_states state)
> > +{
> > +	switch (state) {
> > +	case RPROC_SYNC_STATE_CRASHED:
> > +		rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +}
> > +
> >  /* from remoteproc_core.c */
> >  void rproc_release(struct kref *kref);
> >  irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
> > -- 
> > 2.20.1
> > 

Powered by blists - more mailing lists