Message-ID: <20221013173442.GA1279972@p14s>
Date:   Thu, 13 Oct 2022 11:34:42 -0600
From:   Mathieu Poirier <mathieu.poirier@...aro.org>
To:     "Aiqun(Maria) Yu" <quic_aiquny@...cinc.com>
Cc:     linux-remoteproc@...r.kernel.org, linux-arm-msm@...r.kernel.org,
        linux-kernel@...r.kernel.org, quic_clew@...cinc.com
Subject: Re: [PATCH v4] remoteproc: core: do pm relax when in RPROC_OFFLINE

On Thu, Oct 13, 2022 at 09:40:09AM +0800, Aiqun(Maria) Yu wrote:
> Hi Mathieu,
> 
> On 10/13/2022 4:43 AM, Mathieu Poirier wrote:
> > Please add what has changed from one version to another, either in a cover
> > letter or after the "Signed-off-by".  There are many examples on how to do that
> > on the mailing list.
> > 
> Thx for the information, I will take note of it for next time.
> 
> > On Fri, Sep 16, 2022 at 03:12:31PM +0800, Maria Yu wrote:
> > > The RPROC_OFFLINE state indicates that no recovery process
> > > is in progress and that pm_relax() will otherwise never be called,
> > > because when recovering from a crash, rproc->lock is held while
> > > the state goes RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
> > > and only then is rproc->lock released.
> > 
> > You are correct: because the lock is held, rproc->state should be set to RPROC_RUNNING
> > when rproc_trigger_recovery() returns.  If that is not the case then something
> > went wrong.
> > 
> > Function rproc_stop() sets rproc->state to RPROC_OFFLINE just before returning,
> > so we know the remote processor was stopped.  Therefore if rproc->state is set
> > to RPROC_OFFLINE something went wrong in either request_firmware() or
> > rproc_start().  Either way the remote processor is offline and the system is probably
> > in an unknown/unstable state.  As such I don't see how calling pm_relax() can help
> > things along.
> > 
> RPROC_OFFLINE is also possible when rproc_shutdown() has been triggered and
> has finished successfully.
> Even if this is a contention issue between multiple crashes handled by
> rproc_crash_handler_work, and the last rproc_trigger_recovery() bailed out
> leaving rproc->state == RPROC_OFFLINE, it is still worth calling pm_relax() so
> that it pairs with pm_stay_awake(). The subsystem may still be recovered by the
> customer's next call to rproc_start(), and this keeps every error-out path
> clean with respect to PM resources.
> 
> > I suggest spending time understanding what leads to the failure when recovering
> > from a crash and address that problem(s).
> > 
> In the current case, the customer reports that the issue happened when
> rproc_shutdown() was triggered at around the same time, so it is not an issue
> caused by an error-out of rproc_trigger_recovery().

That is a very important element to consider and should have been mentioned from
the beginning.  What I see happening is the following:

rproc_report_crash()
        pm_stay_awake()
        queue_work() // current thread is suspended

rproc_shutdown()
        rproc_stop()
                rproc->state = RPROC_OFFLINE;

rproc_crash_handler_work()
        if (rproc->state == RPROC_OFFLINE)
                return // pm_relax() is not called

The right way to fix this is to add a pm_relax() in rproc_shutdown() and
rproc_detach(), along with a very descriptive comment as to why it is needed.


> > Thanks,
> > Mathieu
> > 
> > 
> > > When the state is RPROC_OFFLINE, it means a separate rproc_stop()
> > > request was made and there is no longer any need for the crash
> > > handler to hold the wakeup source for recovery.
> > > 
> > > Signed-off-by: Maria Yu <quic_aiquny@...cinc.com>
> > > ---
> > >   drivers/remoteproc/remoteproc_core.c | 11 +++++++++++
> > >   1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > > index e5279ed9a8d7..6bc7b8b7d01e 100644
> > > --- a/drivers/remoteproc/remoteproc_core.c
> > > +++ b/drivers/remoteproc/remoteproc_core.c
> > > @@ -1956,6 +1956,17 @@ static void rproc_crash_handler_work(struct work_struct *work)
> > >   	if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLINE) {
> > >   		/* handle only the first crash detected */
> > >   		mutex_unlock(&rproc->lock);
> > > +		/*
> > > +		 * RPROC_OFFLINE state indicates that no recovery process
> > > +		 * is in progress and pm_relax() would otherwise never be called.
> > > +		 * When recovering from a crash, rproc->lock is held while the
> > > +		 * state goes RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
> > > +		 * and only then is rproc->lock released.
> > > +		 * RPROC_OFFLINE is therefore only an intermediate state in the
> > > +		 * recovery process.
> > > +		 */
> > > +		if (rproc->state == RPROC_OFFLINE)
> > > +			pm_relax(rproc->dev.parent);
> > >   		return;
> > >   	}
> > > -- 
> > > 2.7.4
> > > 
> 
> 
> -- 
> Thx and BRs,
> Aiqun(Maria) Yu
