Message-ID: <8807a9a6-d93d-aef5-15f4-88648a6ecbe2@quicinc.com>
Date:   Thu, 20 Oct 2022 13:52:05 +0800
From:   "Aiqun(Maria) Yu" <quic_aiquny@...cinc.com>
To:     Mathieu Poirier <mathieu.poirier@...aro.org>
CC:     <linux-remoteproc@...r.kernel.org>,
        <linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <quic_clew@...cinc.com>
Subject: Re: [PATCH v4] remoteproc: core: do pm relax when in RPROC_OFFLINE

On 10/14/2022 2:03 AM, Mathieu Poirier wrote:
> On Thu, Oct 13, 2022 at 11:34:42AM -0600, Mathieu Poirier wrote:
>> On Thu, Oct 13, 2022 at 09:40:09AM +0800, Aiqun(Maria) Yu wrote:
>>> Hi Mathieu,
>>>
>>> On 10/13/2022 4:43 AM, Mathieu Poirier wrote:
>>>> Please add what has changed from one version to another, either in a cover
>>>> letter or after the "Signed-off-by".  There are many examples on how to do that
>>>> on the mailing list.
>>>>
>>> Thanks for the information; I will take note of it for next time.
>>>
>>>> On Fri, Sep 16, 2022 at 03:12:31PM +0800, Maria Yu wrote:
>>>>> RPROC_OFFLINE state indicate there is no recovery process
>>>>> is in progress and no chance to do the pm_relax.
>>>>> Because when recovering from crash, rproc->lock is held and
>>>>> state is RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
>>>>> and then unlock rproc->lock.
>>>>
>>>> You are correct - because the lock is held rproc->state should be set to RPROC_RUNNING
>>>> when rproc_trigger_recovery() returns.  If that is not the case then something
>>>> went wrong.
>>>>
>>>> Function rproc_stop() sets rproc->state to RPROC_OFFLINE just before returning,
>>>> so we know the remote processor was stopped.  Therefore if rproc->state is set
>>>> to RPROC_OFFLINE something went wrong in either request_firmware() or
>>>> rproc_start().  Either way the remote processor is offline and the system is
>>>> probably in an unknown/unstable state.  As such I don't see how calling
>>>> pm_relax() can help things along.
>>>>
>>> RPROC_OFFLINE is also possible when rproc_shutdown is triggered and finishes
>>> successfully.
>>> Even if this is a multi-crash rproc_crash_handler_work contention issue, and
>>> the last rproc_trigger_recovery bailed out with only
>>> rproc->state==RPROC_OFFLINE, it is still worth calling pm_relax as the pair.
>>> The subsystem may still be recoverable with the customer's next trigger
>>> of rproc_start, and this keeps every error-out path clean of pm resources.
>>>
>>>> I suggest spending time understanding what leads to the failure when recovering
>>>> from a crash and address that problem(s).
>>>>
>>> In the current case, the customer reports that the issue happened when
>>> rproc_shutdown was triggered at about the same time. So it is not an issue
>>> caused by an error exit from rproc_trigger_recovery.
>>
>> That is a very important element to consider and should have been mentioned from
>> the beginning.  What I see happening is the following:
>>
>> rproc_report_crash()
>>          pm_stay_awake()
>>          queue_work() // current thread is suspended
>>
>> rproc_shutdown()
>>          rproc_stop()
>>                  rproc->state = RPROC_OFFLINE;
>>
>> rproc_crash_handler_work()
>>          if (rproc->state == RPROC_OFFLINE)
>>                  return // pm_relax() is not called
>>
>> The right way to fix this is to add a pm_relax() in rproc_shutdown() and
>> rproc_detach(), along with a very descriptive comment as to why it is needed.
> 
> Thinking about this further there are more ramifications to consider.  Please
> confirm the above scenario is what you are facing.  I will advise on how to move
> forward if that is the case.
> 
I am not sure whether the situation was clear, so I am resending this email.

The above scenario is what the customer is facing: a crash happened at
the same time that a shutdown was triggered.
After that, the device cannot enter the suspend state.
The subsystem can still be started normally afterwards.
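
To make the sequence concrete, here is a minimal user-space sketch of the
race described above. It is only a model, not kernel code: every model_*
name and the FIX_* enum are hypothetical, the wakeup source is reduced to a
single boolean, locking is left out, and recovery is assumed to always
succeed. It compares the unpatched flow with the two fixes discussed in this
thread (pm_relax() in the crash handler's early-return path, or pm_relax()
in the shutdown path).

```c
#include <stdbool.h>

/* Simplified stand-ins for the rproc states named in this thread. */
enum model_state { MODEL_OFFLINE, MODEL_RUNNING, MODEL_CRASHED };

/* Hypothetical switch selecting which fix, if any, the model applies. */
enum model_fix {
	FIX_NONE,                   /* behaviour before the patch */
	FIX_RELAX_IN_CRASH_HANDLER, /* approach taken by this patch */
	FIX_RELAX_IN_SHUTDOWN,      /* alternative suggested in review */
};

struct model_rproc {
	enum model_state state;
	bool wakeup_active;         /* true while the device must stay awake */
};

/* Models rproc_report_crash(): pm_stay_awake(), then queue_work(). */
static void model_report_crash(struct model_rproc *r)
{
	r->wakeup_active = true;
}

/* Models rproc_shutdown(): rproc_stop() leaves the state at OFFLINE. */
static void model_shutdown(struct model_rproc *r, enum model_fix fix)
{
	r->state = MODEL_OFFLINE;
	if (fix == FIX_RELAX_IN_SHUTDOWN)
		r->wakeup_active = false;   /* pm_relax() */
}

/* Models rproc_crash_handler_work() running after the shutdown. */
static void model_crash_handler_work(struct model_rproc *r, enum model_fix fix)
{
	if (r->state == MODEL_CRASHED || r->state == MODEL_OFFLINE) {
		/* Early return: without a fix nobody calls pm_relax(). */
		if (r->state == MODEL_OFFLINE && fix == FIX_RELAX_IN_CRASH_HANDLER)
			r->wakeup_active = false;
		return;
	}
	r->state = MODEL_CRASHED;
	r->state = MODEL_RUNNING;       /* recovery assumed to succeed */
	r->wakeup_active = false;       /* pm_relax() after recovery */
}

/* Replays crash report -> shutdown -> crash handler; true means stuck awake. */
bool model_race_leaves_device_awake(enum model_fix fix)
{
	struct model_rproc r = { .state = MODEL_RUNNING, .wakeup_active = false };

	model_report_crash(&r);
	model_shutdown(&r, fix);
	model_crash_handler_work(&r, fix);
	return r.wakeup_active;
}
```

In this model, model_race_leaves_device_awake(FIX_NONE) reports that the
wakeup source stays active, which matches the customer's symptom that the
device cannot suspend. Either fix clears it. The model says nothing about
which fix is preferable in the real driver.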

>>
>>
>>>> Thanks,
>>>> Mathieu
>>>>
>>>>
>>>>> When the state is in RPROC_OFFLINE it means separate request
>>>>> of rproc_stop was done and no need to hold the wakeup source
>>>>> in crash handler to recover any more.
>>>>>
>>>>> Signed-off-by: Maria Yu <quic_aiquny@...cinc.com>
>>>>> ---
>>>>>    drivers/remoteproc/remoteproc_core.c | 11 +++++++++++
>>>>>    1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>>>> index e5279ed9a8d7..6bc7b8b7d01e 100644
>>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>>> @@ -1956,6 +1956,17 @@ static void rproc_crash_handler_work(struct work_struct *work)
>>>>>    	if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLINE) {
>>>>>    		/* handle only the first crash detected */
>>>>>    		mutex_unlock(&rproc->lock);
>>>>> +		/*
>>>>> +		 * RPROC_OFFLINE state indicate there is no recovery process
>>>>> +		 * is in progress and no chance to have pm_relax in place.
>>>>> +		 * Because when recovering from crash, rproc->lock is held and
>>>>> +		 * state is RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
>>>>> +		 * and then unlock rproc->lock.
>>>>> +		 * RPROC_OFFLINE is only an intermediate state in recovery
>>>>> +		 * process.
>>>>> +		 */
>>>>> +		if (rproc->state == RPROC_OFFLINE)
>>>>> +			pm_relax(rproc->dev.parent);
>>>>>    		return;
>>>>>    	}
>>>>> -- 
>>>>> 2.7.4
>>>>>
>>>
>>>
>>> -- 
>>> Thx and BRs,
>>> Aiqun(Maria) Yu


-- 
Thx and BRs,
Aiqun(Maria) Yu