linux-kernel - Re: [PATCH v4] remoteproc: core: do pm relax when in RPROC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 11 Nov 2022 08:52:11 +0800
From:   "Aiqun(Maria) Yu" <quic_aiquny@...cinc.com>
To:     Mathieu Poirier <mathieu.poirier@...aro.org>
CC:     Arnaud POULIQUEN <arnaud.pouliquen@...s.st.com>,
        <linux-remoteproc@...r.kernel.org>,
        <linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <quic_clew@...cinc.com>
Subject: Re: [PATCH v4] remoteproc: core: do pm relax when in RPROC_OFFLINE

On 11/11/2022 4:50 AM, Mathieu Poirier wrote:
> I had a couple of good discussions with our power management expert
> and even then, the way forward isn't as clear as I would have liked.
> I am currently travelling and as such don't have the required time to
> go into greater details, something I will be doing next week.
> 
Thx Mathieu for the info updated.
I'll wait for your update next week then.
let me know any initial questions that you have, perhaps I can also 
discuss that with our power team at the same time.
> On Sun, 6 Nov 2022 at 18:14, Aiqun(Maria) Yu <quic_aiquny@...cinc.com> wrote:
>>
>> Hi,
>> On 11/4/2022 11:59 PM, Mathieu Poirier wrote:
>>> On Thu, Nov 03, 2022 at 10:03:49AM +0800, Aiqun(Maria) Yu wrote:
>>>> On 11/3/2022 2:03 AM, Mathieu Poirier wrote:
>>>>> On Wed, Nov 02, 2022 at 06:53:49PM +0800, Aiqun(Maria) Yu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Let me think about this carefully.
>>>>>>
>>>>>> When in RPROC_RECOVERY_FAIL case we want to re-do the recovery process again
>>>>>> or just leave the pm_relax?
>>>>>
>>>>> Neither.
>>>>>
>>>>> When a recovery fail we don't want to call pm_relax().  The code in
>>>>> rproc_crash_handler_work() becomes:
>>>>>
>>>>>      if (rproc->state == RPROC_OFFLINE) {
>>>>>                    /* We have raced with rproc_shutdown() */
>>>>>                    pm_relax()
>>>>>              mutex_unlock(&rproc->lock);
>>>>>              return;
>>>>>            }
>>>>>
>>>>>      if (rproc->state == RPROC_CRASHED ||
>>>>>                rproc->state == RPROC_RECOVERY_FAILED) {
>>>>>              /* handle only the first crash detected */
>>>>>              mutex_unlock(&rproc->lock);
>>>>>              return;
>>>>>      }
>>>>>
>>>>>
>>>>> RPROC_RECOVERY_FAILED gets set in rproc_boot_recovery() if request_firmware() or
>>>>> rproc_start() fail.  Function rproc_trigger_recovery() needs to allow for the
>>>>> recovery the the remote processor is in RPROC_RECOVERY_FAILED state.  As such
>>>>> the condition becomes:
>>>>>
>>>>>            /* State could have changed before we got the mutex */
>>>>>      if (rproc->state != RPROC_CRASHED &&
>>>>>                rproc->state != RPROC_RECOVERY_FAILED)
>>>>>              goto unlock_mutex;
>>>>>
>>>>> Start with that and we can look at corner cases (if some exists) with a fresh
>>>>> patchset.  Note that I have not addressed the attach/detach() scenario in the
>>>>> above.
>>>>
>>>> If we didn't deal with the recovery failed case with correct pm_relax call,
>>>> it may left the device in a state that cannot enter to suspend state.
>>>
>>> That is what I am looking for.  We don't want to give the impression that
>>> everything is fine by allowing the device to suspend.  If the remote processor
>>> can't be recovered than it needs to be dealth with.
>> For the normal recovery failed case, it still need to do pm_relax to not
>> prevent the device goes to suspend. It is what in normal recovery failed
>> case we do in rproc_crash_handler_work as well.
>> rproc_crash_handler_work will not check the result of the
>> rproc_trigger_recovery return value, and will always do pm_relax.
>>
>> For current conconrency cornor case as well, it is better to consistant
>> with the current design of recovery fail senarios in normal cases.
>>
>> I personally agree that we shouldn't do nothing when it is a
>> RPROC_RECOVERY_FAILED senario when it is in rproc_crash_handler_work
>> check, because it maybe crash happened when it is trying to do the recovery.
>> So I suggested to do a continue try of trigger recovery again instead of
>> doing nothing and bail out if it is a RPROC_RECOVERY_FAILED state.
>>
>>>
>>>> Because first PROC_RECOVERY_FAIL case cannot ensure it have pm_relax called
>>>> before the second crash handler call pm_stay_awake or not.
>>>>
>>>
>>> I've been thinking about that part.  I don't think adding a wake_count to
>>> control calls to pm_stay_awake()/pm_relax() is the best way to go.  There is a
>>> similar count happening in the PM runtime subsystem and that is what we should
>>> be using.  I have asked a power management expert at Linaro for guidance with
>>> this matter.  I should be able to get back to you with a way forward by the end
>>> of next week.
>>>
>> Thx for the specific date provided as well. I will wait until your reply
>> for next patchset then.
>>
>>>> So, What about the atomic count along with pm_relax and pm_stay_awake ?
>>>>
>>>> struct rproc{
>>>> ...
>>>> atomic_t wake_count;
>>>> ...
>>>> }
>>>>
>>>> rproc_pm_stay_awake()
>>>> {
>>>>       atomic_inc(&wake_count);
>>>>       pm_stay_awake();
>>>> }
>>>>
>>>> rproc_pm_relax()
>>>> {
>>>>       if (atomic_dec_return(&wake_count) == 0)
>>>>               pm_stay_awake();
>>>> }
>>>>
>>>> can refer code like:
>>>>
>>>> rproc_report_crash()
>>>> {
>>>>       ...
>>>>       rproc_pm_stay_awake();
>>>>       queue_work();
>>>>       ...
>>>> }
>>>>
>>>> rproc_crash_handler_work()
>>>> {
>>>>       ...
>>>>       if (rproc->state == RPROC_OFFLINE || rproc->state == RPROC_CRASHED) {
>>>>                     /* We have raced with rproc_shutdown() */
>>>>                     rproc_pm_relax();
>>>>               mutex_unlock(&rproc->lock);
>>>>               return;
>>>>             }
>>>>       ...
>>>> }
>>>>
>>>>>
>>>>> Thanks,
>>>>> Mathieu
>>>>>
>>>>>>
>>>>>> recovery fail case 1:
>>>>>> |                                      |firstcrash interrupt issued
>>>>>> | second crashed interrupt issued      | rproc_report_crash()
>>>>>> | rproc_report_crash()                 |          pm_stay_awake()
>>>>>> |          pm_stay_awake()             |          queue_work()
>>>>>> |          queue_work()                |rproc_crash_handler_work()
>>>>>> |                                      |mutex_lock(&rproc->lock);
>>>>>> |                                      |rproc_stop()
>>>>>> |rproc_crash_handler_work()            |rproc->state = RPROC_OFFLINE;
>>>>>> |                                      |RPROC_RECOVERY_FAIL //new
>>>>>> |                                      |mutex_unlock(&rproc->lock);
>>>>>> |mutex_lock(&rproc->lock);             |pm_relax()
>>>>>> |if (rproc->state == RPROC_OFFLINE)    |
>>>>>> |return // shouldn't do pm_relax if RPROC_RECOVERY_FAIL?  |
>>>>>> |mutex_unlock(&rproc->lock);           |
>>>>>> |                                      |
>>>>>> |                                      |
>>>>>> |                                      |
>>>>>>
>>>>>> recovery fail case 2:
>>>>>> |                                      |firstcrash interrupt issued
>>>>>> |                                      | rproc_report_crash()
>>>>>> |                                      |          pm_stay_awake()
>>>>>> |                                      |          queue_work()
>>>>>> |                                      |rproc_crash_handler_work()
>>>>>> |                                      |mutex_lock(&rproc->lock);
>>>>>> |                                      |rproc_stop()
>>>>>> |                                      |rproc->state = RPROC_OFFLINE;
>>>>>> |                                      |RPROC_RECOVERY_FAIL //new
>>>>>> |                                      |mutex_unlock(&rproc->lock);
>>>>>> |                                      |pm_relax()
>>>>>> |
>>>>>> | second crashed interrupt issued      |
>>>>>> | rproc_report_crash()                 |
>>>>>> |          pm_stay_awake()             |
>>>>>> |          queue_work()                |
>>>>>> |pm_stay_awake()
>>>>>> |mutex_lock(&rproc->lock);
>>>>>> |if (rproc->state == RPROC_OFFLINE)    |
>>>>>> |return // still need do pm_relax if RPROC_RECOVERY_FAIL?  |
>>>>>> |mutex_unlock(&rproc->lock);           |
>>>>>> |                                      |
>>>>>> |                                      |
>>>>>> |                                      |
>>>>>>
>>>>>> Maybe I can have:
>>>>>> 1. the pm_stay_awake and pm_relax with count based and call with paired for
>>>>>> fix current concurency issue.
>>>>>> 2. RPROC_RECOVERY_FAIL can be another patch for continue try to do recovery
>>>>>> work.
>>>>>> 3. handle RPROC_DETACHED case.
>>>>>>
>>>>>> On 11/2/2022 4:11 AM, Mathieu Poirier wrote:
>>>>>>> On Fri, 28 Oct 2022 at 09:31, Arnaud POULIQUEN
>>>>>>> <arnaud.pouliquen@...s.st.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 10/24/22 05:17, Aiqun(Maria) Yu wrote:
>>>>>>>>> On 10/22/2022 3:34 AM, Mathieu Poirier wrote:
>>>>>>>>>> On Wed, 19 Oct 2022 at 23:52, Aiqun(Maria) Yu <quic_aiquny@...cinc.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/14/2022 2:03 AM, Mathieu Poirier wrote:
>>>>>>>>>>>> On Thu, Oct 13, 2022 at 11:34:42AM -0600, Mathieu Poirier wrote:
>>>>>>>>>>>>> On Thu, Oct 13, 2022 at 09:40:09AM +0800, Aiqun(Maria) Yu wrote:
>>>>>>>>>>>>>> Hi Mathieu,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/13/2022 4:43 AM, Mathieu Poirier wrote:
>>>>>>>>>>>>>>> Please add what has changed from one version to another, either in a cover
>>>>>>>>>>>>>>> letter or after the "Signed-off-by".  There are many examples on how to
>>>>>>>>>>>>>>> do that
>>>>>>>>>>>>>>> on the mailing list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thx for the information, will take a note and benefit for next time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 16, 2022 at 03:12:31PM +0800, Maria Yu wrote:
>>>>>>>>>>>>>>>> RPROC_OFFLINE state indicate there is no recovery process
>>>>>>>>>>>>>>>> is in progress and no chance to do the pm_relax.
>>>>>>>>>>>>>>>> Because when recovering from crash, rproc->lock is held and
>>>>>>>>>>>>>>>> state is RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
>>>>>>>>>>>>>>>> and then unlock rproc->lock.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You are correct - because the lock is held rproc->state should be set to
>>>>>>>>>>>>>>> RPROC_RUNNING
>>>>>>>>>>>>>>> when rproc_trigger_recovery() returns.  If that is not the case then
>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>> went wrong.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Function rproc_stop() sets rproc->state to RPROC_OFFLINE just before
>>>>>>>>>>>>>>> returning,
>>>>>>>>>>>>>>> so we know the remote processor was stopped.  Therefore if rproc->state
>>>>>>>>>>>>>>> is set
>>>>>>>>>>>>>>> to RPROC_OFFLINE something went wrong in either request_firmware() or
>>>>>>>>>>>>>>> rproc_start().  Either way the remote processor is offline and the system
>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>> in an unknown/unstable.  As such I don't see how calling pm_relax() can help
>>>>>>>>>>>>>>> things along.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PROC_OFFLINE is possible that rproc_shutdown is triggered and successfully
>>>>>>>>>>>>>> finished.
>>>>>>>>>>>>>> Even if it is multi crash rproc_crash_handler_work contention issue, and
>>>>>>>>>>>>>> last rproc_trigger_recovery bailed out with only
>>>>>>>>>>>>>> rproc->state==RPROC_OFFLINE, it is still worth to do pm_relax in pair.
>>>>>>>>>>>>>> Since the subsystem may still can be recovered with customer's next trigger
>>>>>>>>>>>>>> of rproc_start, and we can make each error out path clean with pm resources.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I suggest spending time understanding what leads to the failure when
>>>>>>>>>>>>>>> recovering
>>>>>>>>>>>>>>> from a crash and address that problem(s).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In current case, the customer's information is that the issue happened when
>>>>>>>>>>>>>> rproc_shutdown is triggered at similar time. So not an issue from error out
>>>>>>>>>>>>>> of rproc_trigger_recovery.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That is a very important element to consider and should have been mentioned
>>>>>>>>>>>>> from
>>>>>>>>>>>>> the beginning.  What I see happening is the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> rproc_report_crash()
>>>>>>>>>>>>>               pm_stay_awake()
>>>>>>>>>>>>>               queue_work() // current thread is suspended
>>>>>>>>>>>>>
>>>>>>>>>>>>> rproc_shutdown()
>>>>>>>>>>>>>               rproc_stop()
>>>>>>>>>>>>>                       rproc->state = RPROC_OFFLINE;
>>>>>>>>>>>>>
>>>>>>>>>>>>> rproc_crash_handler_work()
>>>>>>>>>>>>>               if (rproc->state == RPROC_OFFLINE)
>>>>>>>>>>>>>                       return // pm_relax() is not called
>>>>>>>>>>>>>
>>>>>>>>>>>>> The right way to fix this is to add a pm_relax() in rproc_shutdown() and
>>>>>>>>>>>>> rproc_detach(), along with a very descriptive comment as to why it is needed.
>>>>>>>>>>>>
>>>>>>>>>>>> Thinking about this further there are more ramifications to consider.  Please
>>>>>>>>>>>> confirm the above scenario is what you are facing.  I will advise on how to
>>>>>>>>>>>> move
>>>>>>>>>>>> forward if that is the case.
>>>>>>>>>>>>
>>>>>>>>>>> Not sure if the situation is clear or not. So resend the email again.
>>>>>>>>>>>
>>>>>>>>>>> The above senario is what customer is facing. crash hanppened while at
>>>>>>>>>>> the same time shutdown is triggered.
>>>>>>>>>>
>>>>>>>>>> Unfortunately this is not enough details to address a problem as
>>>>>>>>>> complex as this one.
>>>>>>>>>>
>>>>>>>>>>> And the device cannto goes to suspend state after that.
>>>>>>>>>>> the subsystem can still be start normally after this.
>>>>>>>>>>
>>>>>>>>>> If the code flow I pasted above reflects the problem at hand, the
>>>>>>>>>> current patch will not be sufficient to address the issue.  If Arnaud
>>>>>>>>>> confirms my suspicions we will have to think about a better solution.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Mathiew,
>>>>>>>>>
>>>>>>>>> Could you pls have more details of any side effects other then power issue of
>>>>>>>>> the current senario?
>>>>>>>>> Why the current patch is not sufficient pls?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Have the current senario in details with rproc->lock information in details:
>>>>>>>>>
>>>>>>>>> | subsystem crashed interrupt issued      | user trigger shutdown
>>>>>>>>> | rproc_report_crash()                    |
>>>>>>>>> |          pm_stay_awake()                |
>>>>>>>>> |          queue_work()                   |
>>>>>>>>> |                                         |rproc_shutdown
>>>>>>>>> |                                         |mutex_lock(&rproc->lock);
>>>>>>>>> |                                         |rproc_stop()
>>>>>>>>> |rproc_crash_handler_work()               |rproc->state = RPROC_OFFLINE;
>>>>>>>>> |                                         |mutex_unlock(&rproc->lock);
>>>>>>>>> |mutex_lock(&rproc->lock);                |
>>>>>>>>> |if (rproc->state == RPROC_OFFLINE)       |
>>>>>>>>> |return // pm_relax() is not called       |rproc_boot
>>>>>>>>> |mutex_unlock(&rproc->lock);              |
>>>>>>>>> |                                         |mutex_lock(&rproc->lock);
>>>>>>>>> |                                         |rproc_start()
>>>>>>>>> |                                         |mutex_unlock(&rproc->lock);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Agree with Mathieu, this is not so simple.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for looking into this.
>>>>>>>
>>>>>>>> Here is my view  hoping I haven't missed a point in your discussion or
>>>>>>>> an other corner cases.
>>>>>>>>
>>>>>>>> I tried to analyze the issues (in what follows, the term "condition" means
>>>>>>>> the "if" condition in which Aiqun proposes to add the fix) :
>>>>>>>>
>>>>>>>> I can see 4 use cases with race condition
>>>>>>>>
>>>>>>>> 1) crash report while already one is treated (rproc_boot_recovery called)
>>>>>>>>          => not a real use case as if the remote processor is crashed we
>>>>>>>>                   should not have a second crash report
>>>>>>>>
>>>>>>>
>>>>>>> That part is of great concern to me.  *Theoretically* we should not
>>>>>>> get a new crash report while one has already been dispatched but the
>>>>>>> current code accounts for this scenario and as such the possibility
>>>>>>> can't be dismissed.  Therefore we need to expect rproc_report_crash()
>>>>>>> to be called multiple times before a single instance of
>>>>>>> rproc_boot_recovery() is scheduled.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> 2) rproc_stop executed between the queuing of the crash work and the call of
>>>>>>>>       rproc_crash_handler_work
>>>>>>>>        => rproc->state = RPROC_OFFLINE
>>>>>>>>        => we enter in the "condition" and the pm_relax has to be called
>>>>>>>>        => This commit fix should solve this use case
>>>>>>>>
>>>>>>>> 3) rproc_detach executed between the queue of the crash work and the call of
>>>>>>>>       rproc_crash_handler_work
>>>>>>>>        => rproc->state = RPROC_DETACHED;
>>>>>>>>        => we don't go in "the condition" and issue because the recovery reattach
>>>>>>>>           to the remote processor
>>>>>>>>        => but pm_relax is called
>>>>>>>>        => probably need an extra fix to avoid to re-attach
>>>>>>>>
>>>>>>>> 4) crash report while already one is treated (rproc_attach_recovery called)
>>>>>>>>        this one corresponds to an auto reboot of the remote processor, with a
>>>>>>>>        new crash
>>>>>>>>        => rproc->state = RPROC_CRASHED or rproc->state = RPROC_DETACHED;
>>>>>>>>        4)a) rproc->state = RPROC_CRASHED if rproc->recovery_disabled = true
>>>>>>>>             => should call pm_relax if rproc->recovery_disabled = true
>>>>>>>>             => commit does not work for this use case
>>>>>>>>
>>>>>>>>        4)b) rproc->state = RPROC_DETACHED if recovery fails
>>>>>>>>            => error case with an unstable state
>>>>>>>>            => how to differentiate it from the use case 3) ?
>>>>>>>>            => introduce a RPROC_RECOVERY_FAIL state?
>>>>>>>>
>>>>>>>
>>>>>>> The case where a recovery fails needs to be considered and is the
>>>>>>> reason the original patch doesn't work.  Right now in
>>>>>>> rproc_crash_handler_work(), it is not possible to differentiate
>>>>>>> between a legitimate shutdown request (scenario #2 above) and a
>>>>>>> recovery that went wrong.  I think introducing RPROC_RECOVERY_FAIL
>>>>>>> would greatly simplify things.
>>>>>>>
>>>>>>> My initial evaluation had not considered the attach/detach scenarios -
>>>>>>> thanks for adding that in the mix.
>>>>>>>
>>>>>>> Aiqun, please send a new patchset that adds a new remote processor
>>>>>>> state, i.e RPROC_RECOVERY_FAIL.  There should also be another patch in
>>>>>>> that set that takes attach/detach scenarios into account.  The code
>>>>>>> between the v6.0 and v6.1 cycle has changed a lot in that area so make
>>>>>>> sure to properly rebase.
>>>>>>>
>>>>>> I will try.
>>>>>>
>>>>>>>>
>>>>>>>> Then pm_stay_awake is called when the crash work is queued.
>>>>>>>> It seems to me coherent to call the pm_relax in the work handler.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Here is a quick and dirty patch (not tested) that should take into account the
>>>>>>>> main use cases ( except 1) and 4)b) )
>>>>>>>>
>>>>>>>> @@ -2009,8 +2009,18 @@ static void rproc_crash_handler_work(struct work_struct *work)
>>>>>>>>
>>>>>>>>             mutex_lock(&rproc->lock);
>>>>>>>>
>>>>>>>> -       if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLINE) {
>>>>>>>> +       if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLINE ||
>>>>>>>> +           rproc->state == RPROC_DETACHED) {
>>>>>>>>                     /* handle only the first crash detected */
>>>>>>>> +
>>>>>>>> +               /*
>>>>>>>> +                * call pm-relax in following use cases:
>>>>>>>> +                * - the remote processor has been stopped by the user
>>>>>>>> +                * - the remote processor is detached
>>>>>>>> +                + - the remote proc has an autonomous reset but recovery_disabled is true.
>>>>>>>> +                */
>>>>>>>> +               if(rproc->state != RPROC_CRASHED || rproc->recovery_disabled)
>>>>>>>> +                       pm_relax(rproc->dev.parent);
>>>>>>>>                     mutex_unlock(&rproc->lock);
>>>>>>>>                     return;
>>>>>>>>             }
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Arnaud
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When the state is in RPROC_OFFLINE it means separate request
>>>>>>>>>>>>>>>> of rproc_stop was done and no need to hold the wakeup source
>>>>>>>>>>>>>>>> in crash handler to recover any more.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Maria Yu <quic_aiquny@...cinc.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>         drivers/remoteproc/remoteproc_core.c | 11 +++++++++++
>>>>>>>>>>>>>>>>         1 file changed, 11 insertions(+)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>>>>>>>>>>>>>>>> b/drivers/remoteproc/remoteproc_core.c
>>>>>>>>>>>>>>>> index e5279ed9a8d7..6bc7b8b7d01e 100644
>>>>>>>>>>>>>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>>>>>>>>>>>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>>>>>>>>>>>>>> @@ -1956,6 +1956,17 @@ static void rproc_crash_handler_work(struct
>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>                if (rproc->state == RPROC_CRASHED || rproc->state ==
>>>>>>>>>>>>>>>> RPROC_OFFLINE) {
>>>>>>>>>>>>>>>>                        /* handle only the first crash detected */
>>>>>>>>>>>>>>>>                        mutex_unlock(&rproc->lock);
>>>>>>>>>>>>>>>> +         /*
>>>>>>>>>>>>>>>> +          * RPROC_OFFLINE state indicate there is no recovery process
>>>>>>>>>>>>>>>> +          * is in progress and no chance to have pm_relax in place.
>>>>>>>>>>>>>>>> +          * Because when recovering from crash, rproc->lock is held and
>>>>>>>>>>>>>>>> +          * state is RPROC_CRASHED -> RPROC_OFFLINE -> RPROC_RUNNING,
>>>>>>>>>>>>>>>> +          * and then unlock rproc->lock.
>>>>>>>>>>>>>>>> +          * RPROC_OFFLINE is only an intermediate state in recovery
>>>>>>>>>>>>>>>> +          * process.
>>>>>>>>>>>>>>>> +          */
>>>>>>>>>>>>>>>> +         if (rproc->state == RPROC_OFFLINE)
>>>>>>>>>>>>>>>> +                 pm_relax(rproc->dev.parent);
>>>>>>>>>>>>>>>>                        return;
>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> 2.7.4
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Thx and BRs,
>>>>>>>>>>>>>> Aiqun(Maria) Yu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Thx and BRs,
>>>>>>>>>>> Aiqun(Maria) Yu
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thx and BRs,
>>>>>> Aiqun(Maria) Yu
>>>>
>>>>
>>>> --
>>>> Thx and BRs,
>>>> Aiqun(Maria) Yu
>>
>>
>> --
>> Thx and BRs,
>> Aiqun(Maria) Yu


-- 
Thx and BRs,
Aiqun(Maria) Yu