[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f708d462-d863-ed3c-159d-acd5bffc6cba@codeaurora.org>
Date: Fri, 4 Jun 2021 13:46:58 -0700
From: Siddharth Gupta <sidgup@...eaurora.org>
To: Bjorn Andersson <bjorn.andersson@...aro.org>,
Alex Elder <elder@...aro.org>
Cc: ohad@...ery.com, mathieu.poirier@...aro.org,
linux-remoteproc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] remoteproc: use freezable workqueue for crash
notifications
On 5/28/2021 5:12 PM, Siddharth Gupta wrote:
>
> On 5/27/2021 8:55 PM, Bjorn Andersson wrote:
>> On Wed 19 May 18:44 CDT 2021, Alex Elder wrote:
>>
>>> When a remoteproc has crashed, rproc_report_crash() is called to
>>> handle whatever recovery is desired. This can happen at almost any
>>> time, often triggered by an interrupt, though it can also be
>>> initiated by a write to debugfs file remoteproc/remoteproc*/crash.
>>>
>>> When a crash is reported, the crash handler worker is scheduled to
>>> run (rproc_crash_handler_work()). One thing that worker does is
>>> call rproc_trigger_recovery(), which calls rproc_stop(). That calls
>>> the ->stop method for any remoteproc subdevices before making the
>>> remote processor go offline.
>>>
>>> The Q6V5 modem remoteproc driver implements an SSR subdevice that
>>> notifies registered drivers when the modem changes operational state
>>> (prepare, started, stop/crash, unprepared). The IPA driver
>>> registers to receive these notifications.
>>>
>>> With that as context, I'll now describe the problem.
>>>
>>> There was a situation in which buggy modem firmware led to a modem
>>> crash very soon after system (AP) resume had begun. The crash caused
>>> a remoteproc SSR crash notification to be sent to the IPA driver.
>>> The problem was that, although system resume had begun, it had not
>>> yet completed, and the IPA driver was still in a suspended state.
>>>
>>> This scenario could happen to any driver that registers for these
>>> SSR notifications, because they are delivered without knowledge of
>>> the (suspend) state of registered recipient drivers.
>>>
>>> This patch offers a simple fix for this, by having the crash
>>> handling worker function run on the system freezable workqueue.
>>> This workqueue does not operate if user space is frozen (for
>>> suspend). As a result, the SSR subdevice only delivers its
>>> crash notification when the system is fully operational (i.e.,
>>> neither suspended nor in suspend/resume transition).
>>>
>> This makes sense to me; both that it ensures that we spend our resources
>> on the actual system resume and that it avoids surprises from this
>> happening while the system still is in a funky state...
>>
>> Reviewed-by: Bjorn Andersson <bjorn.andersson@...aro.org>
>>
>> But it would be nice to get some input from other users of the
>> framework.
> This patch sounds like a good idea for cases where the
> request_firmware() APIs fallback to userspace firmware loading.
>
> Will test out this patch and report back.
>
> Thanks,
> Sid
I was able to test out this change with one of our usecases, no
issues to report.
Could you please CC stable as well?
Thanks,
Sid
Tested-by: Siddharth Gupta <sidgup@...eaurora.org>
>>
>> Regards,
>> Bjorn
>>
>>> Signed-off-by: Alex Elder <elder@...aro.org>
>>> ---
>>> drivers/remoteproc/remoteproc_core.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>>> b/drivers/remoteproc/remoteproc_core.c
>>> index 39cf44cb08035..6bedf2d2af239 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -2724,8 +2724,8 @@ void rproc_report_crash(struct rproc *rproc,
>>> enum rproc_crash_type type)
>>> dev_err(&rproc->dev, "crash detected in %s: type %s\n",
>>> rproc->name, rproc_crash_to_string(type));
>>> - /* create a new task to handle the error */
>>> - schedule_work(&rproc->crash_handler);
>>> + /* Have a worker handle the error; ensure system is not
>>> suspended */
>>> + queue_work(system_freezable_wq, &rproc->crash_handler);
>>> }
>>> EXPORT_SYMBOL(rproc_report_crash);
>>> --
>>> 2.27.0
>>>
Powered by blists - more mailing lists