Message-ID: <87c3f902b94bc243fc28e0ce79303dd4@codeaurora.org>
Date: Thu, 17 Dec 2020 10:21:20 -0800
From: rishabhb@...eaurora.org
To: Alex Elder <elder@...aro.org>
Cc: Bjorn Andersson <bjorn.andersson@...aro.org>,
linux-remoteproc@...r.kernel.org, linux-kernel@...r.kernel.org,
tsoni@...eaurora.org, psodagud@...eaurora.org,
sidgup@...eaurora.org
Subject: Re: [PATCH] remoteproc: Create a separate workqueue for recovery tasks
On 2020-12-17 08:12, Alex Elder wrote:
> On 12/15/20 4:55 PM, Bjorn Andersson wrote:
>> On Sat 12 Dec 14:48 CST 2020, Rishabh Bhatnagar wrote:
>>
>>> Create an unbound high priority workqueue for recovery tasks.
>
> I have been looking at a different issue that is caused by
> crash notification.
>
> What happened was that the modem crashed while the AP was
> in system suspend (or possibly even resuming) state. And
> there is no guarantee that the system will have called a
> driver's ->resume callback when the crash notification is
> delivered.
>
> In my case (in the IPA driver), handling a modem crash
> cannot be done while the driver is suspended; i.e. the
> activities in its ->resume callback must be completed
> before we can recover from the crash.
>
> For this reason I might like to change the way the
> crash notification is handled, but what I'd rather see
> is to have the work queue not run until user space
> is unfrozen, which would guarantee that all drivers
> that have registered for a crash notification will
> be resumed when the notification arrives.
>
> I'm not sure how that interacts with what you are
> looking for here. I think the workqueue could still
> be unbound, but its work would be delayed longer before
> any notification (and recovery) started.
>
> -Alex
>
>
In that case, adding the WQ_FREEZABLE flag to the workqueue might help?
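
A rough, untested sketch of the idea, written as a tiny standalone module
rather than as the remoteproc change itself. All demo_* names are made up
for illustration; only the alloc_workqueue() flags matter. As far as I
understand, a WQ_FREEZABLE workqueue is drained during the freeze phase of
system suspend and starts no new work items until it is thawed, which
should happen after the drivers' ->resume callbacks have run.

```c
#include <linux/module.h>
#include <linux/workqueue.h>

/* Hypothetical workqueue, standing in for the one added by the patch. */
static struct workqueue_struct *demo_recovery_wq;

/* Hypothetical handler, standing in for the rproc crash handler. */
static void demo_crash_handler(struct work_struct *work)
{
	pr_info("demo: recovery work running\n");
}
static DECLARE_WORK(demo_crash_work, demo_crash_handler);

static int __init demo_init(void)
{
	/*
	 * Same flags as in the patch plus WQ_FREEZABLE, so that queued
	 * work does not start executing while the system is frozen.
	 */
	demo_recovery_wq = alloc_workqueue("demo_recovery_wq",
					   WQ_UNBOUND | WQ_HIGHPRI | WQ_FREEZABLE,
					   0);
	if (!demo_recovery_wq)
		return -ENOMEM;

	/* Equivalent of the queue_work() call in rproc_report_crash(). */
	queue_work(demo_recovery_wq, &demo_crash_work);
	return 0;
}

static void __exit demo_exit(void)
{
	/* destroy_workqueue() drains any pending work before freeing. */
	destroy_workqueue(demo_recovery_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Sketch: freezable unbound high-priority workqueue");
```

For the actual patch this would just mean OR-ing WQ_FREEZABLE into the
flags passed to alloc_workqueue() in remoteproc_init().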
>
>> This simply repeats $subject
>>
>>> Recovery time is an important parameter for a subsystem and there
>>> might be situations where multiple subsystems crash around the same
>>> time. Scheduling into an unbound workqueue increases parallelization
>>> and avoids time impact.
>>
>> You should be able to write this more succinctly. The important part
>> is that you want an unbound work queue to allow recovery to happen in
>> parallel - which naturally implies that you care about recovery
>> latency.
>>
>>> Also creating a high priority workqueue
>>> will utilize separate worker threads with lower nice values (that is,
>>> higher scheduling priority) than normal ones.
>>>
>>
>> This doesn't describe why you need the higher priority.
>>
>>
>> I believe, and certainly with the in-line coredump, that we're running
>> our recovery work for way too long to be queued on the system_wq. As
>> such the content of the patch looks good!
>>
>> Regards,
>> Bjorn
>>
>>> Signed-off-by: Rishabh Bhatnagar <rishabhb@...eaurora.org>
>>> ---
>>> drivers/remoteproc/remoteproc_core.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>> index 46c2937..8fd8166 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -48,6 +48,8 @@ static DEFINE_MUTEX(rproc_list_mutex);
>>> static LIST_HEAD(rproc_list);
>>> static struct notifier_block rproc_panic_nb;
>>> +static struct workqueue_struct *rproc_wq;
>>> +
>>> typedef int (*rproc_handle_resource_t)(struct rproc *rproc,
>>> void *, int offset, int avail);
>>> @@ -2475,7 +2477,7 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
>>> rproc->name, rproc_crash_to_string(type));
>>> /* create a new task to handle the error */
>>> - schedule_work(&rproc->crash_handler);
>>> + queue_work(rproc_wq, &rproc->crash_handler);
>>> }
>>> EXPORT_SYMBOL(rproc_report_crash);
>>> @@ -2520,6 +2522,10 @@ static void __exit rproc_exit_panic(void)
>>> static int __init remoteproc_init(void)
>>> {
>>> + rproc_wq = alloc_workqueue("rproc_wq", WQ_UNBOUND | WQ_HIGHPRI, 0);
>>> + if (!rproc_wq)
>>> + return -ENOMEM;
>>> +
>>> rproc_init_sysfs();
>>> rproc_init_debugfs();
>>> rproc_init_cdev();
>>> @@ -2536,6 +2542,7 @@ static void __exit remoteproc_exit(void)
>>> rproc_exit_panic();
>>> rproc_exit_debugfs();
>>> rproc_exit_sysfs();
>>> + destroy_workqueue(rproc_wq);
>>> }
>>> module_exit(remoteproc_exit);
>>> --
>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>>