linux-kernel - Re: [PATCH] remoteproc: Create a separate workqueue for recovery tasks

Message-ID: <87c3f902b94bc243fc28e0ce79303dd4@codeaurora.org>
Date:   Thu, 17 Dec 2020 10:21:20 -0800
From:   rishabhb@...eaurora.org
To:     Alex Elder <elder@...aro.org>
Cc:     Bjorn Andersson <bjorn.andersson@...aro.org>,
        linux-remoteproc@...r.kernel.org, linux-kernel@...r.kernel.org,
        tsoni@...eaurora.org, psodagud@...eaurora.org,
        sidgup@...eaurora.org
Subject: Re: [PATCH] remoteproc: Create a separate workqueue for recovery
 tasks

On 2020-12-17 08:12, Alex Elder wrote:
> On 12/15/20 4:55 PM, Bjorn Andersson wrote:
>> On Sat 12 Dec 14:48 CST 2020, Rishabh Bhatnagar wrote:
>> 
>>> Create an unbound high priority workqueue for recovery tasks.
> 
> I have been looking at a different issue that is caused by
> crash notification.
> 
> What happened was that the modem crashed while the AP was
> in system suspend (or possibly even resuming) state.  And
> there is no guarantee that the system will have called a
> driver's ->resume callback when the crash notification is
> delivered.
> 
> In my case (in the IPA driver), handling a modem crash
> cannot be done while the driver is suspended; i.e. the
> activities in its ->resume callback must be completed
> before we can recover from the crash.
> 
> For this reason I might like to change the way the
> crash notification is handled, but what I'd rather see
> is to have the work queue not run until user space
> is unfrozen, which would guarantee that all drivers
> that have registered for a crash notification will
> be resumed when the notification arrives.
> 
> I'm not sure how that interacts with what you are
> looking for here.  I think the workqueue could still
> be unbound, but its work would be delayed longer before
> any notification (and recovery) started.
> 
> 					-Alex
> 
> 
In that case, maybe adding the WQ_FREEZABLE flag to the workqueue would help?
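
Something along these lines is what I have in mind. It is an untested
sketch, not a revised patch: it is the same allocation as in
remoteproc_init() from the hunk quoted below, with WQ_FREEZABLE added.
As I understand the workqueue documentation, a freezable workqueue
takes part in the freeze phase of system suspend, so no new work item
starts executing until the queue is thawed. Whether every driver's
->resume callback has run by that point is an assumption that would
need checking.

```c
/* Untested sketch: fragment of remoteproc_init(), only WQ_FREEZABLE is new. */
rproc_wq = alloc_workqueue("rproc_wq",
			   WQ_UNBOUND | WQ_HIGHPRI | WQ_FREEZABLE, 0);
if (!rproc_wq)
	return -ENOMEM;
```

The queue_work() call in rproc_report_crash() would stay as it is in
the patch.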
> 
>> This simply repeats $subject
>> 
>>> Recovery time is an important parameter for a subsystem and there
>>> might be situations where multiple subsystems crash around the same
>>> time.  Scheduling into an unbound workqueue increases parallelization
>>> and avoids time impact.
>> 
>> You should be able to write this more succinctly. The important part is
>> that you want an unbound work queue to allow recovery to happen in
>> parallel - which naturally implies that you care about recovery latency.
>> 
>>> Also creating a high priority workqueue
>>> will utilize separate worker threads with higher nice values than
>>> normal ones.
>>> 
>> 
>> This doesn't describe why you need the higher priority.
>> 
>> 
>> I believe, and certainly with the in-line coredump, that we're running
>> our recovery work for way too long to be queued on the system_wq. As
>> such the content of the patch looks good!
>> 
>> Regards,
>> Bjorn
>> 
>>> Signed-off-by: Rishabh Bhatnagar <rishabhb@...eaurora.org>
>>> ---
>>>   drivers/remoteproc/remoteproc_core.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>> index 46c2937..8fd8166 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -48,6 +48,8 @@ static DEFINE_MUTEX(rproc_list_mutex);
>>>  static LIST_HEAD(rproc_list);
>>>  static struct notifier_block rproc_panic_nb;
>>>
>>> +static struct workqueue_struct *rproc_wq;
>>> +
>>>  typedef int (*rproc_handle_resource_t)(struct rproc *rproc,
>>>  				 void *, int offset, int avail);
>>>
>>> @@ -2475,7 +2477,7 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
>>>  		rproc->name, rproc_crash_to_string(type));
>>>
>>>  	/* create a new task to handle the error */
>>> -	schedule_work(&rproc->crash_handler);
>>> +	queue_work(rproc_wq, &rproc->crash_handler);
>>>  }
>>>  EXPORT_SYMBOL(rproc_report_crash);
>>>
>>> @@ -2520,6 +2522,10 @@ static void __exit rproc_exit_panic(void)
>>>
>>>  static int __init remoteproc_init(void)
>>>  {
>>> +	rproc_wq = alloc_workqueue("rproc_wq", WQ_UNBOUND | WQ_HIGHPRI, 0);
>>> +	if (!rproc_wq)
>>> +		return -ENOMEM;
>>> +
>>>  	rproc_init_sysfs();
>>>  	rproc_init_debugfs();
>>>  	rproc_init_cdev();
>>> @@ -2536,6 +2542,7 @@ static void __exit remoteproc_exit(void)
>>>  	rproc_exit_panic();
>>>  	rproc_exit_debugfs();
>>>  	rproc_exit_sysfs();
>>> +	destroy_workqueue(rproc_wq);
>>>  }
>>>  module_exit(remoteproc_exit);
>>>
>>> -- 
>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>>