Message-ID: <xhsmh8ro2d4du.mognet@vschneid.remote.csb>
Date: Fri, 05 Aug 2022 17:47:09 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Lai Jiangshan <jiangshanlai@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <frederic@...nel.org>,
Juri Lelli <juri.lelli@...hat.com>,
Phil Auld <pauld@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [RFC PATCH v3 2/3] workqueue: Unbind workers before sending them to exit()

On 05/08/22 11:16, Lai Jiangshan wrote:
> On Tue, Aug 2, 2022 at 4:42 PM Valentin Schneider <vschneid@...hat.com> wrote:
>> +/*
>> + * Unlikely as it may be, a worker could wake after destroy_worker() has
>> + * happened but before reap_workers(). WORKER_DIE would be set in worker->flags,
>> + * so it would be able to kfree(worker) and head out to do_exit().
>> + *
>> + * Rather than make the reaper wait for each to-be-reaped kworker to exit and
>> + * kfree(worker) itself, make the kworkers (which have nothing to do but go
>> + * do_exit() anyway) wait for the reaper to be done with them.
>> + */
>> +static void worker_wait_reaped(struct worker *worker)
>> +{
>> +	WARN_ON_ONCE(current != worker->task);
>> +
>> +	for (;;) {
>> +		set_current_state(TASK_INTERRUPTIBLE);
>> +		if (READ_ONCE(worker->reaped))
>> +			break;
>> +		schedule();
>> +	}
>> +	__set_current_state(TASK_RUNNING);
>> +}
>
>
> It is not a good idea to add this scheduler-ist code here.
>
> Using wq_pool_attach_mutex to protect the whole body of idle_reaper_fn()
> can stop the worker from freeing itself since the worker has to
> get the mutex before exiting.
>
Right, the worker goes through worker_detach_from_pool() before
kfree(worker); I hadn't thought of that. I'd like to limit how many locks
the reaper holds, but since this one is specifically for attach/detach I
think that's OK. I also really don't like this worker_wait_reaped()
function, so I'll be happy to get rid of it. I'll give this a try, thanks!

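For illustration only, here is a minimal userspace sketch of that idea, using pthreads rather than kernel primitives. Every name in it (attach_mutex, worker_exit_path(), reap_one(), the struct fields) is a hypothetical stand-in and not the workqueue code; it only shows why holding the attach mutex across the reaper body keeps a dying worker from freeing itself too early.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for wq_pool_attach_mutex. */
static pthread_mutex_t attach_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for struct worker. */
struct worker {
	bool die;	/* stand-in for WORKER_DIE */
	bool detached;	/* set once the worker has left the pool */
	int cpu;	/* stand-in for the worker's affinity */
};

/* Exit path of a dying worker: it must detach (taking the mutex) before freeing itself. */
static void *worker_exit_path(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&attach_mutex);
	w->detached = true;
	pthread_mutex_unlock(&attach_mutex);

	free(w);
	return NULL;
}

/*
 * Reaper body, run entirely under attach_mutex. The worker thread is started
 * while the mutex is held, so it blocks in its detach step and cannot free
 * the worker until the reaper is done with it. Returns the affinity the
 * reaper left behind, or -2 if the thread could not be created.
 */
static int reap_one(struct worker *w)
{
	pthread_t t;
	int seen;

	pthread_mutex_lock(&attach_mutex);
	w->die = true;
	if (pthread_create(&t, NULL, worker_exit_path, w)) {
		pthread_mutex_unlock(&attach_mutex);
		return -2;
	}
	w->cpu = -1;	/* "unbind": safe, w cannot be freed yet */
	seen = w->cpu;
	pthread_mutex_unlock(&attach_mutex);

	/* From here on the worker may free itself; w must not be touched. */
	pthread_join(t, NULL);
	return seen;
}
```

Calling reap_one() on a heap-allocated worker exercises the whole handoff: the reaper finishes with the worker first, and only then does the worker get past its detach step and free itself.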
> And I don't think batching destruction is a good idea since
> it is not a hot path.
>
The batching is mostly there because checking and removing a worker from
its pool->idle_list has to be done under pool->lock, but changing its
affinity requires a sleepable context, so I batched the affinity changes
outside of the spinlock section.

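As a rough userspace sketch of that two-phase pattern (pthreads, with hypothetical names throughout; the list here is oldest-first for simplicity, unlike the LIFO idle_list in the kernel):

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct worker {
	struct worker *next;
	unsigned long last_active;
	int cpu;			/* stand-in for the task's affinity */
};

struct pool {
	pthread_mutex_t lock;		/* stand-in for pool->lock */
	struct worker *idle;		/* idle list, oldest first in this sketch */
};

/* Returns the number of workers culled. */
static int cull_idle(struct pool *pool, unsigned long now, unsigned long timeout)
{
	struct worker *cull = NULL, *w;
	int n = 0;

	/* Phase 1: under the lock, only check and unlink expired workers. */
	pthread_mutex_lock(&pool->lock);
	while ((w = pool->idle) && now - w->last_active >= timeout) {
		pool->idle = w->next;
		w->next = cull;
		cull = w;
	}
	pthread_mutex_unlock(&pool->lock);

	/* Phase 2: lock dropped, so work that may sleep is fine here. */
	for (w = cull; w; w = w->next) {
		w->cpu = -1;		/* stand-in for changing the affinity */
		n++;
	}
	return n;
}
```

The private cull list is what makes the batch: nothing that might sleep runs while the pool lock is held.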
>>  	while (too_many_workers(pool)) {
>> -		struct worker *worker;
>>  		unsigned long expires;
>> +		unsigned long now = jiffies;
>>
>>  		/* idle_list is kept in LIFO order, check the last one */
>>  		worker = list_entry(pool->idle_list.prev, struct worker, entry);
>>  		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
>>
>> -		if (time_before(jiffies, expires)) {
>> -			mod_timer(&pool->idle_timer, expires);
>> +		/*
>> +		 * Careful: queueing a work item from here can and will cause a
>> +		 * self-deadlock when dealing with an unbound pool. However,
>> +		 * here the delay *cannot* be zero and *has* to be in the
>> +		 * future, which works.
>> +		 */
>> +		if (time_before(now, expires)) {
>
> IMHO, using raw_spin_unlock_irq(&pool->lock) here is better than
> violating locking rules *overtly* and documenting that it can not be
> really violated. But it would bring a "goto" statement.
>
I was worried about serializing accesses to pool->idle_reaper_work and its
underlying timer (worker_enter_idle() vs idle_reaper_fn()). That said, I
think the worst that can happen if idle_reaper_fn() re-arms the delayed work
without holding pool->lock is worker_enter_idle() pushing the timer back to
IDLE_WORKER_TIMEOUT (rather than (last_active + IDLE_WORKER_TIMEOUT) -
now).

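To make the two delays concrete, here is a small standalone sketch of the arithmetic. IDLE_TIMEOUT, tick_before() and reaper_delay() are hypothetical names and values for illustration; tick_before() mirrors the signed-difference trick behind the kernel's time_before().

```c
#include <limits.h>

/* Hypothetical timeout in ticks, standing in for IDLE_WORKER_TIMEOUT. */
#define IDLE_TIMEOUT 300UL

/* True if tick a is earlier than tick b, even across counter wraparound. */
static int tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Delay the reaper would arm: (last_active + IDLE_TIMEOUT) - now,
 * or 0 if the oldest idle worker has already expired.
 */
static unsigned long reaper_delay(unsigned long now, unsigned long last_active)
{
	unsigned long expires = last_active + IDLE_TIMEOUT;

	if (tick_before(now, expires))
		return expires - now;
	return 0;
}
```

With a worker last active 100 ticks ago, reaper_delay() gives 200 ticks, whereas a timer pushed back to the full timeout would fire after 300; in that worst case the reaping is merely late by the difference.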
>> +			mod_delayed_work(system_unbound_wq,
>> +					 &pool->idle_reaper_work,
>> +					 expires - now);
>>  			break;
>>  		}
>> @@ -5030,11 +5128,8 @@ static void rebind_workers(struct worker_pool *pool)
>>  	 * of all workers first and then clear UNBOUND. As we're called
>>  	 * from CPU_ONLINE, the following shouldn't fail.
>>  	 */
>> -	for_each_pool_worker(worker, pool) {
>> -		kthread_set_per_cpu(worker->task, pool->cpu);
>> -		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
>> -						  pool->attrs->cpumask) < 0);
>> -	}
>> +	for_each_pool_worker(worker, pool)
>> +		rebind_worker(worker, pool);
>
>
> It is better to skip the workers which are WORKER_DIE.
> Or just detach the worker when reaping it.
I hadn't even thought about this racing with to-be-destroyed workers. Having
the worker itself call worker_detach_from_pool() is convenient for the
serialization with wq_pool_attach_mutex that you suggested, so let me
scratch my head some more.
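
Purely as a sketch of the first option (skipping dying workers), and not what the next version will necessarily do: the struct, the flag value and rebind_live_workers() below are illustrative assumptions in plain userspace C.

```c
#include <stddef.h>

/* Illustrative flag bit; treat the value as an assumption of this sketch. */
#define WORKER_DIE (1u << 1)

/* Hypothetical stand-in for struct worker. */
struct worker {
	unsigned int flags;
	int cpu;		/* stand-in for the task's CPU affinity */
};

/* Rebind every live worker to cpu; returns how many were rebound. */
static size_t rebind_live_workers(struct worker *workers, size_t n, int cpu)
{
	size_t i, rebound = 0;

	for (i = 0; i < n; i++) {
		if (workers[i].flags & WORKER_DIE)
			continue;	/* already handed to the reaper, leave it alone */
		workers[i].cpu = cpu;
		rebound++;
	}
	return rebound;
}
```

The alternative, detaching the worker when reaping it, would keep dying workers out of the iteration altogether, so no flag check would be needed.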
>
>>
>>  	raw_spin_lock_irq(&pool->lock);
>>
>> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
>> index e00b1204a8e9..a3d60e10a76f 100644
>> --- a/kernel/workqueue_internal.h
>> +++ b/kernel/workqueue_internal.h
>> @@ -46,6 +46,7 @@ struct worker {
>>  	unsigned int		flags;		/* X: flags */
>>  	int			id;		/* I: worker id */
>>  	int			sleeping;	/* None */
>> +	int			reaped;		/* None */
>>
>>  	/*
>>  	 * Opaque string set with work_set_desc(). Printed out with task
>> --
>> 2.31.1
>>