Message-ID: <xhsmhv8ldck2a.mognet@vschneid.remote.csb>
Date: Wed, 11 Jan 2023 12:49:49 +0000
From: Valentin Schneider <vschneid@...hat.com>
To: Tejun Heo <tj@...nel.org>
Cc: linux-kernel@...r.kernel.org,
Lai Jiangshan <jiangshanlai@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <frederic@...nel.org>,
Juri Lelli <juri.lelli@...hat.com>,
Phil Auld <pauld@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [PATCH v7 4/4] workqueue: Unbind kworkers before sending them
to exit()
On 10/01/23 10:28, Tejun Heo wrote:
> Hello,
>
> The series generally looks good to me. Just one thing.
>
> On Mon, Jan 09, 2023 at 01:33:16PM +0000, Valentin Schneider wrote:
>> @@ -3658,13 +3702,24 @@ static void put_unbound_pool(struct worker_pool *pool)
>> TASK_UNINTERRUPTIBLE);
>> pool->flags |= POOL_MANAGER_ACTIVE;
>>
>> + /*
>> + * We need to hold wq_pool_attach_mutex() while destroying the workers,
>> + * but we can't grab it in rcuwait_wait_event() as it can clobber
>> + * current's task state. We can drop pool->lock here as we've set
>> + * POOL_MANAGER_ACTIVE, no one else can steal our manager position.
>> + */
>> + raw_spin_unlock_irq(&pool->lock);
>> + mutex_lock(&wq_pool_attach_mutex);
>> + raw_spin_lock_irq(&pool->lock);
>
> The original pattern was a bit weird to begin with and this makes it quite
> worse.
That it does!
> Let's do something more straight forward like:
>
> while (true) {
> rcuwait_wait_event(&manager_wait,
> !(pool->flags & POOL_MANAGER_ACTIVE),
> TASK_UNINTERRUPTIBLE);
> mutex_lock(&wq_pool_attach_mutex);
> raw_spin_lock_irq(&pool->lock);
> if (!(pool->flags & POOL_MANAGER_ACTIVE)) {
> pool->flags |= POOL_MANAGER_ACTIVE;
> break;
> }
> raw_spin_unlock_irq(&pool->lock);
> mutex_unlock(&wq_pool_attach_mutex);
> }
>
That should do the trick, I'll go test it out.
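
For illustration only, here is a userspace analogue of the retry loop quoted above. It is a minimal sketch assuming POSIX threads, not kernel code: struct pool_sketch, claim_manager() and release_manager() are hypothetical names made up for this example, and a condition variable stands in for rcuwait_wait_event(). The point it shows is that the sleeping wait and the mutex acquisition are kept as separate steps, so the flag has to be rechecked once both locks are held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel objects discussed here. */
struct pool_sketch {
	pthread_mutex_t attach_mutex;	/* plays the role of wq_pool_attach_mutex */
	pthread_mutex_t lock;		/* plays the role of pool->lock */
	pthread_cond_t manager_wait;	/* plays the role of manager_wait */
	bool manager_active;		/* plays the role of POOL_MANAGER_ACTIVE */
};

static void pool_sketch_init(struct pool_sketch *p)
{
	pthread_mutex_init(&p->attach_mutex, NULL);
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->manager_wait, NULL);
	p->manager_active = false;
}

/* Returns with attach_mutex and lock held, and manager_active set. */
static void claim_manager(struct pool_sketch *p)
{
	for (;;) {
		/* Step 1: sleep until nobody appears to be the manager. */
		pthread_mutex_lock(&p->lock);
		while (p->manager_active)
			pthread_cond_wait(&p->manager_wait, &p->lock);
		pthread_mutex_unlock(&p->lock);

		/* Step 2: take the locks in order (mutex first, then lock). */
		pthread_mutex_lock(&p->attach_mutex);
		pthread_mutex_lock(&p->lock);

		/* Step 3: recheck, someone may have claimed it in between. */
		if (!p->manager_active) {
			p->manager_active = true;
			return;
		}

		/* Lost the race: drop both locks and go back to waiting. */
		pthread_mutex_unlock(&p->lock);
		pthread_mutex_unlock(&p->attach_mutex);
	}
}

/* Caller must hold both locks, as left by claim_manager(). */
static void release_manager(struct pool_sketch *p)
{
	assert(p->manager_active);
	p->manager_active = false;
	pthread_cond_broadcast(&p->manager_wait);
	pthread_mutex_unlock(&p->lock);
	pthread_mutex_unlock(&p->attach_mutex);
}
```

Unlike the kernel version, the wait step here has to hold the lock because that is how condition variables work; the recheck under both locks is the part that carries over.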
While we're here, for my own education I was trying to figure out in which
scenarios we can hit this manager-already-active condition. When sending
out v6 I had convinced myself it could happen during the failed
initialization of a new unbound pool, but having had another look at it I'm
no longer so sure.

The only scenario I can think of now is around maybe_create_worker()'s
release of pool->lock, as that implies another worker can drain
pool->worklist and thus let pool->refcnt reach 0 while a different worker is
still acting as the pool manager. Am I looking at the right thing?

Thanks