linux-kernel - Re: [PATCH v7 4/4] workqueue: Unbind kworkers before sending them to exit()

Message-ID: <xhsmhv8ldck2a.mognet@vschneid.remote.csb>
Date:   Wed, 11 Jan 2023 12:49:49 +0000
From:   Valentin Schneider <vschneid@...hat.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     linux-kernel@...r.kernel.org,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Phil Auld <pauld@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [PATCH v7 4/4] workqueue: Unbind kworkers before sending them
 to exit()

On 10/01/23 10:28, Tejun Heo wrote:
> Hello,
>
> The series generally looks good to me. Just one thing.
>
> On Mon, Jan 09, 2023 at 01:33:16PM +0000, Valentin Schneider wrote:
>> @@ -3658,13 +3702,24 @@ static void put_unbound_pool(struct worker_pool *pool)
>>  			   TASK_UNINTERRUPTIBLE);
>>  	pool->flags |= POOL_MANAGER_ACTIVE;
>>  
>> +	/*
>> +	 * We need to hold wq_pool_attach_mutex() while destroying the workers,
>> +	 * but we can't grab it in rcuwait_wait_event() as it can clobber
>> +	 * current's task state. We can drop pool->lock here as we've set
>> +	 * POOL_MANAGER_ACTIVE, no one else can steal our manager position.
>> +	 */
>> +	raw_spin_unlock_irq(&pool->lock);
>> +	mutex_lock(&wq_pool_attach_mutex);
>> +	raw_spin_lock_irq(&pool->lock);
>
> The original pattern was a bit weird to begin with and this makes it quite
> worse.

That it does!

> Let's do something more straight forward like:
>
>         while (true) {
>                 rcuwait_wait_event(&manager_wait,
>                                    !(pool->flags & POOL_MANAGER_ACTIVE),
>                                    TASK_UNINTERRUPTIBLE);
>                 mutex_lock(&wq_pool_attach_mutex);
>                 raw_spin_lock_irq(&pool->lock);
>                 if (!(pool->flags & POOL_MANAGER_ACTIVE)) {
>                         pool->flags |= POOL_MANAGER_ACTIVE;
>                         break;
>                 }
>                 raw_spin_unlock_irq(&pool->lock);
>                 mutex_unlock(&wq_pool_attach_mutex);
>         }
>

That should do the trick, I'll go test it out.
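
For illustration, here is a minimal userspace sketch of the wait / lock /
recheck loop quoted above. It is not the kernel code: pthread mutexes stand in
for wq_pool_attach_mutex and pool->lock, a condition variable stands in for
the rcuwait, and struct pool, pool_init(), claim_manager() and
release_manager() are hypothetical names made up for this example. One
difference to keep in mind is that a condition variable has to be waited on
with its mutex held, whereas rcuwait_wait_event() checks the flag locklessly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct worker_pool. */
struct pool {
	pthread_mutex_t attach_mutex;	/* plays the role of wq_pool_attach_mutex */
	pthread_mutex_t lock;		/* plays the role of pool->lock */
	pthread_cond_t manager_wait;	/* plays the role of the rcuwait */
	bool manager_active;		/* plays the role of POOL_MANAGER_ACTIVE */
};

static void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->attach_mutex, NULL);
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->manager_wait, NULL);
	p->manager_active = false;
}

/*
 * Claim the manager role. Returns with attach_mutex and lock both held
 * and manager_active set, mirroring the state the quoted loop exits in.
 */
static void claim_manager(struct pool *p)
{
	for (;;) {
		/* Step 1: sleep until the flag looks clear. */
		pthread_mutex_lock(&p->lock);
		while (p->manager_active)
			pthread_cond_wait(&p->manager_wait, &p->lock);
		pthread_mutex_unlock(&p->lock);

		/* Step 2: take the outer lock first, then the inner one. */
		pthread_mutex_lock(&p->attach_mutex);
		pthread_mutex_lock(&p->lock);

		/*
		 * Step 3: recheck. Someone may have claimed the role between
		 * the wait returning and the locks being taken.
		 */
		if (!p->manager_active) {
			p->manager_active = true;
			return;
		}

		/* Lost the race: drop both locks and go back to waiting. */
		pthread_mutex_unlock(&p->lock);
		pthread_mutex_unlock(&p->attach_mutex);
	}
}

/* Give the role back. The caller must hold both locks via claim_manager(). */
static void release_manager(struct pool *p)
{
	assert(p->manager_active);
	p->manager_active = false;
	pthread_mutex_unlock(&p->lock);
	pthread_mutex_unlock(&p->attach_mutex);
	pthread_cond_broadcast(&p->manager_wait);
}
```

The point of the recheck in step 3 is that the wait and the lock acquisition
are two separate steps, so the condition observed while waiting is only a
hint until it is confirmed with both locks held.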


While we're here, for my own education I was trying to figure out in what
scenarios we can hit this manager-already-active condition. When sending
out v6 I had convinced myself it could happen during failed
initialization of a new unbound pool, but having another look at it now I'm
not so sure anymore.

The only scenario I can think of now is around maybe_create_worker()'s
release of pool->lock, as that implies another worker can drain
pool->worklist and thus let pool->refcnt reach 0 while the first worker is
still acting as the pool manager. Am I looking at the right thing?

Thanks