Message-ID: <20200528143519.GN83516@mtj.thefacebook.com>
Date:   Thu, 28 May 2020 10:35:19 -0400
From:   Tejun Heo <tj@...nel.org>
To:     Lai Jiangshan <laijs@...ux.alibaba.com>
Cc:     linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [PATCH 1/2] workqueue: pin the pool while it is managing

Hello,

On Thu, May 28, 2020 at 03:06:55AM +0000, Lai Jiangshan wrote:
> @@ -2129,10 +2128,21 @@ __acquires(&pool->lock)
>  static bool manage_workers(struct worker *worker)
>  {
>  	struct worker_pool *pool = worker->pool;
> +	struct work_struct *work = list_first_entry(&pool->worklist,
> +					struct work_struct, entry);

I'm not sure about this. It depends on an external condition (an active
work item) which isn't obvious, and when that condition breaks, the
resulting bug will be difficult to reproduce. On top of that, pwq isn't
even the object this code path is interested in, which was the cause of the
previous problem too.

> @@ -2140,7 +2150,7 @@ static bool manage_workers(struct worker *worker)
>  
>  	pool->manager = NULL;
>  	pool->flags &= ~POOL_MANAGER_ACTIVE;
> -	wake_up(&wq_manager_wait);
> +	put_pwq(pwq);

So, this works only because pwq release bounces through another work item.
Even if a worker of the pool which is currently being destroyed initiates
the release of the containing pool, it still works out: by the time the
async release path kicks in and grabs the pool lock, everything should be
idle.
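
As a rough illustration of the ordering described above, here is a toy
user-space model, not kernel code. Every toy_* name is hypothetical, and the
"release work item" is reduced to a flag that a later call consumes. It only
shows that dropping the last reference queues the release instead of running
it inline, so the release sees the manager flag already cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions are kernel API. */
struct toy_pool {
	bool manager_active;	/* stands in for POOL_MANAGER_ACTIVE */
	bool destroyed;
};

struct toy_pwq {
	int refcnt;
	struct toy_pool *pool;
	bool release_queued;	/* stands in for the queued release work item */
	bool saw_manager_active_at_release;
};

/* Dropping the last reference only queues the release; nothing runs inline. */
static void toy_put_pwq(struct toy_pwq *pwq)
{
	if (--pwq->refcnt == 0)
		pwq->release_queued = true;
}

/* Runs later, from a separate context, and records what it observed. */
static void toy_run_release(struct toy_pwq *pwq)
{
	if (!pwq->release_queued)
		return;
	assert(pwq->refcnt == 0);
	pwq->release_queued = false;
	pwq->saw_manager_active_at_release = pwq->pool->manager_active;
	pwq->pool->destroyed = true;
}

/* Manager path in the order of the quoted hunk: clear the flag, then put. */
static void toy_manage_workers(struct toy_pwq *pwq)
{
	struct toy_pool *pool = pwq->pool;

	pool->manager_active = true;
	/* ... worker creation would happen here ... */
	pool->manager_active = false;
	toy_put_pwq(pwq);	/* may drop the last reference */
}

/* Returns true if the pool was idle by the time the release ran. */
static bool toy_demo(void)
{
	struct toy_pool pool = { false, false };
	struct toy_pwq pwq = { 1, &pool, false, false };

	toy_manage_workers(&pwq);
	if (pool.destroyed)	/* must not be torn down inline */
		return false;
	toy_run_release(&pwq);
	return pool.destroyed && !pwq.saw_manager_active_at_release;
}
```

If toy_put_pwq() tore the pool down directly instead of queuing, the
teardown would run from inside toy_manage_workers(), which is the kind of
hidden dependency the model is meant to make visible.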

I get that this can work but it's sitting on top of a bunch of subtleties.
The current code is more verbose but also significantly more explicit and
straightforward. I'd rather keep the current behavior unless we can get rid
of the subtleties.

Thanks.

-- 
tejun
