lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 16 Aug 2022 11:46:11 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     linux-kernel@...r.kernel.org,
        Lai Jiangshan <jiangshan.ljs@...group.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Petr Mladek <pmladek@...e.com>, Michal Hocko <mhocko@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Wedson Almeida Filho <wedsonaf@...gle.com>,
        Waiman Long <longman@...hat.com>
Subject: Re: [RFC PATCH 2/8] workqueue: Make create_worker() safe against
 prematurely wakeups

On Thu, Aug 04, 2022 at 04:41:29PM +0800, Lai Jiangshan wrote:
...
> Any kthread is supposed to stay in TASK_UNINTERRUPTIBLE sleep
> until it is explicitly woken. But a spurious wakeup might
> break this expectation.

I'd rephrase the above. It's more that we can't assume that a sleeping task
will stay sleeping and should expect spurious wakeups.

> @@ -176,6 +176,7 @@ struct worker_pool {
>  						/* L: hash of busy workers */
>  
>  	struct worker		*manager;	/* L: purely informational */
> +	struct completion	created;	/* create_worker(): worker created */

Can we define sth like worker_create_args struct which contains a
completion, pointers to the worker and pool and use an on-stack instance to
carry the create parameters to the new worker thread? It's kinda odd to have
a persistent copy of completion in the pool

> @@ -1949,6 +1951,7 @@ static struct worker *create_worker(struct worker_pool *pool)
>  	else
>  		snprintf(id_buf, sizeof(id_buf), "u%d:%d", pool->id, id);
>  
> +	reinit_completion(&pool->created);

which keeps getting reinitialized.

> @@ -2383,10 +2380,17 @@ static int worker_thread(void *__worker)
>  	struct worker *worker = __worker;
>  	struct worker_pool *pool = worker->pool;
>  
> +	/* attach the worker to the pool */
> +	worker_attach_to_pool(worker, pool);

It's also odd for the new worker to have pool already set and then we attach
to that pool.

> @@ -37,8 +37,15 @@ struct worker {
>  	/* 64 bytes boundary on 64bit, 32 on 32bit */
>  
>  	struct task_struct	*task;		/* I: worker task */
> -	struct worker_pool	*pool;		/* A: the associated pool */
> -						/* L: for rescuers */
> +
> +	/*
> +	 * The associated pool, locking rules:
> +	 *   PF_WQ_WORKER: from the current worker
> +	 *   PF_WQ_WORKER && wq_pool_attach_mutex: from remote tasks
> +	 *   None: from the current worker when the worker is coming up
> +	 */
> +	struct worker_pool	*pool;

I have a difficult time understanding the above comment. Can you please
follow the same style as others?

I was hoping that this problem would be fixed through kthread changes but
that doesn't seem to have happened yet and given that we need to keep
modifying cpumasks dynamically anyway (e.g. for unbound pool config change),
solving it from wq side is fine too especially if we can leverage the same
code paths that the dynamic changes are using.

That said, some of complexities are from CPU hotplug messing with worker
cpumasks and wq trying to restore them and it seems likely that all these
will be simpler with the persistent cpumask that Waiman is working on. Lai,
can you please take a look at that patchset?

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ