lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210105131737.GH3040@hirez.programming.kicks-ass.net>
Date:   Tue, 5 Jan 2021 14:17:37 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Qian Cai <cai@...hat.com>,
        Vincent Donnefort <vincent.donnefort@....com>,
        Dexuan Cui <decui@...rosoft.com>,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask

On Tue, Jan 05, 2021 at 04:23:44PM +0800, Lai Jiangshan wrote:
> On Tue, Jan 5, 2021 at 10:41 AM Lai Jiangshan <jiangshanlai@...il.com> wrote:
> > On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > > On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> > > > From: Lai Jiangshan <laijs@...ux.alibaba.com>
> > > >
> > > > wq_online_cpumask is the cached result of cpu_online_mask with the
> > > > going-down cpu cleared.
> > >
> > > You can't use cpu_active_mask ?
> >
> > When a cpu is going out:
> > (cpu_active_mask is not protected by workqueue mutexs.)

But it is protected by the hotplug lock, which is really all you need
afaict.

If the worker thread gets spawned before workqueue_offline_cpu(), said
function will observe it and adjust the mask, if it gets spawned after
it, it must observe a 'reduced' cpu_active_mask.

> >
> > create_worker() for unbound pool  |  cpu offlining
> > check cpu_active_mask             |
> check wq_online_cpumask
> >                                   |  remove bit from cpu_active_mask
> >                                   |  no cpu in pool->attrs->cpumask is active
> > set pool->attrs->cpumask to worker|
> > and hit the warning
>                                     |  remove bit from wq_online_cpumask
> 
> Even with the help of wq_online_cpumask, the patchset can't silence
> the warning in __set_cpus_allowed_ptr() in this case.  It is indeed
> hard to suppress the warning for unbound pools.  Maybe we need something
> like this (outmost callback of CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
> so that workqueue can do preparation when offlining before AP_ACTIVE):
> 
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 0042ef362511..ac2103deb20b 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -20,6 +20,9 @@
>   *               |                               ^
>   *               v                               |
>   *              AP_ACTIVE                      AP_ACTIVE
> + *               |                               ^
> + *               v                               |
> + *              ONLINE                         ONLINE
>   */
> 
>  enum cpuhp_state {
> @@ -194,6 +197,7 @@ enum cpuhp_state {
>         CPUHP_AP_X86_HPET_ONLINE,
>         CPUHP_AP_X86_KVM_CLK_ONLINE,
>         CPUHP_AP_ACTIVE,
> +       CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
>         CPUHP_ONLINE,
>  };
> 

That's waay to late, by then userspace is long running and expecting
things to 'just-work'.

But afaict, things will mostly work for you when you use cpu_active_mask
on cpu-down and cpu_online_mask on cpu-up.

But I think I see the problem, it is spawning a new worker after
workqueue_online_cpu() but before sched_cpu_activate(), right? That
wants to have the wider mask set.

To solve that, the spawning of workers thing needs to know where we are
in the hotplug process, and it can track that using
workqueue_{on,off}line_cpu(). If it happens after offline, it needs to
use cpu_active_mask, if it happens after online cpu_online_mask is your
guy.

Does that make sense?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ