linux-kernel - Re: [PATCH] workqueue: fix rebind bound workers warning

Message-ID: <CANRm+CzvA00tBZA9f+_zy1aunQVThGVMKjMkNZwW4vtQKudrJg@mail.gmail.com>
Date:	Mon, 9 May 2016 15:28:51 +0800
From:	Wanpeng Li <kernellwp@...il.com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Cc:	Wanpeng Li <wanpeng.li@...mail.com>, Tejun Heo <tj@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [PATCH] workqueue: fix rebind bound workers warning

Sorry for the quick ping, Tejun; I just hope this can catch the upcoming
merge window. :-)
2016-05-05 9:41 GMT+08:00 Wanpeng Li <kernellwp@...il.com>:
> From: Wanpeng Li <wanpeng.li@...mail.com>
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
> Modules linked in:
> CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
> Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
>  0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
>  0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
>  ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
> Call Trace:
>  dump_stack+0x89/0xd4
>  __warn+0xfd/0x120
>  warn_slowpath_null+0x1d/0x20
>  rebind_workers+0x1c0/0x1d0
>  workqueue_cpu_up_callback+0xf5/0x1d0
>  notifier_call_chain+0x64/0x90
>  ? trace_hardirqs_on_caller+0xf2/0x220
>  ? notify_prepare+0x80/0x80
>  __raw_notifier_call_chain+0xe/0x10
>  __cpu_notify+0x35/0x50
>  notify_down_prepare+0x5e/0x80
>  ? notify_prepare+0x80/0x80
>  cpuhp_invoke_callback+0x73/0x330
>  ? __schedule+0x33e/0x8a0
>  cpuhp_down_callbacks+0x51/0xc0
>  cpuhp_thread_fun+0xc1/0xf0
>  smpboot_thread_fn+0x159/0x2a0
>  ? smpboot_create_threads+0x80/0x80
>  kthread+0xef/0x110
>  ? wait_for_completion+0xf0/0x120
>  ? schedule_tail+0x35/0xf0
>  ret_from_fork+0x22/0x50
>  ? __init_kthread_worker+0x70/0x70
> ---[ end trace eb12ae47d2382d8f ]---
> notify_down_prepare: attempt to take down CPU 0 failed
>
> This bug can be reproduced with the config below and nohz_full= covering all CPUs:
>
> CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
> CONFIG_DEBUG_HOTPLUG_CPU0=y
> CONFIG_NO_HZ_FULL=y
>
> The boot CPU handles housekeeping duties (unbound timers, workqueues,
> timekeeping, ...) on behalf of full dynticks CPUs, so it must remain
> online when nohz_full is enabled. Each notifier_block is registered
> with a priority, which gives this calling order:
>
> workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down
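To make the ordering concrete: the kernel's notifier_call_chain() walks a chain from the highest to the lowest priority and stops at the first callback whose return value has NOTIFY_STOP_MASK set, which NOTIFY_BAD does. Below is a minimal user-space sketch, not kernel code. The sim_* names, the simplified no-argument callback signature and the priority values 5/0/-5 are illustrative assumptions; only the NOTIFY_* constants mirror include/linux/notifier.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes mirroring include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

/* Hypothetical simulation state: did the DOWN_PREPARE unbind ever run? */
static bool workers_unbound;

struct sim_notifier {
	const char *name;
	int priority;          /* illustrative value; higher runs first */
	int (*call)(void);     /* simplified: real callbacks take (nb, action, data) */
};

static int sim_workqueue_cpu_up(void)
{
	return NOTIFY_OK;      /* nothing to do on DOWN_PREPARE */
}

static int sim_tick_nohz_cpu_down(void)
{
	return NOTIFY_BAD;     /* CPU 0 does housekeeping: refuse to go down */
}

static int sim_workqueue_cpu_down(void)
{
	workers_unbound = true; /* would mark the pool disassociated */
	return NOTIFY_OK;
}

/* Kept sorted by descending priority, as chain registration would keep it. */
static struct sim_notifier chain[] = {
	{ "workqueue_cpu_up",    5, sim_workqueue_cpu_up },
	{ "tick_nohz_cpu_down",  0, sim_tick_nohz_cpu_down },
	{ "workqueue_cpu_down", -5, sim_workqueue_cpu_down },
};

/*
 * Call each notifier in order and stop at the first return value with
 * NOTIFY_STOP_MASK set. *nr_calls receives how many callbacks actually ran.
 */
static int sim_call_chain(struct sim_notifier *nb, size_t n, size_t *nr_calls)
{
	int ret = NOTIFY_DONE;
	size_t i;

	assert(nb != NULL && nr_calls != NULL);
	*nr_calls = 0;
	for (i = 0; i < n; i++) {
		ret = nb[i].call();
		(*nr_calls)++;
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}
```

Running sim_call_chain() over this chain returns NOTIFY_BAD after two callbacks, and workers_unbound stays false: the lowest-priority callback, which would have unbound the workers, is never reached.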
>
> So the tick_nohz_cpu_down callback fails during DOWN_PREPARE for
> CPU 0, and the notifier_blocks after tick_nohz_cpu_down are never
> called, which means the workers are never actually unbound. The
> hotplug state machine then falls back to undo and brings CPU 0 online
> again. The workers are rebound unconditionally even though they were
> never unbound, which triggers the warning in the process.
>
> This patch fixes it by checking for !POOL_DISASSOCIATED so that
> workers that are still bound are not rebound.
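The effect of the guard can be modeled in user space as a rough sketch. It assumes that the warning fires when a worker being rebound lacks WORKER_UNBOUND. The flag bit values, the sim_* types and functions, and the fixed worker array are hypothetical stand-ins, and pool->lock is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values; the kernel's real assignments differ. */
#define POOL_DISASSOCIATED (1u << 0)
#define WORKER_UNBOUND     (1u << 0)
#define WORKER_REBOUND     (1u << 1)

#define NR_SIM_WORKERS 3

struct sim_worker {
	unsigned int flags;
};

struct sim_pool {
	unsigned int flags;
	struct sim_worker workers[NR_SIM_WORKERS];
	int warnings;          /* counts every would-be WARN_ON_ONCE() hit */
};

/* Rebind loop shared by both variants: every worker should be UNBOUND. */
static void sim_rebind_each_worker(struct sim_pool *pool)
{
	size_t i;

	for (i = 0; i < NR_SIM_WORKERS; i++) {
		struct sim_worker *w = &pool->workers[i];

		if (!(w->flags & WORKER_UNBOUND))
			pool->warnings++;      /* the condition behind the splat */
		w->flags |= WORKER_REBOUND;
		w->flags &= ~WORKER_UNBOUND;
	}
}

/* Pre-patch behaviour: rebind whether or not the pool was disassociated. */
static void sim_rebind_workers_old(struct sim_pool *pool)
{
	assert(pool != NULL);
	pool->flags &= ~POOL_DISASSOCIATED;
	sim_rebind_each_worker(pool);
}

/* Patched behaviour: return early if the pool was never disassociated. */
static bool sim_rebind_workers_fixed(struct sim_pool *pool)
{
	assert(pool != NULL);
	if (!(pool->flags & POOL_DISASSOCIATED))
		return false;          /* workers are still bound; nothing to do */
	pool->flags &= ~POOL_DISASSOCIATED;
	sim_rebind_each_worker(pool);
	return true;
}
```

For a pool whose unbind never ran (flags all zero), sim_rebind_workers_old() records one warning per worker, while sim_rebind_workers_fixed() leaves it untouched. A pool that really was disassociated is still rebound cleanly by the fixed variant.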
>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Lai Jiangshan <jiangshanlai@...il.com>
> Suggested-by: Lai Jiangshan <jiangshanlai@...il.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@...mail.com>
> ---
>  kernel/workqueue.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2232ae3..cc18920 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4525,6 +4525,12 @@ static void rebind_workers(struct worker_pool *pool)
>                                                   pool->attrs->cpumask) < 0);
>
>         spin_lock_irq(&pool->lock);
> +
> +       if (!(pool->flags & POOL_DISASSOCIATED)) {
> +               spin_unlock_irq(&pool->lock);
> +               return;
> +       }
> +
>         pool->flags &= ~POOL_DISASSOCIATED;
>
>         for_each_pool_worker(worker, pool) {
> --
> 1.9.1
>