 
Message-ID: <ZV4mPoOvIgX9Um0z@slm.duckdns.org>
Date:   Wed, 22 Nov 2023 06:03:10 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Waiman Long <longman@...hat.com>
Cc:     zhuangel570 <zhuangel570@...il.com>, jiangshanlai@...il.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] workqueue: Make sure that wq_unbound_cpumask is never empty

Hello,

On Tue, Nov 21, 2023 at 05:08:29PM -0500, Waiman Long wrote:
> On 11/21/23 16:39, Tejun Heo wrote:
> > During boot, depending on how the housekeeping and workqueue.unbound_cpus
> > masks are set, wq_unbound_cpumask can end up empty. Since 8639ecebc9b1
> > ("workqueue: Implement non-strict affinity scope for unbound workqueues"),
> > this may end up feeding -1 as a CPU number into the scheduler, leading to oopses.
> > 
> >    BUG: unable to handle page fault for address: ffffffff8305e9c0
> >    #PF: supervisor read access in kernel mode
> >    #PF: error_code(0x0000) - not-present page
> >    ...
> >    Call Trace:
> >     <TASK>
> >     select_idle_sibling+0x79/0xaf0
> >     select_task_rq_fair+0x1cb/0x7b0
> >     try_to_wake_up+0x29c/0x5c0
> >     wake_up_process+0x19/0x20
> >     kick_pool+0x5e/0xb0
> >     __queue_work+0x119/0x430
> >     queue_work_on+0x29/0x30
> >    ...
> > 
> > An empty wq_unbound_cpumask is a clear misconfiguration and is already
> > disallowed once the system has booted. Let's warn about and ignore
> > unbound_cpumask restrictions that would leave no unbound CPUs. While at
> > it, also remove the now-unnecessary empty check on wq_unbound_cpumask in
> > wq_select_unbound_cpu().
> > 
> > Signed-off-by: Tejun Heo <tj@...nel.org>
> > Reported-by: Yong He <alexyonghe@...cent.com>
> > Link: http://lkml.kernel.org/r/20231120121623.119780-1-alexyonghe@tencent.com
> > Fixes: 8639ecebc9b1 ("workqueue: Implement non-strict affinity scope for unbound workqueues")
> > Cc: stable@...r.kernel.org  # v6.6+
> > ---
> > Hello,
> > 
> > Yong He, zhuangel570, can you please verify that this patch makes the oops
> > go away? Waiman, this touches code that you've recently worked on. AFAICS,
> > they shouldn't interact or cause conflicts. cc'ing just in case.
> 
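The failure mode and the fix described in the patch can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the actual patch: the 64-bit mask type and the helpers `pick_first_cpu()` and `restrict_unbound_mask()` are hypothetical and are not kernel APIs. It shows how intersecting two disjoint masks leaves nothing, how picking a CPU from an empty mask yields -1, and how a warn-and-ignore guard keeps the previous mask instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace model of a CPU mask for up to 64 CPUs.
 * This is not the kernel's struct cpumask API. */
typedef uint64_t cpumask_model_t;

/* Lowest CPU set in `mask`, or -1 when the mask is empty: the kind of
 * invalid CPU number the patch description says reached the scheduler. */
int pick_first_cpu(cpumask_model_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & ((cpumask_model_t)1 << cpu))
			return cpu;
	return -1;
}

/* Hypothetical guard: AND `restriction` into *unbound only if at least
 * one CPU survives; otherwise warn and leave *unbound unchanged.
 * Returns 1 if the restriction was applied, 0 if it was ignored. */
int restrict_unbound_mask(cpumask_model_t *unbound,
			  cpumask_model_t restriction, const char *name)
{
	assert(unbound != NULL);
	if ((*unbound & restriction) == 0) {
		fprintf(stderr,
			"restricting unbound mask with %s leaves no CPU, ignoring\n",
			name);
		return 0;
	}
	*unbound &= restriction;
	return 1;
}
```

For example, with made-up values of CPUs 0-3 (0x0F) for housekeeping and CPUs 4-7 (0xF0) for workqueue.unbound_cpus, a plain intersection is empty and `pick_first_cpu()` returns -1, while the guarded helper ignores the restriction and keeps CPUs 0-3.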
> It does conflict with commit fe28f631fa94 ("workqueue: Add
> workqueue_unbound_exclude_cpumask() to exclude CPUs from
> wq_unbound_cpumask") as it has the following hunk:
> 
> @@ -6534,11 +6606,14 @@ void __init workqueue_init_early(void)
>         BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
>
>         BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
> +       BUG_ON(!alloc_cpumask_var(&wq_requested_unbound_cpumask, GFP_KERNEL));
> +       BUG_ON(!zalloc_cpumask_var(&wq_isolated_cpumask, GFP_KERNEL));
>         cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
>         cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
>
>         if (!cpumask_empty(&wq_cmdline_cpumask))
>                 cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, &wq_cmdline_cpumask);
> +       cpumask_copy(wq_requested_unbound_cpumask, wq_unbound_cpumask);
> 
>         pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
...
> Is it possible to route this patch through the cgroup tree for 6.8 to avoid the conflict?
> Other than that, the patch looks good to me.
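
The boot-time order in the quoted hunk (start from the HK_TYPE_WQ housekeeping mask, AND in HK_TYPE_DOMAIN, AND in the command-line mask if one was given, then copy the result into wq_requested_unbound_cpumask) can be modeled in userspace. This is a hedged sketch: the struct, its fields, `restrict_or_ignore()`, and `init_masks_model()` are hypothetical, the masks are plain 64-bit integers rather than kernel cpumasks, and applying a warn-and-ignore guard at each step before the copy is only an assumption about how the two changes might combine, not the resolution that landed in cgroup/for-next.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 64-CPU mask; not the kernel's struct cpumask. */
typedef uint64_t cpumask_model_t;

/* Hypothetical stand-ins for the two masks touched in the hunk. */
struct wq_masks_model {
	cpumask_model_t unbound;           /* models wq_unbound_cpumask */
	cpumask_model_t requested_unbound; /* models wq_requested_unbound_cpumask */
};

/* AND `restriction` into *mask unless that would leave it empty, in which
 * case print a warning and keep the previous value. */
static void restrict_or_ignore(cpumask_model_t *mask,
			       cpumask_model_t restriction, const char *name)
{
	if ((*mask & restriction) == 0) {
		fprintf(stderr, "%s leaves no unbound CPU, ignoring\n", name);
		return;
	}
	*mask &= restriction;
}

/* Follows the order of operations in the quoted hunk, with each
 * restriction guarded (an assumption of this sketch). */
void init_masks_model(struct wq_masks_model *m, cpumask_model_t hk_wq,
		      cpumask_model_t hk_domain, cpumask_model_t cmdline)
{
	assert(m != NULL);
	m->unbound = hk_wq;
	restrict_or_ignore(&m->unbound, hk_domain, "HK_TYPE_DOMAIN");
	if (cmdline != 0) /* like the !cpumask_empty(&wq_cmdline_cpumask) check */
		restrict_or_ignore(&m->unbound, cmdline, "workqueue.unbound_cpus");
	/* Record the final, non-empty result as the requested mask. */
	m->requested_unbound = m->unbound;
}
```

With made-up inputs hk_wq = 0xFF, hk_domain = 0x0F, and a disjoint command-line mask 0xF0, the command-line restriction is ignored and both modeled masks end up as 0x0F, so the requested mask never records an empty set.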

It's a workqueue fix patch, so what I'm gonna do is land this in
wq/for-6.6-fixes and resolve the conflict in cgroup/for-next.

Thanks.

-- 
tejun

