Message-ID: <CAJhGHyDWswu4gzT0qJzR3vBC6ESm1yui+JHFHfueXan95i0NUg@mail.gmail.com>
Date: Wed, 23 Dec 2020 23:01:53 +0800
From: Lai Jiangshan <jiangshanlai@...il.com>
To: Dexuan-Linux Cui <dexuan.linux@...il.com>
Cc: Dexuan Cui <decui@...rosoft.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Valentin Schneider <valentin.schneider@....com>,
Peter Zijlstra <peterz@...radead.org>,
Qian Cai <cai@...hat.com>,
Vincent Donnefort <vincent.donnefort@....com>,
Lai Jiangshan <laijs@...ux.alibaba.com>,
Hillf Danton <hdanton@...a.com>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively
On Wed, Dec 23, 2020 at 5:39 AM Dexuan-Linux Cui <dexuan.linux@...il.com> wrote:
>
> On Fri, Dec 18, 2020 at 8:11 AM Lai Jiangshan <jiangshanlai@...il.com> wrote:
> >
> > From: Lai Jiangshan <laijs@...ux.alibaba.com>
> >
> > 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > says that the scheduler will not forcibly break affinity for us.
> >
> > But workqueue depends heavily on the old behavior. Many parts of the code
> > rely on it; 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > is not enough to change it, and the commit itself has flaws too.
> >
> > It doesn't handle worker detachment.
> > It doesn't handle worker attachment, mainly worker creation,
> > which is handled by Valentin Schneider's patch [1].
> > It doesn't handle unbound workers, which might be
> > per-cpu kthreads.
> >
> > We need to thoroughly update the way workqueue handles affinity
> > in cpu hot[un]plug, which is what this patchset intends to do. It also
> > replaces Valentin Schneider's patch [1]. The equivalent patch
> > is patch 10.
> >
> > Patch 1 fixes a flaw reported by Hillf Danton <hdanton@...a.com>.
> > I have to include this fix because later patches depend on it.
> >
> > The patchset is based on tip/master rather than the workqueue tree,
> > because the patchset complements 06249738a41a ("workqueue:
> > Manually break affinity on hotplug"), which is only in tip/master for now.
> >
> > And TJ acked to route the series through tip.
> >
> > Changed from V1:
> > Add TJ's acked-by for the whole patchset
> >
> > Add more words to the comments and the changelog, mainly derived
> > from discussion with Peter.
> >
> > Update the comments as TJ suggested.
> >
> > Update a line of code as Valentin suggested.
> >
> > Add Valentin's ack for patch 10 (he wrote "Seems alright to me.") and
> > add Valentin's comments, which are integral, to the changelog.
> >
> > [1]: https://lore.kernel.org/r/ff62e3ee994efb3620177bf7b19fab16f4866845.camel@redhat.com
> > [V1 patchset]: https://lore.kernel.org/lkml/20201214155457.3430-1-jiangshanlai@gmail.com/
> >
> > Cc: Hillf Danton <hdanton@...a.com>
> > Cc: Valentin Schneider <valentin.schneider@....com>
> > Cc: Qian Cai <cai@...hat.com>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> > Cc: Vincent Donnefort <vincent.donnefort@....com>
> > Cc: Tejun Heo <tj@...nel.org>
> >
> > Lai Jiangshan (10):
> > workqueue: restore unbound_workers' cpumask correctly
> > workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> > affinity
> > workqueue: Manually break affinity on pool detachment
> > workqueue: don't set the worker's cpumask when kthread_bind_mask()
> > workqueue: introduce wq_online_cpumask
> > workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> > workqueue: Manually break affinity on hotplug for unbound pool
> > workqueue: reorganize workqueue_online_cpu()
> > workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> > workqueue: Fix affinity of kworkers when attaching into pool
> >
> > kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> > 1 file changed, 132 insertions(+), 82 deletions(-)
> >
> > --
> > 2.19.1.6.gb485710b
>
> Hi,
> I tested this patchset on today's tip.git's master branch
> (981316394e35 ("Merge branch 'locking/urgent'")).
>
> Every time the kernel boots with 32 CPUs (I'm running the Linux VM on
> Hyper-V), I get the below warning.
> (BTW, with 8 or 16 CPUs, I don't see the warning).
> By printing the cpumasks with "%*pbl", I know the warning happens because:
> new_mask = 16-31
> cpu_online_mask= 0-16
> cpu_active_mask= 0-15
> p->nr_cpus_allowed=16
>
>         if (p->flags & PF_KTHREAD) {
>                 /*
>                  * For kernel threads that do indeed end up on online &&
>                  * !active we want to ensure they are strict per-CPU threads.
>                  */
>                 WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
>                         !cpumask_intersects(new_mask, cpu_active_mask) &&
>                         p->nr_cpus_allowed != 1);
>         }
>
Hello, Dexuan

Could you omit patch 4 of the patchset
("workqueue: don't set the worker's cpumask when kthread_bind_mask()")
and test it again, please?

kthread_bind_mask() sets the worker task's cpumask to the pool's cpumask
without any check. set_cpus_allowed_ptr() then finds that the task's
cpumask is unchanged (it was already set by kthread_bind_mask()) and
skips all the checks.
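
To make the interaction concrete, here is a rough userspace sketch, not
kernel code. It models a cpumask as a 32-bit word and reuses the three-part
condition from the WARN_ON() quoted above. The names mask_t, struct task,
cpu_range(), model_bind_mask() and model_set_cpus_allowed() are made up for
illustration, and the "unchanged mask returns early" step is a
simplification of the behavior described above, not a copy of the scheduler
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 32 CPUs. */
typedef uint32_t mask_t;

struct task {
	mask_t cpus_mask;
	unsigned int nr_cpus_allowed;
};

/* Build a mask with bits lo..hi (inclusive) set. */
static mask_t cpu_range(unsigned int lo, unsigned int hi)
{
	mask_t m = 0;

	assert(lo <= hi && hi < 32);
	for (unsigned int cpu = lo; cpu <= hi; cpu++)
		m |= (mask_t)1 << cpu;
	return m;
}

/* Count the CPUs set in a mask. */
static unsigned int mask_weight(mask_t m)
{
	unsigned int n = 0;

	for (; m; m >>= 1)
		n += m & 1;
	return n;
}

/* Same three-part condition as the quoted WARN_ON(). */
static bool kthread_affinity_warns(mask_t new_mask, mask_t online,
				   mask_t active, unsigned int nr_cpus_allowed)
{
	return (new_mask & online) && !(new_mask & active) &&
	       nr_cpus_allowed != 1;
}

/* Models kthread_bind_mask(): store the mask, perform no check. */
static void model_bind_mask(struct task *p, mask_t mask)
{
	p->cpus_mask = mask;
	p->nr_cpus_allowed = mask_weight(mask);
}

/*
 * Models set_cpus_allowed_ptr() for a kthread. Returns true if the
 * warning condition fired. An unchanged mask returns before any check.
 */
static bool model_set_cpus_allowed(struct task *p, mask_t new_mask,
				   mask_t online, mask_t active)
{
	if (p->cpus_mask == new_mask)
		return false;

	p->cpus_mask = new_mask;
	p->nr_cpus_allowed = mask_weight(new_mask);
	return kthread_affinity_warns(new_mask, online, active,
				      p->nr_cpus_allowed);
}
```

With the masks you printed (new_mask = 16-31, cpu_online_mask = 0-16,
cpu_active_mask = 0-15), a task in this model that was not bound beforehand
trips the condition, while a task that went through model_bind_mask() with
the same mask first returns early and never reaches the check.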

Separately, I found that with numa=fake=2U, cpumask_of_node() seems broken
on my box.

Thanks,
Lai