Message-ID: <20201226145239.GJ2657@paulmck-ThinkPad-P72>
Date: Sat, 26 Dec 2020 06:52:39 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Hillf Danton <hdanton@...a.com>
Cc: Lai Jiangshan <jiangshanlai@...il.com>,
linux-kernel@...r.kernel.org,
Valentin Schneider <valentin.schneider@....com>,
Peter Zijlstra <peterz@...radead.org>,
Qian Cai <cai@...hat.com>,
Vincent Donnefort <vincent.donnefort@....com>,
Lai Jiangshan <laijs@...ux.alibaba.com>,
Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Sat, Dec 26, 2020 at 06:34:21PM +0800, Hillf Danton wrote:
> On Wed, 23 Dec 2020 11:49:51 -0800 "Paul E. McKenney" wrote:
> >On Sat, Dec 19, 2020 at 01:09:09AM +0800, Lai Jiangshan wrote:
> >> From: Lai Jiangshan <laijs@...ux.alibaba.com>
> >>
> >> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> >> said that the scheduler will not force-break affinity for us.
> >>
> >> But workqueue highly depends on the old behavior. Many parts of the code
> >> rely on it; 06249738a41a ("workqueue: Manually break affinity on hotplug")
> >> is not enough to change it, and the commit has flaws of its own too.
> >>
> >> It doesn't handle worker detachment.
> >> It doesn't handle worker attachment, mainly worker creation,
> >> which is handled by Valentin Schneider's patch [1].
> >> It doesn't handle unbound workers, which might possibly be
> >> per-cpu kthreads.
> >>
> >> We need to thoroughly update the way workqueue handles affinity
> >> in cpu hot[un]plug, which is what this patchset intends to do,
> >> replacing Valentin Schneider's patch [1]. The equivalent patch
> >> is patch 10.
> >>
> >> Patch 1 fixes a flaw reported by Hillf Danton <hdanton@...a.com>.
> >> I have to include this fix because later patches depend on it.
> >>
> >> The patchset is based on tip/master rather than workqueue tree,
> >> because the patchset is a complement for 06249738a41a ("workqueue:
> >> Manually break affinity on hotplug") which is only in tip/master by now.
> >>
> >> And TJ acked to route the series through tip.
> >>
> >> Changed from V1:
> >> Add TJ's acked-by for the whole patchset
> >>
> >> Add more words to the comments and the changelog, mainly derived
> >> from discussion with Peter.
> >>
> >> Update the comments as TJ suggested.
> >>
> >> Update a line of code as Valentin suggested.
> >>
> >> Add Valentin's ack for patch 10 because "Seems alright to me." and
> >> add Valentin's comments to the changelog which is integral.
> >>
> >> [1]: https://lore.kernel.org/r/ff62e3ee994efb3620177bf7b19fab16f4866845.camel@redhat.com
> >> [V1 patchset]: https://lore.kernel.org/lkml/20201214155457.3430-1-jiangshanlai@gmail.com/
> >>
> >> Cc: Hillf Danton <hdanton@...a.com>
> >> Cc: Valentin Schneider <valentin.schneider@....com>
> >> Cc: Qian Cai <cai@...hat.com>
> >> Cc: Peter Zijlstra <peterz@...radead.org>
> >> Cc: Vincent Donnefort <vincent.donnefort@....com>
> >> Cc: Tejun Heo <tj@...nel.org>
> >
> >And rcutorture hits this, so thank you for the fix!
>
> Can you please specify a bit what you encountered in rcutorture
> before this patchset? You know we can't have a correct estimation
> of the fix diameter without your help.

It triggers the following in sched_cpu_dying() in kernel/sched/core.c,
exactly the same as for Lai Jiangshan:

	BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq))

Which is in fact the "this" in my earlier "rcutorture hits this".  ;-)
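
For anyone not staring at that check daily, here is a minimal user-space
sketch of the condition. It is only a model under stated assumptions: the
names model_rq, model_rq_has_pinned_tasks(), model_would_bug() and
model_self_check() are hypothetical stand-ins, not the kernel's definitions,
and the two fields merely mirror what the quoted expression reads. The idea
it illustrates is that by the time a CPU is dying, exactly one task
(presumably the thread doing the teardown) should remain runnable there, and
nothing should still be pinned to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two runqueue values the check reads. */
struct model_rq {
	unsigned int nr_running;	/* runnable tasks on this CPU */
	unsigned int nr_pinned;		/* tasks that may not be migrated away */
};

/* Assumed to mirror rq_has_pinned_tasks(): any pinned task counts. */
static bool model_rq_has_pinned_tasks(const struct model_rq *rq)
{
	return rq->nr_pinned != 0;
}

/* True when the BUG_ON() condition quoted above would fire. */
static bool model_would_bug(const struct model_rq *rq)
{
	return rq->nr_running != 1 || model_rq_has_pinned_tasks(rq);
}

/* Tiny self-check covering the clean case and one failing case. */
static void model_self_check(void)
{
	struct model_rq clean = { .nr_running = 1, .nr_pinned = 0 };
	struct model_rq stuck = { .nr_running = 2, .nr_pinned = 0 };

	assert(!model_would_bug(&clean));	/* only one task left: fine */
	assert(model_would_bug(&stuck));	/* a second runnable task: BUG */
}
```

In this model, a second runnable task still sitting on the outgoing CPU (for
example a kworker whose affinity was never broken, which is the case the
series addresses) gives nr_running == 2 and trips the check.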

							Thanx, Paul

> >Tested-by: Paul E. McKenney <paulmck@...nel.org>
> >
> >> Lai Jiangshan (10):
> >>   workqueue: restore unbound_workers' cpumask correctly
> >>   workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> >>     affinity
> >>   workqueue: Manually break affinity on pool detachment
> >>   workqueue: don't set the worker's cpumask when kthread_bind_mask()
> >>   workqueue: introduce wq_online_cpumask
> >>   workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> >>   workqueue: Manually break affinity on hotplug for unbound pool
> >>   workqueue: reorganize workqueue_online_cpu()
> >>   workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> >>   workqueue: Fix affinity of kworkers when attaching into pool
> >>
> >>  kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> >>  1 file changed, 132 insertions(+), 82 deletions(-)
> >>
> >> --
> >> 2.19.1.6.gb485710b