[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161017181556.GB6248@htj.duckdns.org>
Date: Mon, 17 Oct 2016 14:15:56 -0400
From: Tejun Heo <tj@...nel.org>
To: Michael Ellerman <mpe@...erman.id.au>
Cc: torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
jiangshanlai@...il.com, akpm@...ux-foundation.org,
kernel-team@...com,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
Balbir Singh <bsingharora@...il.com>
Subject: Re: Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue
available early during boot)
Hello, Michael.
On Mon, Oct 17, 2016 at 11:24:34PM +1100, Michael Ellerman wrote:
> The bad case (where we hit the BUG_ON I added above) is where we are
> creating a wq for node 1.
>
> In wq_calc_node_cpumask() we do:
>
> cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
> return !cpumask_equal(cpumask, attrs->cpumask);
>
> Which with the arguments inserted is:
>
> cpumask_and(tmp_attrs->cpumask, new_attrs->cpumask, wq_numa_possible_cpumask[1]);
> return !cpumask_equal(tmp_attrs->cpumask, new_attrs->cpumask);
>
> And that results in tmp_attrs->cpumask being empty, because
> wq_numa_possible_cpumask[1] is an empty cpumask.
Ah, should have read this before replying to the previous mail, so
it's the numa mask, not the cpu_possible_mask.
> The reason wq_numa_possible_cpumask[1] is an empty mask is because in
> wq_numa_init() we did:
>
> for_each_possible_cpu(cpu) {
> node = cpu_to_node(cpu);
> if (WARN_ON(node == NUMA_NO_NODE)) {
> pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
> /* happens iff arch is bonkers, let's just proceed */
> return;
> }
> cpumask_set_cpu(cpu, tbl[node]);
> }
>
> And cpu_to_node() returned node 0 for every CPU in the system, despite there
> being multiple nodes.
>
> That happened because we haven't yet called set_cpu_numa_node() for the non-boot
> cpus, because that happens in smp_prepare_cpus(), and
> workqueue_init_early() is called much earlier than that.
>
> This doesn't trigger on x86 because it does set_cpu_numa_node() in
> setup_per_cpu_areas(), which is called prior to workqueue_init_early().
>
> We can (should) probably do the same on powerpc, I'll look at that
> tomorrow. But other arches may have a similar problem, and at the very
> least we need to document that workqueue_init_early() relies on
> cpu_to_node() working.
I should be able to move the numa part of initialization to the later
init function. Working on it.
Thanks.
--
tejun
Powered by blists - more mailing lists