Message-ID: <20170822165437.GG491396@devbig577.frc2.facebook.com>
Date:   Tue, 22 Aug 2017 09:54:37 -0700
From:   Tejun Heo <tj@...nel.org>
To:     Michael Ellerman <mpe@...erman.id.au>
Cc:     Laurent Vivier <lvivier@...hat.com>, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH 1/2] powerpc/workqueue: update list of possible CPUs

Hello, Michael.

On Tue, Aug 22, 2017 at 11:41:41AM +1000, Michael Ellerman wrote:
> > This is something powerpc needs to fix.
> 
> There is no way for us to fix it.

I don't think that's true.  The CPU id used in the kernel doesn't have
to match the physical one; arch code should be able to pre-map CPU IDs
to nodes and use a matching one when hotplugging CPUs.  I'm not saying
that's the best way to solve the problem, though.  It could be that
the best way forward is making the cpu <-> node mapping dynamic and
properly synchronized.  However, please note that that does mean we
mess up node affinity for things like per-cpu memory, which is
allocated before the CPU comes up, so there are some inherent benefits
to keeping the mapping static even if that involves indirection.
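To sketch what I mean by pre-mapping (the table and the
arch_node_for_hw_cpu() helper below are made-up names, not existing
powerpc code):

	/*
	 * Hypothetical: fix logical CPU -> node at boot from firmware
	 * info, then hand out a matching free logical id at hotplug
	 * time so cpu_to_node() never changes after boot.
	 */
	static int premapped_node[NR_CPUS];

	void __init premap_cpu_nodes(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			premapped_node[cpu] = arch_node_for_hw_cpu(cpu);
	}

	/* at hotplug, pick a logical id whose node matches @node */
	static int pick_logical_cpu(int node)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			if (!cpu_present(cpu) && premapped_node[cpu] == node)
				return cpu;
		return -ENOSPC;
	}

That way the mapping stays stable for the lifetime of the system and
per-cpu allocations done before the CPU comes up still land on the
right node.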

> > Workqueue isn't the only one making this assumption. mm as a whole
> > assumes that CPU <-> node mapping is stable regardless of hotplug
> > events.
> 
> At least in this case I don't think the mapping changes, it's just
> that we don't know the mapping at boot.
> 
> Currently we have to report possible but not present CPUs as belonging
> to node 0, because otherwise we trip this helpful piece of code:
> 
> 	for_each_possible_cpu(cpu) {
> 		node = cpu_to_node(cpu);
> 		if (WARN_ON(node == NUMA_NO_NODE)) {
> 			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
> 			/* happens iff arch is bonkers, let's just proceed */
> 			return;
> 		}
> 
> But if we remove that, we could then accurately report NUMA_NO_NODE at
> boot, and then update the mapping when the CPU is hotplugged.
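Just so we're looking at the same thing, I assume you mean something
along these lines (the callback and the firmware lookup are made-up
names; set_cpu_numa_node() is the existing helper):

	/*
	 * Sketch of the dynamic scheme: possible-but-not-present CPUs
	 * report NUMA_NO_NODE until they actually show up, and the
	 * mapping is filled in from the hotplug path.
	 */
	static int ppc_cpu_prepare(unsigned int cpu)
	{
		int node = lookup_node_from_firmware(cpu); /* hypothetical */

		set_cpu_numa_node(cpu, node);
		return 0;
	}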

If you think that making this dynamic is the right way to go, I have
no objection but we should be doing this properly instead of patching
up what seems to be crashing right now.  What synchronization and
notification mechanisms do we need to make cpu <-> node mapping
dynamic?  Do we need any synchronization in memory allocation paths?
If not, why would it be safe?
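
For instance, if the mapping can change while an allocation is in
flight, the read side would presumably need something like the
following instead of a bare cpu_to_node() (purely illustrative, none
of this exists today):

	static DEFINE_SEQLOCK(cpu_node_seqlock);	/* hypothetical */

	static int cpu_to_node_stable(int cpu)
	{
		unsigned int seq;
		int node;

		do {
			seq = read_seqbegin(&cpu_node_seqlock);
			node = cpu_to_node(cpu);
		} while (read_seqretry(&cpu_node_seqlock, seq));

		return node;
	}

and every updater would have to take the write side, which is exactly
the kind of cost I'd want to understand before going down that path.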

Thanks.

-- 
tejun
