Date:	Tue, 16 Aug 2016 09:55:05 +0200
From:	Heiko Carstens <heiko.carstens@...ibm.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Ming Lei <tom.leiming@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Michael Holzheu <holzheu@...ux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@...ibm.com>
Subject: Re: [bisected] "sched: Allow per-cpu kernel threads to run on online
 && !active" causes warning

Hi Tejun,

> On Mon, Aug 15, 2016 at 01:19:08PM +0200, Heiko Carstens wrote:
> > I can imagine several ways to fix this for s390, but before doing that I'm
> > wondering if the workqueue code is correct with
> > 
> > a) assuming that the cpu_to_node() mapping is valid for all _possible_ cpus
> >    that early
> 
> This can be debatable and making it "first registration sticks" is
> likely easy enough.
> 
> > and
> > 
> > b) that the cpu_to_node() mapping does never change
> 
> However, this part isn't just from workqueue.  It just hits in a more
> obvious way.  For example, memory allocation has the same problem and
> we would have to synchronize memory allocations against cpu <-> node
> mapping changing.  It'd be silly to add the complexity and overhead of
> making the mapping dynamic when there's nothing inherently
> dynamic about it.  The surface area is pretty big here.
> 
> I have no idea how s390 fakenuma works.  Is that very different from
> x86's?  IIRC, x86's fakenuma isn't all that dynamic.

I'm not asking to make the cpu <-> node mapping completely dynamic. We
already have code in place to keep the cpu <-> node mapping static; however,
currently this happens too late, which can be fixed quite easily.

Unfortunately we do not always know which node a cpu belongs to when we
register it; currently all cpus are registered to node 0, and only when a
cpu is brought online is this corrected.

The problem we have is "standby" cpus on s390, which we know are present
but cannot currently be used. The mechanism is the following:

We detect a standby cpu and register it via register_cpu(); since the node
isn't known yet for this cpu, the cpu_to_node() function will return 0,
therefore all standby cpus will be registered under node 0.

The new standby cpu will have a "configure" sysfs attribute. If somebody
writes "1" to it we signal the hypervisor that we want to use the cpu and
it allocates one. If this request succeeds we finally know where the cpu is
located topology-wise and can fix up everything (and can also make the
cpu-to-node mapping static).
Note: as long as a cpu isn't configured it cannot be brought online.

If the cpu is now finally brought online, the change_cpu_under_node() code
within drivers/base/cpu.c fixes up the node symlinks so at least the sysfs
representation is also correct.

If the cpu is later brought offline, deconfigured, etc., we do not change
the cpu_to_node mapping anymore.

So the question is how to define "first registration sticks". :)
