Date: Wed, 2 Jul 2014 22:21:24 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Mike Galbraith <umgwanakikbuti@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
    riel@...hat.com, mingo@...nel.org, laijs@...fujitsu.com,
    dipankar@...ibm.com, akpm@...ux-foundation.org,
    mathieu.desnoyers@...icios.com, josh@...htriplett.org, niv@...ibm.com,
    tglx@...utronix.de, rostedt@...dmis.org, dhowells@...hat.com,
    edumazet@...gle.com, dvhart@...ux.intel.com, fweisbec@...il.com,
    oleg@...hat.com, sbw@....edu
Subject: Re: [PATCH RFC tip/core/rcu] Parallelize and economize NOCB kthread wakeups

On Thu, Jul 03, 2014 at 05:31:19AM +0200, Mike Galbraith wrote:
> On Wed, 2014-07-02 at 10:08 -0700, Paul E. McKenney wrote:
> > On Wed, Jul 02, 2014 at 06:04:12PM +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 02, 2014 at 08:39:15AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jul 02, 2014 at 02:34:12PM +0200, Peter Zijlstra wrote:
> > > > > On Fri, Jun 27, 2014 at 07:20:38AM -0700, Paul E. McKenney wrote:
> > > > > > An 80-CPU system with a context-switch-heavy workload can require so
> > > > > > many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> > > > > > tens of percent of a CPU just awakening things.  This clearly will not
> > > > > > scale well: If you add enough CPUs, the RCU grace-period kthreads would
> > > > > > get behind, increasing grace-period latency.
> > > > > >
> > > > > > To avoid this problem, this commit divides the NOCB kthreads into leaders
> > > > > > and followers, where the grace-period kthreads awaken the leaders each of
> > > > > > whom in turn awakens its followers.  By default, the number of groups of
> > > > > > kthreads is the square root of the number of CPUs, but this default may
> > > > > > be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> > > > > > This reduces the number of wakeups done per grace period by the RCU
> > > > > > grace-period kthread by the square root of the number of CPUs, but of
> > > > > > course by shifting those wakeups to the leaders.  In addition, because
> > > > > > the leaders do grace periods on behalf of their respective followers,
> > > > > > the number of wakeups of the followers decreases by up to a factor of two.
> > > > > > Instead of being awakened once when new callbacks arrive and again
> > > > > > at the end of the grace period, the followers are awakened only at
> > > > > > the end of the grace period.
> > > > > >
> > > > > > For a numerical example, in a 4096-CPU system, the grace-period kthread
> > > > > > would awaken 64 leaders, each of which would awaken its 63 followers
> > > > > > at the end of the grace period.  This compares favorably with the 79
> > > > > > wakeups for the grace-period kthread on an 80-CPU system.
> > > > >
> > > > > Urgh, how about we kill the entire nocb nonsense and try again? This is
> > > > > getting quite ridiculous.
> > > >
> > > > Sure thing, Peter.
> > > >
> > > So you don't think this has gotten a little out of hand? The NOCB stuff
> > > has led to these masses of rcu threads and now you're adding extra
> > > cache misses to the perfectly sane and normal code paths just to deal
> > > with so many threads.
> >
> > Indeed it appears to have gotten a bit out of hand.  But let's please
> > attack the real problem rather than the immediate irritant.
> >
> > And in this case, the real problem is that users are getting callback
> > offloading even when there is no reason for it.
> >
> > > And all to support a feature that nearly nobody uses. And you were
> > > talking about making nocb the default rcu...
> >
> > As were others, not that long ago.  Today is the first hint that I got
> > that you feel otherwise.  But it does look like the softirq approach to
> > callback processing needs to stick around for awhile longer.  Nice to
> > hear that softirq is now "sane and normal" again, I guess.  ;-)
> >
> > Please see my patch in reply to Rik's email.  The idea is to neither
> > rip callback offloading from the kernel nor to keep callback offloading
> > as the default, but instead do callback offloading only for those CPUs
> > specifically marked as NO_HZ_FULL CPUs, or when specifically requested
> > at build time or at boot time.  In other words, only do it when it is
> > needed.
>
> Exactly!  Like dynamically, when the user isolates CPUs via the cpuset
> interface, none of it making much sense without that particular property
> of a set of CPUs, and cpuset being the manager of CPU set properties.

Glad you like it!  ;-)

> NO_HZ_FULL is a property of a set of CPUs.  isolcpus is supposed to go
> away as being a redundant interface to manage a single property of a set
> of CPUs, but it's perfectly fine for NO_HZ_FULL to add an interface to
> manage a single property of a set of CPUs.  What am I missing?

Well, for now, it can only be specified at build time or at boot time.
In theory, it is possible to change a CPU from being callback-offloaded
to not at runtime, but there would need to be an extremely good reason
for adding that level of complexity.  Lots of "fun" races in there...

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
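
The leader/follower arithmetic quoted from the original commit log can be checked with a small user-space sketch. This is not the kernel's code: the names isqrt(), struct nocb_layout, nocb_layout() and is_leader() are hypothetical, and the rule that every stride-th CPU hosts a leader is an assumption made only for illustration. What it does take from the thread is that the default group size is the square root of the number of CPUs and that the default can be overridden (here by passing a nonzero stride, standing in for the rcutree.rcu_nocb_leader_stride boot parameter).

```c
#include <assert.h>

/* Integer square root: largest r such that r * r <= n. */
static unsigned int isqrt(unsigned int n)
{
	unsigned int r = 0;

	while ((unsigned long long)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/* Hypothetical summary of one leader/follower layout. */
struct nocb_layout {
	unsigned int stride;		/* CPUs per group: one leader plus followers */
	unsigned int leaders;		/* wakeups done by the grace-period kthread */
	unsigned int max_followers;	/* wakeups done by the busiest leader */
};

/*
 * Compute the layout for nr_cpus CPUs.  A nonzero stride_override
 * stands in for the boot parameter; zero selects the square-root
 * default described in the commit log.
 */
static struct nocb_layout nocb_layout(unsigned int nr_cpus,
				      unsigned int stride_override)
{
	struct nocb_layout l;

	assert(nr_cpus > 0);
	l.stride = stride_override ? stride_override : isqrt(nr_cpus);
	l.leaders = (nr_cpus + l.stride - 1) / l.stride;	/* round up */
	l.max_followers = (nr_cpus < l.stride ? nr_cpus : l.stride) - 1;
	return l;
}

/* Assumption for illustration: every stride-th CPU hosts a leader. */
static int is_leader(unsigned int cpu, unsigned int stride)
{
	return cpu % stride == 0;
}
```

With nr_cpus = 4096 and no override, the sketch yields a stride of 64, 64 leaders and at most 63 followers per leader, which matches the numerical example in the commit log. The grace-period kthread's wakeup count then grows with the square root of the CPU count, while each leader carries a similar amount of work.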