Message-ID: <1404358279.5137.63.camel@marge.simpson.net>
Date:	Thu, 03 Jul 2014 05:31:19 +0200
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, riel@...hat.com, mingo@...nel.org,
	laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, niv@...ibm.com, tglx@...utronix.de,
	rostedt@...dmis.org, dhowells@...hat.com, edumazet@...gle.com,
	dvhart@...ux.intel.com, fweisbec@...il.com, oleg@...hat.com,
	sbw@....edu
Subject: Re: [PATCH RFC tip/core/rcu] Parallelize and economize NOCB kthread
 wakeups

On Wed, 2014-07-02 at 10:08 -0700, Paul E. McKenney wrote: 
> On Wed, Jul 02, 2014 at 06:04:12PM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 08:39:15AM -0700, Paul E. McKenney wrote:
> > > On Wed, Jul 02, 2014 at 02:34:12PM +0200, Peter Zijlstra wrote:
> > > > On Fri, Jun 27, 2014 at 07:20:38AM -0700, Paul E. McKenney wrote:
> > > > > An 80-CPU system with a context-switch-heavy workload can require so
> > > > > many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> > > > > tens of percent of a CPU just awakening things.  This clearly will not
> > > > > scale well: If you add enough CPUs, the RCU grace-period kthreads would
> > > > > get behind, increasing grace-period latency.
> > > > > 
> > > > > To avoid this problem, this commit divides the NOCB kthreads into leaders
> > > > > and followers, where the grace-period kthreads awaken the leaders each of
> > > > > whom in turn awakens its followers.  By default, the number of groups of
> > > > > kthreads is the square root of the number of CPUs, but this default may
> > > > > be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> > > > > This reduces the number of wakeups done per grace period by the RCU
> > > > > grace-period kthread by the square root of the number of CPUs, but of
> > > > > course by shifting those wakeups to the leaders.  In addition, because
> > > > > the leaders do grace periods on behalf of their respective followers,
> > > > > the number of wakeups of the followers decreases by up to a factor of two.
> > > > > Instead of being awakened once when new callbacks arrive and again
> > > > > at the end of the grace period, the followers are awakened only at
> > > > > the end of the grace period.
> > > > > 
> > > > > For a numerical example, in a 4096-CPU system, the grace-period kthread
> > > > > would awaken 64 leaders, each of which would awaken its 63 followers
> > > > > at the end of the grace period.  This compares favorably with the 79
> > > > > wakeups for the grace-period kthread on an 80-CPU system.
> > > > 
> > > > Urgh, how about we kill the entire nocb nonsense and try again? This is
> > > > getting quite ridiculous.
> > > 
> > > Sure thing, Peter.
> > 
> > So you don't think this has gotten a little out of hand? The NOCB stuff
> > has led to these masses of rcu threads and now you're adding extra
> > cache misses to the perfectly sane and normal code paths just to deal
> > with so many threads.
> 
> Indeed it appears to have gotten a bit out of hand.  But let's please
> attack the real problem rather than the immediate irritant.
> 
> And in this case, the real problem is that users are getting callback
> offloading even when there is no reason for it.
> 
> > And all to support a feature that nearly nobody uses. And you were
> > talking about making nocb the default rcu...
> 
> As were others, not that long ago.  Today is the first hint that I got
> that you feel otherwise.  But it does look like the softirq approach to
> callback processing needs to stick around for a while longer.  Nice to
> hear that softirq is now "sane and normal" again, I guess.  ;-)
> 
> Please see my patch in reply to Rik's email.  The idea is to neither
> rip callback offloading from the kernel nor to keep callback offloading
> as the default, but instead do callback offloading only for those CPUs
> specifically marked as NO_HZ_FULL CPUs, or when specifically requested
> at build time or at boot time.  In other words, only do it when it is
> needed.
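
[Editorial note: as a sketch, with option and parameter names as they existed around this time, "only when needed" maps onto the existing knobs roughly like this:]

```
# Build time: compile in callback-offloading support only when selected
CONFIG_RCU_NOCB_CPU=y

# Boot time: offload RCU callbacks only for the nohz_full set of CPUs
nohz_full=1-7 rcu_nocbs=1-7
```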

Exactly!  Better still, do it dynamically, when the user isolates CPUs via
the cpuset interface: none of this makes much sense without that particular
property of a set of CPUs, and cpuset is the manager of CPU set properties.

NO_HZ_FULL is a property of a set of CPUs.  isolcpus is supposed to go
away as being a redundant interface to manage a single property of a set
of CPUs, but it's perfectly fine for NO_HZ_FULL to add an interface to
manage a single property of a set of CPUs.  What am I missing? 

-Mike
