linux-kernel - Re: [PATCH RFC tip/core/rcu] Parallelize and economize NOCB kthread wakeups

Message-ID: <20140702170838.GS4603@linux.vnet.ibm.com>
Date:	Wed, 2 Jul 2014 10:08:38 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, riel@...hat.com, mingo@...nel.org,
	laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, niv@...ibm.com, tglx@...utronix.de,
	rostedt@...dmis.org, dhowells@...hat.com, edumazet@...gle.com,
	dvhart@...ux.intel.com, fweisbec@...il.com, oleg@...hat.com,
	sbw@....edu
Subject: Re: [PATCH RFC tip/core/rcu] Parallelize and economize NOCB kthread
 wakeups

On Wed, Jul 02, 2014 at 06:04:12PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 08:39:15AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 02, 2014 at 02:34:12PM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 27, 2014 at 07:20:38AM -0700, Paul E. McKenney wrote:
> > > > An 80-CPU system with a context-switch-heavy workload can require so
> > > > many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> > > > tens of percent of a CPU just awakening things.  This clearly will not
> > > > scale well: If you add enough CPUs, the RCU grace-period kthreads would
> > > > get behind, increasing grace-period latency.
> > > > 
> > > > To avoid this problem, this commit divides the NOCB kthreads into leaders
> > > > and followers, where the grace-period kthreads awaken the leaders, each of
> > > > whom in turn awakens its followers.  By default, the number of groups of
> > > > kthreads is the square root of the number of CPUs, but this default may
> > > > be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> > > > This reduces the number of wakeups done per grace period by the RCU
> > > > grace-period kthread by the square root of the number of CPUs, but of
> > > > course by shifting those wakeups to the leaders.  In addition, because
> > > > the leaders do grace periods on behalf of their respective followers,
> > > > the number of wakeups of the followers decreases by up to a factor of two.
> > > > Instead of being awakened once when new callbacks arrive and again
> > > > at the end of the grace period, the followers are awakened only at
> > > > the end of the grace period.
> > > > 
> > > > For a numerical example, in a 4096-CPU system, the grace-period kthread
> > > > would awaken 64 leaders, each of which would awaken its 63 followers
> > > > at the end of the grace period.  This compares favorably with the 79
> > > > wakeups for the grace-period kthread on an 80-CPU system.
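
The arithmetic in the quoted commit log can be checked with a small
sketch.  This is illustrative Python, not kernel code: the function
names are hypothetical, and it assumes the default stride is the integer
square root of the CPU count, with the last group allowed to be smaller
than the others.

```python
import math


def leader_stride(nr_cpus):
    """Assumed default group size: integer square root of the CPU count."""
    return max(1, math.isqrt(nr_cpus))


def wakeup_counts(nr_cpus, stride=None):
    """Return (leaders, max_followers_per_leader) for a given stride.

    'leaders' is the number of wakeups done by the grace-period kthread;
    each leader then wakes at most 'stride - 1' followers.
    """
    if stride is None:
        stride = leader_stride(nr_cpus)
    leaders = (nr_cpus + stride - 1) // stride  # ceiling division
    return leaders, stride - 1


if __name__ == "__main__":
    for n in (80, 4096):
        leaders, followers = wakeup_counts(n)
        print(f"{n} CPUs: {leaders} leaders, up to {followers} followers each")
```

With 4096 CPUs this gives the 64 leaders and 63 followers per leader
quoted above.  Passing an explicit stride models overriding the default,
as the rcutree.rcu_nocb_leader_stride boot parameter is described as
doing.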
> > > 
> > > Urgh, how about we kill the entire nocb nonsense and try again? This is
> > > getting quite ridiculous.
> > 
> > Sure thing, Peter.
> 
> So you don't think this has gotten a little out of hand? The NOCB stuff
> has led to these masses of rcu threads and now you're adding extra
> cache misses to the perfectly sane and normal code paths just to deal
> with so many threads.

Indeed it appears to have gotten a bit out of hand.  But let's please
attack the real problem rather than the immediate irritant.

And in this case, the real problem is that users are getting callback
offloading even when there is no reason for it.

> And all to support a feature that nearly nobody uses. And you were
> talking about making nocb the default rcu...

As were others, not that long ago.  Today is the first hint that I got
that you feel otherwise.  But it does look like the softirq approach to
callback processing needs to stick around for a while longer.  Nice to
hear that softirq is now "sane and normal" again, I guess.  ;-)

Please see my patch in reply to Rik's email.  The idea is neither to
rip callback offloading from the kernel nor to keep it as the default,
but instead to do callback offloading only for those CPUs specifically
marked as NO_HZ_FULL CPUs, or when specifically requested at build time
or at boot time.  In other words, only do it when it is needed.
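
As a rough model of that selection rule, here is a minimal sketch in
Python rather than kernel code.  The function and parameter names are
hypothetical: "nohz_full" stands for the CPUs marked NO_HZ_FULL,
"requested" for CPUs explicitly requested at boot time, and
"offload_all" for a build-time request to offload every CPU.

```python
def offloaded_cpus(all_cpus, nohz_full=(), requested=(), offload_all=False):
    """Return the set of CPUs whose callbacks would be offloaded.

    A CPU is offloaded only if it is marked NO_HZ_FULL, was explicitly
    requested, or if offloading everything was requested.
    """
    cpus = set(all_cpus)
    if offload_all:
        return cpus
    # Ignore any CPU numbers that do not exist on this system.
    return (set(nohz_full) | set(requested)) & cpus


if __name__ == "__main__":
    # No NO_HZ_FULL CPUs and no explicit request: nothing is offloaded.
    print(offloaded_cpus(range(8)))
    # CPUs 1-3 are NO_HZ_FULL and CPU 6 was requested explicitly.
    print(offloaded_cpus(range(8), nohz_full=[1, 2, 3], requested=[6]))
```

The point of the sketch is the default: with nothing marked and nothing
requested, the result is empty, so callbacks stay on the softirq path.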

							Thanx, Paul
