Message-ID: <20130515173142.GL4442@linux.vnet.ibm.com>
Date: Wed, 15 May 2013 10:31:42 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Josh Triplett <josh@...htriplett.org>,
linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
dipankar@...ibm.com, akpm@...ux-foundation.org,
mathieu.desnoyers@...ymtl.ca, niv@...ibm.com, tglx@...utronix.de,
rostedt@...dmis.org, Valdis.Kletnieks@...edu, dhowells@...hat.com,
edumazet@...gle.com, darren@...art.com, fweisbec@...il.com,
sbw@....edu
Subject: Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing
delay from HZ
On Wed, May 15, 2013 at 11:02:34AM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2013 at 10:56:39AM +0200, Peter Zijlstra wrote:
> > On Tue, May 14, 2013 at 08:47:28AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 14, 2013 at 04:51:20PM +0200, Peter Zijlstra wrote:
> > > > > In theory, yes. In practice, this requires lots of lock acquisitions
> > > > > and releases on large systems, including some global locks. The weight
> > > > > could be reduced, but...
> > > > >
> > > > > What I would like to do instead would be to specify expedited grace
> > > > > periods during boot.
> > > >
> > > > But why, surely going idle without any RCU callbacks isn't completely unheard
> > > > of, even outside of the boot process?
> > >
> > > Yep, and RCU has special-cased that for quite some time.
> > >
> > > > Being able to quickly drop out of the RCU state machinery would be a good thing IMO.
> > >
> > > And this is currently possible -- this is the job of rcu_idle_enter()
> > > and friends. And it works well, at least when I get my "if" statements
> > > set up correctly (hence the earlier patch).
> > >
> > > Or are you seeing a slowdown even with that earlier patch applied? If so,
> > > please let me know what you are seeing.
> >
> > I'm not running anything in particular, except maybe a broken mental
> > model of RCU ;-)
> >
> > So what I'm talking about is the !rcu_cpu_has_callbacks() case, where
> > there's absolutely nothing for RCU to do except tell the state machine
> > it's no longer participating.
> >
> > Your patch to rcu_needs_cpu() frobbing the lazy condition is after that
> > and thus irrelevant for this AFAICT.
> >
> > Now as far as I can see, rcu_needs_cpu() will return false in this case;
> > allowing the cpu to enter NO_HZ state. We then call rcu_idle_enter()
> > which would call rcu_eqs_enter(). Which should put the CPU in extended
> > quiescent state.
> >
> > However, you're still running into these FQSs delaying boot. Why is
> > that? Is that because rcu_eqs_enter() doesn't really do enough?
> >
> > The thing is, if all other CPUs are idle, detecting the end of a grace
> > period should be rather trivial and not involve FQSs and thus be tons
> > faster.
> >
> > Clearly I'm missing something obvious and not communicating right or so.
>
> Earlier you said that improving EQS behaviour was expensive in that it
> would require taking (global) locks or somesuch.
>
> Would it not be possible to have the cpu performing a FQS finish this
> work; that way the first FQS would be a little slow, but after that no
> FQS would be needed anymore, right? Since we'd no longer require the
> other CPUs to end a grace period.
It is not just the first FQS that would be slow: this CPU's next
transition from idle to non-idle would also be slow, because that is
when this work would need to be undone.
Furthermore, in this approach, RCU would still need to scan all the CPUs
to see if any did the first part of the transition to idle. And if we
have to scan either way, why not keep the idle-nonidle transitions cheap
and continue to rely on the scan? Here are the rationales I can think
of and what I am thinking in terms of doing instead:
1.	The scan could become a scalability bottleneck.  There is one
	way to handle this today, and one possible future change.  The
	way to handle it today is to increase
	rcutree.jiffies_till_first_fqs; for example, the SGI guys set
	it to 20 or thereabouts.  If this becomes problematic, I could
	easily create multiple kthreads to carry out the FQS scan in
	parallel for large systems.

2.	Someone could demonstrate that RCU's grace periods were
	significantly delaying boot.  There are several ways of
	dealing with this:

	a.	Set rcupdate.rcu_expedited=1 at boot, and set it back
		after boot completes.

	b.	Set rcutree.jiffies_till_first_fqs=0.

	c.	As (b) above, but modifying RCU to use additional
		kthreads for the per-CPU grace-period operations.
Make sense?
Thanx, Paul