Message-ID: <1314717753.5812.7.camel@twins>
Date: Tue, 30 Aug 2011 17:22:33 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Frederic Weisbecker <fweisbec@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Anton Blanchard <anton@....ibm.com>,
Avi Kivity <avi@...hat.com>, Ingo Molnar <mingo@...e.hu>,
Lai Jiangshan <laijs@...fujitsu.com>,
"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
Paul Menage <menage@...gle.com>,
Stephen Hemminger <shemminger@...tta.com>,
Thomas Gleixner <tglx@...utronix.de>,
Tim Pepper <lnxninja@...ux.vnet.ibm.com>
Subject: Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to
 idle enter/exit APIs
On Tue, 2011-08-30 at 16:26 +0200, Frederic Weisbecker wrote:
> On Tue, Aug 30, 2011 at 01:19:18PM +0200, Peter Zijlstra wrote:
> > On Tue, 2011-08-30 at 01:35 +0200, Frederic Weisbecker wrote:
> > >
> > > OTOH it is needed to find non-critical sections when asked to cooperate
> > > in completing a grace period. But if no callbacks have been enqueued on
> > > the whole system, we are fine.
> >
> > It's that 'whole system' clause that I have a problem with. It would be
> > perfectly fine to have a number of cpus very busy generating rcu
> > callbacks; however, that should not mean our adaptive nohz cpu must be
> > bothered to complete grace periods.
> >
> > Requiring it to participate in the grace period state machine is a fail,
> > plain and simple.
>
> We need those nohz CPUs to participate because they may use rcu read-side
> critical sections themselves. So we need them to delay grace period
> completion until their rcu read-side critical sections have ended, like
> any other CPU. Otherwise their supposed rcu read-side critical sections
> wouldn't be effective.
>
> Either that or we need to only stop the tick when we are in userspace.
> I'm not sure it would be a good idea.
Well, the simple fact is that rcu, when considered system-wide, is pretty
much always busy, voiding any and all benefit you might want to gain.
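
To make that concrete, this is nothing more than the stock RCU pattern (not
code from this patch set): a single call_rcu() on any cpu opens a grace
period, and that grace period cannot complete until every cpu, the tickless
one included, is known to be outside all rcu read-side critical sections:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
        int a;
        struct rcu_head rcu;
};

static struct foo __rcu *gp;

/* Reader side -- may well be running on the adaptive-nohz cpu. */
static int foo_read(void)
{
        struct foo *p;
        int ret = 0;

        rcu_read_lock();
        p = rcu_dereference(gp);
        if (p)
                ret = p->a;
        rcu_read_unlock();      /* the grace period may not end before this */
        return ret;
}

/* Updater side -- some other, busy, cpu. */
static void foo_free_rcu(struct rcu_head *head)
{
        kfree(container_of(head, struct foo, rcu));
}

static void foo_update(struct foo *new)
{
        struct foo *old = rcu_dereference_protected(gp, 1); /* update lock held */

        rcu_assign_pointer(gp, new);
        if (old)
                call_rcu(&old->rcu, foo_free_rcu); /* involves _every_ cpu */
}

So as long as anyone, anywhere, keeps calling call_rcu(), the nohz cpu keeps
getting asked about its quiescent states.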
> We discussed this problem; I believe it mostly resides in rcu sched,
> because finding quiescent states for rcu bh is easy, whereas rcu sched
> needs the tick or context switches. (For rcu preempt I have no idea.)
> So for now that's the sanest approach we found amongst:
>
> - Having explicit hooks in preempt_disable() and local_irq_restore()
> to notice the end of rcu sched critical sections, so that we don't need
> the tick anymore to find quiescent states (see the sketch after this
> list). But that's going to be costly, and we may miss some code paths
> that are only implicitly non-preemptable.
>
> - Rely on context switches only. I believe in practice it should be
> fine, but in theory this delays grace period completion for an unbounded
> amount of time.
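
To make option 1 concrete, a hand-wavy sketch, hypothetical throughout:
rcu_sched_note_qs() is made up, standing for "record an rcu sched quiescent
state for this cpu", and the hooks sit on the enable/restore side, where the
end of the critical section actually becomes visible:

#include <linux/preempt.h>
#include <linux/irqflags.h>

extern void rcu_sched_note_qs(void);           /* hypothetical */

/*
 * Leaving the outermost non-preemptable section ends any possible rcu
 * sched read-side critical section, so report the quiescent state right
 * there instead of waiting for the tick to sample it.
 */
static inline void rcu_hooked_preempt_enable(void)
{
        preempt_enable();
        if (!preempt_count() && !irqs_disabled())
                rcu_sched_note_qs();
}

static inline void rcu_hooked_irq_restore(unsigned long flags)
{
        local_irq_restore(flags);
        if (!irqs_disabled() && !preempt_count())
                rcu_sched_note_qs();
}

Both of those are very hot paths, hence the cost; and anything that is
non-preemptable only implicitly (hardirq entry, say) would still need its
own hook, hence the missed code paths.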
Right, so what we can do is keep a per-cpu context switch counter (I'm
sure we have one someplace and we already have the
rcu_note_context_switch() callback in case we need another) and have
another cpu (outside of our extended nohz domain) drive our state
machine.
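
Something along these lines, with made-up names throughout
(rcu_note_context_switch() is the only real one):

#include <linux/percpu.h>
#include <linux/threads.h>

DEFINE_PER_CPU(unsigned long, ctxsw_count); /* bumped from rcu_note_context_switch() */

extern void rcu_report_qs_remote(int cpu);  /* made up: feed the gp machine for @cpu */

static unsigned long qs_snap[NR_CPUS];      /* private to the housekeeping cpu */

/* Runs on a cpu that keeps its tick, once per gp poll interval. */
static void poll_nohz_cpu(int cpu)
{
        unsigned long count = per_cpu(ctxsw_count, cpu);

        if (count != qs_snap[cpu]) {
                /* @cpu scheduled at least once, that's a quiescent state */
                qs_snap[cpu] = count;
                rcu_report_qs_remote(cpu);
        }
}

That way the nohz cpu pays nothing at all; the only delay is however long it
goes without a context switch, which is the unbounded bit above.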
But I'm sure Paul can say more sensible things than me here.