Message-ID: <20110830184556.GA15953@somewhere.redhat.com>
Date: Tue, 30 Aug 2011 20:45:59 +0200
From: Frederic Weisbecker <fweisbec@...il.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Anton Blanchard <anton@....ibm.com>,
Avi Kivity <avi@...hat.com>, Ingo Molnar <mingo@...e.hu>,
Lai Jiangshan <laijs@...fujitsu.com>,
"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
Paul Menage <menage@...gle.com>,
Stephen Hemminger <shemminger@...tta.com>,
Thomas Gleixner <tglx@...utronix.de>,
Tim Pepper <lnxninja@...ux.vnet.ibm.com>
Subject: Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to idle
enter/exit APIs
On Tue, Aug 30, 2011 at 05:22:33PM +0200, Peter Zijlstra wrote:
> On Tue, 2011-08-30 at 16:26 +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 30, 2011 at 01:19:18PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-08-30 at 01:35 +0200, Frederic Weisbecker wrote:
> > > >
> > > > OTOH it is needed to find non-critical sections when asked to cooperate
> > > > in completing a grace period. But if no callbacks have been enqueued on
> > > > the whole system, we are fine.
> > >
> > > It's that 'whole system' clause that I have a problem with. It would be
> > > perfectly fine to have a number of cpus very busy generating rcu
> > > callbacks; however, that should not mean our adaptive nohz cpu has to be
> > > bothered to complete grace periods.
> > >
> > > Requiring it to participate in the grace period state machine is a fail,
> > > plain and simple.
> >
> > We need those nohz CPUs to participate because they may use read side
> > critical sections themselves. So we need them to delay grace period
> > completion until the end of their running rcu read side critical sections,
> > like any other CPU. Otherwise their supposed rcu read side critical
> > sections wouldn't be effective.
> >
> > Either that or we need to only stop the tick when we are in userspace.
> > I'm not sure it would be a good idea.
>
> Well the simple fact is that rcu, when considered system-wide, is pretty
> much always busy, voiding any and all benefit you might want to gain.
With my testcase, a stupid userspace loop on a single CPU among 4, I actually
see only little RCU activity, especially as the other CPUs are pretty much idle.
So there are some cases where it's not so pointless.
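To spell out the read side constraint quoted above, here is a minimal
sketch (struct foo, read_foo() and update_foo() are made up purely for
illustration, not existing code): even with the tick stopped on the nohz
CPU, the grace period started by the updater must not complete before
the reader has left its critical section, otherwise "old" gets freed
under its feet.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
        int val;
        struct rcu_head rcu;
};

static struct foo *gbl_foo;

/* Reader running on an adaptive nohz CPU. */
static int read_foo(void)
{
        struct foo *p;
        int val = 0;

        rcu_read_lock();
        p = rcu_dereference(gbl_foo);
        if (p)
                val = p->val;
        rcu_read_unlock();      /* only past this point is the CPU quiescent */

        return val;
}

static void free_foo(struct rcu_head *head)
{
        kfree(container_of(head, struct foo, rcu));
}

/* Updater running on some other, busy CPU (update-side locking omitted). */
static void update_foo(struct foo *new)
{
        struct foo *old = gbl_foo;

        rcu_assign_pointer(gbl_foo, new);
        if (old)
                call_rcu(&old->rcu, free_foo);  /* must wait for the reader above */
}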
> > We discussed this problem; I believe it mostly resides in rcu sched,
> > because finding quiescent states for rcu bh is easy, but rcu sched needs
> > the tick or context switches. (For rcu preempt I have no idea.)
> > So for now that's the sanest way we found, amongst these alternatives:
> >
> > - Having explicit hooks in preempt_disable() and local_irq_restore()
> > to notice the end of rcu sched critical sections, so that we don't need
> > the tick anymore to find quiescent states. But that's going to be costly,
> > and we may miss some more implicitly non-preemptible code paths.
> >
> > - Rely on context switches only. I believe in practice it should be fine.
> > But in theory this delays the grace period completion for an unbounded
> > amount of time.
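(To make the first of the two options above concrete, a purely
hypothetical sketch: the hook name and the per-CPU flag below don't
exist. The hook would be called from every outermost preempt_enable()
and from local_irq_restore() whenever it actually re-enables interrupts,
which is exactly what makes that option costly.)

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/irqflags.h>

static DEFINE_PER_CPU(int, rcu_sched_qs_seen);          /* hypothetical flag */

/*
 * Hypothetical hook: outside of any preempt- or irq-disabled region this
 * CPU is in a quiescent state for rcu sched, so record it instead of
 * relying on the tick to notice.
 */
static inline void rcu_sched_note_qs(void)
{
        if (!preempt_count() && !irqs_disabled())
                __this_cpu_write(rcu_sched_qs_seen, 1);
}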
>
> Right, so what we can do is keep a per-cpu context switch counter (I'm
> sure we have one someplace and we already have the
> rcu_note_context_switch() callback in case we need another) and have
> another cpu (outside of our extended nohz domain) drive our state
> machine.
>
> But I'm sure Paul can say more sensible things than me here.
Yeah I hope we can find some solution to minimize these IPIs.
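To illustrate the direction you describe, a rough sketch (the per-CPU
counter and the poller below are made up; only rcu_note_context_switch()
is an existing callback): the nohz CPU only bumps a counter from the
context switch path, and a housekeeping CPU outside the nohz set samples
it to detect quiescent states and drive the grace period state machine
on its behalf, so neither the tick nor an IPI is needed on the nohz CPU.

#include <linux/types.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, nohz_ctxt_switches);      /* hypothetical */

/*
 * Bumped from the context switch path on the nohz CPU, e.g. from
 * rcu_note_context_switch(); this is work the CPU does anyway.
 */
static void nohz_note_context_switch(void)
{
        __this_cpu_inc(nohz_ctxt_switches);
}

/*
 * Run periodically on a housekeeping CPU outside the nohz set.  If the
 * counter moved since the last snapshot, the nohz CPU went through a
 * context switch and thus a quiescent state, and the grace period can
 * be advanced on its behalf without poking it with an IPI.
 */
static bool nohz_cpu_was_quiescent(int cpu, unsigned long *snap)
{
        unsigned long cur = per_cpu(nohz_ctxt_switches, cpu);

        if (cur == *snap)
                return false;   /* no context switch seen yet, keep waiting */

        *snap = cur;
        return true;
}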