Message-ID: <20110830184556.GA15953@somewhere.redhat.com>
Date: Tue, 30 Aug 2011 20:45:59 +0200
From: Frederic Weisbecker <fweisbec@...il.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Anton Blanchard <anton@....ibm.com>,
Avi Kivity <avi@...hat.com>, Ingo Molnar <mingo@...e.hu>,
Lai Jiangshan <laijs@...fujitsu.com>,
"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
Paul Menage <menage@...gle.com>,
Stephen Hemminger <shemminger@...tta.com>,
Thomas Gleixner <tglx@...utronix.de>,
Tim Pepper <lnxninja@...ux.vnet.ibm.com>
Subject: Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to idle
enter/exit APIs
On Tue, Aug 30, 2011 at 05:22:33PM +0200, Peter Zijlstra wrote:
> On Tue, 2011-08-30 at 16:26 +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 30, 2011 at 01:19:18PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-08-30 at 01:35 +0200, Frederic Weisbecker wrote:
> > > >
> > > > OTOH it is needed to find non-critical sections when asked to cooperate
> > > > in completing a grace period. But if no callbacks have been enqueued on
> > > > the whole system, we are fine.
> > >
> > > It's that 'whole system' clause that I have a problem with. It would be
> > > perfectly fine to have a number of cpus very busy generating rcu
> > > callbacks; however, that should not mean our adaptive nohz cpu has to be
> > > bothered to complete grace periods.
> > >
> > > Requiring it to participate in the grace period state machine is a fail,
> > > plain and simple.
> >
> > We need those nohz CPUs to participate because they may use read side
> > critical sections themselves. So we need them to delay grace period
> > completion until the end of their running rcu read side critical sections,
> > like any other CPU. Otherwise their supposed rcu read side critical
> > sections wouldn't be effective.
> >
> > Either that or we need to only stop the tick when we are in userspace.
> > I'm not sure it would be a good idea.
>
> Well the simple fact is that rcu, when considered system-wide, is pretty
> much always busy, voiding any and all benefit you might want to gain.
With my testcase, a stupid userspace loop on a single CPU among 4, I actually
see only little RCU activity, especially as the other CPUs are pretty much idle.
So there are some cases where it's not so pointless.
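To spell out the read side constraint quoted above, here is a minimal
sketch (struct foo, read_foo() and update_foo() are made up purely for
illustration, not existing code): even with the tick stopped on the nohz
CPU, the grace period started by the updater must not complete before
the reader has left its critical section, otherwise "old" gets freed
under its feet.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
        int val;
        struct rcu_head rcu;
};

static struct foo *gbl_foo;

/* Reader running on an adaptive nohz CPU. */
static int read_foo(void)
{
        struct foo *p;
        int val = 0;

        rcu_read_lock();
        p = rcu_dereference(gbl_foo);
        if (p)
                val = p->val;
        rcu_read_unlock();      /* only past this point is the CPU quiescent */

        return val;
}

static void free_foo(struct rcu_head *head)
{
        kfree(container_of(head, struct foo, rcu));
}

/* Updater running on some other, busy CPU (update-side locking omitted). */
static void update_foo(struct foo *new)
{
        struct foo *old = gbl_foo;

        rcu_assign_pointer(gbl_foo, new);
        if (old)
                call_rcu(&old->rcu, free_foo);  /* must wait for the reader above */
}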
> > We discussed this problem; I believe it mostly resides in rcu sched,
> > because finding quiescent states for rcu bh is easy, but rcu sched needs
> > the tick or context switches. (For rcu preempt I have no idea.)
> > So for now that's the sanest way we found, amongst these alternatives:
> >
> > - Having explicit hooks in preempt_disable() and local_irq_restore()
> > to notice the end of rcu sched critical sections, so that we don't need
> > the tick anymore to find quiescent states. But that's going to be costly,
> > and we may miss some more implicitly non-preemptible code paths.
> >
> > - Rely on context switches only. I believe in practice it should be fine.
> > But in theory this delays the grace period completion for an unbounded
> > amount of time.
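(To make the first of the two options above concrete, a purely
hypothetical sketch: the hook name and the per-CPU flag below don't
exist. The hook would be called from every outermost preempt_enable()
and from local_irq_restore() whenever it actually re-enables interrupts,
which is exactly what makes that option costly.)

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/irqflags.h>

static DEFINE_PER_CPU(int, rcu_sched_qs_seen);          /* hypothetical flag */

/*
 * Hypothetical hook: outside of any preempt- or irq-disabled region this
 * CPU is in a quiescent state for rcu sched, so record it instead of
 * relying on the tick to notice.
 */
static inline void rcu_sched_note_qs(void)
{
        if (!preempt_count() && !irqs_disabled())
                __this_cpu_write(rcu_sched_qs_seen, 1);
}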
>
> Right, so what we can do is keep a per-cpu context switch counter (I'm
> sure we have one someplace and we already have the
> rcu_note_context_switch() callback in case we need another) and have
> another cpu (outside of our extended nohz domain) drive our state
> machine.
>
> But I'm sure Paul can say more sensible things than me here.
Yeah I hope we can find some solution to minimize these IPIs.
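To illustrate the direction you describe, a rough sketch (the per-CPU
counter and the poller below are made up; only rcu_note_context_switch()
is an existing callback): the nohz CPU only bumps a counter from the
context switch path, and a housekeeping CPU outside the nohz set samples
it to detect quiescent states and drive the grace period state machine
on its behalf, so neither the tick nor an IPI is needed on the nohz CPU.

#include <linux/types.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, nohz_ctxt_switches);      /* hypothetical */

/*
 * Bumped from the context switch path on the nohz CPU, e.g. from
 * rcu_note_context_switch(); this is work the CPU does anyway.
 */
static void nohz_note_context_switch(void)
{
        __this_cpu_inc(nohz_ctxt_switches);
}

/*
 * Run periodically on a housekeeping CPU outside the nohz set.  If the
 * counter moved since the last snapshot, the nohz CPU went through a
 * context switch and thus a quiescent state, and the grace period can
 * be advanced on its behalf without poking it with an IPI.
 */
static bool nohz_cpu_was_quiescent(int cpu, unsigned long *snap)
{
        unsigned long cur = per_cpu(nohz_ctxt_switches, cpu);

        if (cur == *snap)
                return false;   /* no context switch seen yet, keep waiting */

        *snap = cur;
        return true;
}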