linux-kernel - Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to idle enter/exit APIs

Date:	Wed, 31 Aug 2011 15:37:58 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Anton Blanchard <anton@....ibm.com>,
	Avi Kivity <avi@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
	Stephen Hemminger <shemminger@...tta.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tim Pepper <lnxninja@...ux.vnet.ibm.com>,
	Paul Menage <paul@...lmenage.org>
Subject: Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to idle
 enter/exit APIs

On Wed, Aug 31, 2011 at 11:17:25AM +0200, Peter Zijlstra wrote:
> On Wed, 2011-08-31 at 00:24 +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 30, 2011 at 10:58:38PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-08-30 at 17:42 +0200, Peter Zijlstra wrote:
> > > > On Tue, 2011-08-30 at 17:33 +0200, Frederic Weisbecker wrote:
> > > > > > See all that is still kernelspace ;-) I think I know what you mean to
> > > > > > say though, but seeing as you note there is even now a known shortcoming
> > > > > > I'm not very confident its a solid construction. What will help us find
> > > > > > such holes?
> > > > > 
> > > > > This: https://lkml.org/lkml/2011/6/23/744
> > > > > 
> > > > > It's in one of Paul's branches and should make it for the next merge window.
> > > > > This should detect any of such holes. I made that on purpose for the nohz cpusets
> > > > > when I saw how much error prone that can be with rcu :)
> > > > 
> > > > OK, good ;-)
> > > > 
> > > > > > I would much rather we not rely on such fragile things too much.. this
> > > > > > RCU stuff wants way more thought, as it stands your patch-set doesn't do
> > > > > > anything useful IMO.
> > > > > 
> > > > > Not sure what you mean. Well that Rcu thing for sure is fragile but we have
> > > > > the tools ready to find the problems. 
> > > > 
> > > > Right that thing you linked above does catch abuse, still your current
> > > > proposal means that due to RCU it will basically never disable the tick.
> > > 
> > > So how about something like:
> > > 
> > > Assuming we are in rcu_nohz state; on kernel enter we leave rcu_nohz but
> > > don't start the tick, instead we assign another cpu to run our state
> > > machine.
> > 
> > The nohz CPU still has to notice its own quiescent states. 
> 
> Why? rcu-sched can use a context-switch counter, rcu-preempt doesn't
> even need that. Remote cpus can notice those just fine.

If it's fine to rely only on context switches, which in theory don't
happen within a bounded time, then ok.

Would be nice to hear about Paul's opinion on that.
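
To make the context-switch counter idea concrete, here is a minimal
user-space sketch of a remote CPU noticing another CPU's quiescent state
by sampling a counter. Every name in it (struct cpu_state,
note_context_switch(), gp_start_snapshot(), passed_quiescent_state()) is
hypothetical and made up for illustration; it is not how the kernel's RCU
is actually implemented. It also shows the concern above: nothing forces
the counter to advance within a bounded time.

```c
#include <stdbool.h>
#include <stdatomic.h>

/* Hypothetical per-CPU state; not the kernel's real RCU data structures. */
struct cpu_state {
	atomic_ulong ctxsw_count;   /* bumped by the owning CPU on each context switch */
	unsigned long gp_snapshot;  /* counter value sampled remotely at grace-period start */
};

/* Called by the owning CPU whenever it context switches. */
static void note_context_switch(struct cpu_state *cpu)
{
	atomic_fetch_add_explicit(&cpu->ctxsw_count, 1, memory_order_release);
}

/* Called by a remote CPU when a grace period starts. */
static void gp_start_snapshot(struct cpu_state *cpu)
{
	cpu->gp_snapshot = atomic_load_explicit(&cpu->ctxsw_count,
						memory_order_acquire);
}

/*
 * Called by a remote CPU: the counter moved since the snapshot, so the
 * owning CPU went through at least one context switch (a quiescent state
 * for rcu-sched). If the owning CPU never switches, this stays false.
 */
static bool passed_quiescent_state(struct cpu_state *cpu)
{
	return atomic_load_explicit(&cpu->ctxsw_count,
				    memory_order_acquire) != cpu->gp_snapshot;
}
```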
 
> > Now it could be
> > an optimization to ask another CPU to handle all the rest once that quiescent
> > state is found. That doesn't solve our main problem though which is to
> > reliably report quiescent states when asked for.
> 
> No, seriously, RCU should not, ever, need to re-enable the tick. Imagine
> a HPC workload where the system cores are also responsible for all IO
> and all the adaptive-nohz cores are simply crunching numbers. In that
> scenario you'll have a very high rcu usage because the system cores are
> all very busy arranging work for the computation cores.

Of course, if we find a better way than having to restart the tick, I'm
all for doing it that way.

That said, if it requires significant changes, it should be done
outside this patchset, maybe as an optimization afterward. The patchset
is already big while still missing, for now, very important features
that the timer handles.

> > > On kernel exit we 'donate' all our rcu state to a willing victim (the
> > > same that earlier was kind enough to drive our state) and undo our
> > > entire GP accounting and re-enter rcu_nohz state.
> > 
> > That's already what does rcu_enter_nohz().
> 
> Almost but not quite, it doesn't donate the callbacks for example
> (something it does do on hotplug -- and therefore any assumption the
> callback will in fact run on the cpu you submit it on is already
> broken).

Good to know, so that would avoid restarting the tick on call_rcu()?
Sounds good, but again I think this should be done later.

> 
> > > If between that time we did restart the tick, we take back our rcu state
> > > and skip the donate and rcu_nohz enter on kernel exit.
> > 
> > That's also what is done in this patchset. 
> 
> Its not, since you don't hand of the grace period detectoring you don't
> take it back now do you..

So you are talking about grace periods started locally because local
callbacks were enqueued, right?


> > As soon as we re-enter the kernel
> > or the tick had to be restarted before we re-enter the kernel,
> 
> Another impossibility, you can only restart the tick from the kernel.

Ok, I meant it can be restarted from an interrupt that interrupts userspace.
I was talking about kernel enter/exit in terms of the new hooks introduced
(syscalls and exceptions).

> >  we call
> > rcu_exit_nohz() that pulls back the CPU to the whole RCU machinery.
> 
> But you then also start the tick again..

When we enter the kernel? (minus interrupts)
No, we only call rcu_exit_nohz().
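
A rough model of the behaviour discussed in this thread, as a small
user-space state machine. The struct and the sketch_*() helpers are
hypothetical names for illustration only, and the flag assignments merely
stand in for rcu_exit_nohz()/rcu_enter_nohz(); this is not code from the
patchset.

```c
#include <stdbool.h>

/* Hypothetical per-CPU flags, for illustration only. */
struct nohz_cpu {
	bool rcu_nohz;      /* CPU is in the RCU extended quiescent state */
	bool tick_running;  /* periodic tick is active on this CPU */
};

/* Syscall/exception entry: leave rcu_nohz, but do not touch the tick. */
static void sketch_kernel_enter(struct nohz_cpu *cpu)
{
	cpu->rcu_nohz = false;          /* stands in for rcu_exit_nohz() */
}

/* Something really needs the tick: restart it and keep RCU watching. */
static void sketch_restart_tick(struct nohz_cpu *cpu)
{
	cpu->tick_running = true;
	cpu->rcu_nohz = false;
}

/* Return to userspace: re-enter rcu_nohz only if the tick stayed off. */
static void sketch_kernel_exit(struct nohz_cpu *cpu)
{
	if (!cpu->tick_running)
		cpu->rcu_nohz = true;   /* stands in for rcu_enter_nohz() */
}
```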