Message-ID: <1314897180.1485.12.camel@twins>
Date: Thu, 01 Sep 2011 19:13:00 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: paulmck@...ux.vnet.ibm.com
Cc: Frederic Weisbecker <fweisbec@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Anton Blanchard <anton@....ibm.com>,
Avi Kivity <avi@...hat.com>, Ingo Molnar <mingo@...e.hu>,
Lai Jiangshan <laijs@...fujitsu.com>,
Stephen Hemminger <shemminger@...tta.com>,
Thomas Gleixner <tglx@...utronix.de>,
Tim Pepper <lnxninja@...ux.vnet.ibm.com>,
Paul Menage <paul@...lmenage.org>
Subject: Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to
idle enter/exit APIs
On Thu, 2011-09-01 at 09:40 -0700, Paul E. McKenney wrote:
> On Wed, Aug 31, 2011 at 04:41:00PM +0200, Peter Zijlstra wrote:
> > On Wed, 2011-08-31 at 15:37 +0200, Frederic Weisbecker wrote:
> > > > Why? rcu-sched can use a context-switch counter, rcu-preempt doesn't
> > > > even need that. Remote cpus can notice those just fine.
> > >
> > > If that's fine to only rely on context switches, which don't happen in
> > > a bounded time in theory, then ok.
> >
> > But (!PREEMPT) rcu already depends on that, and suffers this lack of
> > time-bounds. What it does to expedite matters is force context switches,
> > but nowhere is it written the GP is bounded by anything sane.
>
> Ah, but it really is written, among other things, by the OOM killer. ;-)
Well, there is that of course :-) But I think the argument below relies
only on what we already have, without requiring more.
> > > > But you then also start the tick again..
> > >
> > > When we enter kernel? (minus interrupts)
> > > No we only call rcu_exit_nohz().
> >
> > So thinking more about all this:
> >
> > rcu_exit_nohz() will make remote cpus wait for us, this is exactly what
> > is needed because we might have looked at pointers. Lacking a tick we
> > don't progress our own state but that is fine, !PREEMPT RCU wouldn't
> > have been able to progress our state anyway since we haven't scheduled
> > (there's nothing to schedule to except idle, see below).
>
> Lacking a tick, the CPU also fails to respond to state updates from
> other CPUs.
I'm sure I'll have to go re-read your documents, but does that matter?
Even if we had had a tick, we still couldn't have progressed, since we
wouldn't have scheduled; so we would hold up GP completion either way.
> > Then when we leave the kernel (or go idle) we re-enter rcu_nohz state,
> > and the other cpus will ignore our contribution (since we have entered a
> > QS and can't be holding any pointers) the other CPUs can continue and
> > complete the GP and run the callbacks.
>
> This is true.
So suppose all the other CPUs have completed the GP and our CPU is the
one holding things up. I don't see rcu_enter_nohz() doing much of
anything at that point, so who is responsible for GP completion?
> > I haven't fully considered PREEMPT RCU quite yet, but I'm thinking we
> > can get away with something similar.
>
> All the ways I know of to make PREEMPT_RCU live without a scheduling
> clock tick while not in some form of dyntick-idle mode require either
> IPIs or read-side memory barriers. The special case where all CPUs
> are in dyntick-idle mode and something needs to happen also needs to
> be handled correctly.
>
> Or are you saying that PREEMPT_RCU does not need a CPU to take
> scheduling-clock interrupts while that CPU is in dyntick-idle mode?
> That is true enough.
I'm not saying anything much about PREEMPT_RCU, I voiced an
ill-considered suspicion :-)
So in the nr_running=[0,1] case we're in rcu_nohz state when idle or
when in userspace. The only interesting part is kernel space, where we
cannot be in rcu_nohz state because we might actually use RCU-protected
pointers and thus have to stop callbacks from destroying state.
The only PREEMPT_RCU implementation I can recall is the counting one,
and that one does indeed want a tick, because even in kernel space it
could move things forward once the 'old' index counter reaches 0.
We could possibly add magic to rcu_read_unlock_special() to restart
the tick in that case.
Now clearly all that might not apply to the current implementation;
I'll have to wrap my head around the current PREEMPT_RCU implementation
some more.
> > So per the above we don't need the tick at all (for the case of
> > nr_running=[0,1]), RCU will sort itself out.
> >
> > Now I forgot where all you send IPIs from, and I'll go look at these
> > patches once more.
> >
> > As for call_rcu() for that we can indeed wake the tick (on leaving
> > kernel space or entering idle, no need to IPI since we can't process
> > anything before that anyway) or we could hand off our call list to a
> > 'willing' victim.
> >
> > But yeah, input from Paul would be nice...
>
> In the call_rcu() case, I do have some code in preparation that allows
> CPUs to have non-empty callback queues and still be tickless. There
> are some tricky corner cases, but it does look possible. (Famous last
> words...)
Handing your callbacks to someone else is one solution, but I'm not
overly worried about re-starting the tick if we do call_rcu().
> The reason for doing this is that people are enabling
> CONFIG_RCU_FAST_NO_HZ on systems that have no business enabling it.
> Bad choice of names on my part.
hehe :-)