lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110929044909.GA5432@linux.vnet.ibm.com>
Date:	Wed, 28 Sep 2011 21:49:09 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	"Kirill A. Shutemov" <kirill@...temov.name>,
	linux-kernel@...r.kernel.org, Dipankar Sarma <dipankar@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Lai Jiangshan <laijs@...fujitsu.com>
Subject: Re: linux-next-20110923: warning kernel/rcutree.c:1833

On Wed, Sep 28, 2011 at 05:55:45PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2011 at 01:46:36AM +0200, Frederic Weisbecker wrote:
> > On Wed, Sep 28, 2011 at 11:40:25AM -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 28, 2011 at 02:31:21PM +0200, Frederic Weisbecker wrote:
> > > > On Tue, Sep 27, 2011 at 11:01:42AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Sep 27, 2011 at 02:16:50PM +0200, Frederic Weisbecker wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > > > But all of this stuff looks to me to be called from the context
> > > > > > > of the idle task, so that idle_cpu() will always return "true"...
> > > > > > 
> > > > > > I meant "idle_cpu() && !in_interrupt()" that should return false in
> > > > > > rcu_read_lock_sched_held().
> > > > > 
> > > > > The problem is that the idle tasks now seem to make quite a bit of use
> > > > > of RCU on entry to and exit from the idle loop itself, for example,
> > > > > via tracing.  So it seems like it is time to have the idle loop
> > > > > explictly tell RCU when the idle extended quiescent state is in effect.
> > > > > 
> > > > > An experimental patch along these lines is included below.  Does this
> > > > > approach seem reasonable, or am I missing something subtle (or even
> > > > > not so subtle) here?
> > > > > 
> > > > > 							Thanx, Paul
> > > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > rcu: Explicitly track idle CPUs.
> > > > > 
> > > > > In the good old days, RCU simply checked to see if it was running in
> > > > > the context of an idle task to determine whether or not it was in the
> > > > > idle extended quiescent state.  However, the entry to and exit from
> > > > > idle has become more ornate over the years, and some of this processing
> > > > > now uses RCU while running in the context of the idle task.  It is
> > > > > therefore no longer reasonable to assume that anything running in the
> > > > > context of one of the idle tasks is in an extended quiscent state.
> > > > > 
> > > > > This commit therefore explicitly tracks whether each CPU is in the
> > > > > idle loop, allowing the idle task to use RCU anywhere except in those
> > > > > portions of the idle loops where RCU has been explicitly informed that
> > > > > it is in a quiescent state.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> > > > 
> > > > I fear we indeed need that now.
> > > 
> > > And we probably need to factor this patch stack.  Given the number
> > > of warnings and errors due to RCU's confusion about what means "idle",
> > > we simply are not bisectable as is.
> > 
> > Not sure what you mean. You want to split that specific patch or
> > others?
> 
> It looks to me that having my pair of patches on top of yours is
> really ugly.  If we are going to introduce the per-CPU idle variable,
> we should make a patch stack that uses that from the start.  This allows
> me to bisect to track down the failures I am seeing on Power.

And I should hasten to add that I am not blaming you for these problems.
You have been finding at least as many problems in other code than in
your own, in some cases making other problems easier to reproduce.  But
either way, they need to be fixed to make the upcoming merge window.

							Thanx, Paul

> If you are too busy, I can take this on, but we might get better results
> if you did it.  (And I certainly cannot complain about the large amount
> of time and energy that you have put into this -- plus the reduction in
> OS jitter will be really cool to have!)
> 
> > > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > > index 375e7d8..cd9e2d1 100644
> > > > > --- a/include/linux/tick.h
> > > > > +++ b/include/linux/tick.h
> > > > > @@ -131,8 +131,16 @@ extern ktime_t tick_nohz_get_sleep_length(void);
> > > > >  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > > > >  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > > > >  # else
> > > > > -static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
> > > > > -static inline void tick_nohz_idle_exit(void) { }
> > > > > +static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
> > > > > +{
> > > > > +	if (rcu_ext_qs())
> > > > > +		rcu_idle_enter();
> > > > > +}
> > > > 
> > > > rcu_ext_qs is not a function.
> > > 
> > > Ooooh...  Good catch.  Would you believe that gcc didn't complain?
> > > Or maybe my scripts are missing some gcc complaints.  But I would
> > > expect the following to catch them:
> > > 
> > > 	egrep -q "Stop|Error|error:|warning:|improperly set"
> > > 
> > > Anything I am missing?
> > 
> > No idea :)
> > 
> > > > Although idle and rcu/nohz are still close notions, it sounds
> > > > more logical the other way around in the ordering:
> > > > 
> > > > tick_nohz_idle_enter() {
> > > > 	rcu_idle_enter() {
> > > > 		rcu_enter_nohz();
> > > > 	}
> > > > }
> > > > 
> > > > tick_nohz_irq_exit() {
> > > >         rcu_idle_enter() {
> > > >                 rcu_enter_nohz();
> > > >         }
> > > > }
> > > > 
> > > > Because rcu ext qs is something used by idle, not the opposite.
> 
> Re-reading this makes me realize that I would instead say that idle
> is an example of an RCU extended quiescent state, or that the rcu_ext_qs
> argument to the various functions is used to indicate whether or not
> we are immediately entering/leaving idle from RCU's viewpoint.
> 
> So what were you really trying to say here?  ;-)
> 
> > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > the irq nesting required to correctly decide whether or not we are going
> > > to really go to idle state.  Furthermore, there are cases where we
> > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > as well.
> > > 
> > > Now, it is quite possible that I am suffering a senior moment and just
> > > failing to see how to structure this in the design where rcu_idle_enter()
> > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > structure this so that it works correctly.
> > > 
> > > Please feel free to enlighten me!
> > 
> > Ah I realize that you want to call rcu_idle_exit() when we enter
> > the first level interrupt and rcu_idle_enter() when we exit it
> > to return to idle loop.
> > 
> > But we use that check:
> > 
> > 	if (user ||
> > 	    (rcu_is_cpu_idle() &&
> >  	     !in_softirq() &&
> >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> >  		rcu_sched_qs(cpu);
> > 
> > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > in another interrupt.
> 
> But I would like to enable checks for entering/exiting idle while
> within an RCU read-side critical section.  The idea is to move
> the checks from their currently somewhat problematic location in
> rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> thought is to move them rcu_enter_nohz() and rcu_exit_nohz() near the
> calls to rcu_idle_enter() and rcu_idle_exit(), respectively.
> 
> This would mean that they operated only in NO_HZ kernels with lockdep
> enabled, but I am good with that because to do otherwise would require
> adding nesting-level counters to the non-NO_HZ case, which I would like
> to avoid, expecially for TINY_RCU.
> 
> > That said we found RCU uses after we decrement the hardirq offset and until
> > we reach rcu_irq_exit(). So rcu_check_callbacks() may miss these places
> > and account spurious quiescent states.
> > 
> > But between sub_preempt_count() and rcu_irq_exit(), irqs are disabled
> > AFAIK so we can't be interrupted by rcu_check_callbacks(), except during the
> > softirqs processing. But we have that ordering:
> > 
> > add_preempt_count(SOTFIRQ_OFFSET)
> > local_irq_enable()
> > 
> > do softirqs
> > 
> > local_irq_disable()
> > sub_preempt_count(SOTFIRQ_OFFSET)
> > 
> > So the !in_softirq() check covers us during the time we process softirqs.
> > 
> > The only assumption we need is that there is no place between
> > sub_preempt_count(IRQ_EXIT_OFFSET) and rcu_irq_ext() that has
> > irqs enabled and that is an rcu read side critical section.
> > 
> > I'm not aware of any automatic check to ensure that though.
> 
> Nor am I, which is why I am looking to the checks in
> rcu_enter_nohz() and rcu_exit_nohz() called out above.
> 
> > Anyway, the delta patch looks good.
> 
> OK, my current plans are to start forward-porting to -rc8, and I would
> like to have this pair of delta patches or something like them pulled
> into your stack.
> 
> >                                     Just a little thing:
> > 
> > > -void tick_nohz_idle_exit(void)
> > > +void tick_nohz_idle_exit(bool rcu_ext_qs)
> > 
> > It becomes weird to have both idle_enter/idle_exit having
> > that parameter.
> > 
> > Would it make sense to have tick_nohz_idle_[exit|enter]_norcu()
> > and a version without norcu?
> 
> Given that we need to make this work in CONFIG_NO_HZ=n kernels, I believe
> that the current API is OK.  But if you would like to change the API
> during the forward-port to -rc8, I am also OK with the alternative API
> you suggest.
> 
> 							Thanx, Paul
> 
> > >  {
> > >  	int cpu = smp_processor_id();
> > >  	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> > > @@ -559,7 +559,7 @@ void tick_nohz_idle_exit(void)
> > >  
> > >  	ts->inidle = 0;
> > >  
> > > -	if (ts->rcu_ext_qs) {
> > > +	if (rcu_ext_qs) {
> > >  		rcu_exit_nohz();
> > >  		ts->rcu_ext_qs = 0;
> > >  	}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ