Message-ID: <20100225213633.GA5936@linux.vnet.ibm.com>
Date: Thu, 25 Feb 2010 13:36:33 -0800
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
linux-kernel@...r.kernel.org, laijs@...fujitsu.com,
dipankar@...ibm.com, akpm@...ux-foundation.org,
mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
dvhltc@...ibm.com, niv@...ibm.com, tglx@...utronix.de,
peterz@...radead.org, rostedt@...dmis.org, Valdis.Kletnieks@...edu,
dhowells@...hat.com
Subject: Re: [PATCH tip/core/rcu 0/21] v6 add lockdep-based diagnostics to
rcu_dereference()
On Thu, Feb 25, 2010 at 10:18:30AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 25, 2010 at 01:04:44PM +0100, Ingo Molnar wrote:
> >
> > another, different warning is:
> >
> > PM: Adding info for No Bus:vcsa6
> > ------------[ cut here ]------------
> > WARNING: at kernel/softirq.c:143 local_bh_enable_ip+0xba/0xf0()
> > Hardware name: System Product Name
> > Modules linked in:
> > Pid: 0, comm: swapper Not tainted 2.6.33-tip-00730-gacec70d-dirty #18737
> > Call Trace:
> > [<ffffffff8104e0eb>] warn_slowpath_common+0x7b/0xc0
> > [<ffffffff8166cd60>] ? __dst_free+0x60/0xd0
> > [<ffffffff8104e144>] warn_slowpath_null+0x14/0x20
> > [<ffffffff81055c7a>] local_bh_enable_ip+0xba/0xf0
> > [<ffffffff817c16d9>] _raw_spin_unlock_bh+0x19/0x20
> > [<ffffffff8166cd60>] __dst_free+0x60/0xd0
> > [<ffffffff8169de14>] dst_rcu_free+0x34/0x40
> > [<ffffffff8109c80d>] rcu_do_batch+0xcd/0x290
> > [<ffffffff8109ca3e>] __rcu_process_callbacks+0x6e/0xe0
> > [<ffffffff8109cbca>] rcu_needs_cpu+0x11a/0x170
> > [<ffffffff8107c14e>] tick_nohz_stop_sched_tick+0x15e/0x440
> > [<ffffffff81001dc9>] cpu_idle+0x79/0x120
> > [<ffffffff817bb627>] start_secondary+0xa0/0xa2
> > ---[ end trace 155c62ea9b561096 ]---
> >
> > Config attached.
>
> Color me confused!
>
> rcu_needs_cpu() is supposed to be called with irqs disabled, and
> tick_nohz_stop_sched_tick() does in fact disable them with
> local_irq_save() near the beginning of the function. Doing a quick
> inspection, starting at that point in tick_nohz_stop_sched_tick():
>
> o smp_processor_id() does not mess with irq, nor does per_cpu().
>
> o tick_nohz_start_idle() calls a bunch of things.
> sched_clock_cpu() checks for irqs being disabled, but
> only if CONFIG_HAVE_UNSTABLE_SCHED_CLOCK. Which you have
> set. So we know irqs remained disabled at this point.
>
> And I don't see anything re-enabling irqs in the subsequent
> code path in this function.
>
> o need_resched() just checks the TIF_NEED_RESCHED flag.
>
> o Neither local_softirq_pending() nor cpu_online() messes
> with irq enabling.
>
> o The code path containing the printk() was apparently not
> taken, as there is no message in your log.
>
> o read_seqbegin() and read_seqretry() leave irqs alone, as
> does timekeeping_max_deferment().
>
> And that puts us at the call to rcu_needs_cpu(). You have the
> new CONFIG_RCU_FAST_NO_HZ config variable set, and are not running
> preemptible RCU, so we are in the one at line 993 of rcutree_plugin.c.
> The fact that __rcu_process_callbacks() is on the stack means that all
> other CPUs were in dyntick-idle mode, so we went through the loop.
>
> o rcu_sched_qs() doesn't mess with irqs.
>
> o force_quiescent_state() does mess with irqs, but puts them
> back the way it found them.
>
> o Ditto for __rcu_process_callbacks().
>
> So I am reduced to putting together a diagnostic patch for you. :-/

-EICANTREAD

Commit 8bd93a2c ("Accelerate grace period...") is busted. I will work
out how to fix it.

Thanx, Paul
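
For anyone following along, the failure in the trace can be modeled in user space. What follows is only a rough sketch, not kernel code: every toy_* name is hypothetical, and it assumes that the check behind the warning in local_bh_enable_ip() fires when bottom halves are re-enabled while irqs are disabled. It shows a callback ending in the equivalent of spin_unlock_bh() (as __dst_free() does in the trace) being invoked from a path that has irqs off (as tick_nohz_stop_sched_tick() does around rcu_needs_cpu()).

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of one CPU's state. Nothing here is real kernel code. */
static bool irqs_off;   /* true while "interrupts" are disabled */
static int warnings;    /* how many times the toy check has fired */

/* Disable "irqs" and return the previous state, like local_irq_save(). */
static bool toy_irq_save(void)
{
	bool was_off = irqs_off;

	irqs_off = true;
	return was_off;
}

/* Put "irqs" back the way the caller found them. */
static void toy_irq_restore(bool was_off)
{
	irqs_off = was_off;
}

/*
 * Hypothetical stand-in for the check in local_bh_enable_ip():
 * complain if bottom halves are re-enabled with irqs disabled.
 */
static void toy_bh_enable(void)
{
	if (irqs_off) {
		warnings++;
		fprintf(stderr, "WARNING: bh enable with irqs disabled\n");
	}
}

/* Stand-in for an RCU callback that finishes with spin_unlock_bh(). */
static void toy_callback(void)
{
	toy_bh_enable();
}

/* Stand-in for the idle-entry path: irqs stay off across the callback. */
static void toy_idle_entry(void (*cb)(void))
{
	bool flags = toy_irq_save();

	cb();
	toy_irq_restore(flags);
}
```

Calling toy_callback() directly is quiet, while toy_idle_entry(toy_callback) bumps the warnings counter once and still leaves the irq state as it found it. That matches the inspection quoted above: nothing re-enabled irqs, and the trouble is that the callback ran with them disabled at all.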