linux-kernel - Re: RCU lockup in the SMP idle thread, help...


Message-ID: <20120913165844.GW4257@linux.vnet.ibm.com>
Date:	Thu, 13 Sep 2012 09:58:44 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	John Stultz <john.stultz@...aro.org>
Cc:	Linus Walleij <linus.walleij@...aro.org>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	linux-kernel@...r.kernel.org
Subject: Re: RCU lockup in the SMP idle thread, help...

On Thu, Sep 13, 2012 at 09:49:14AM -0700, John Stultz wrote:
> On 09/13/2012 05:36 AM, Linus Walleij wrote:
> >Hi Paul et al,
> >
> >I have this sporadic lockup in the SMP idle thread on ARM U8500:
> >
> >root@ME:/
> >root@ME:/
> >root@ME:/ INFO: rcu_preempt detected stalls on CPUs/tasks: { 0}
> >(detected by 1, t=23190 jiffies)
> >[<c0014710>] (unwind_backtrace+0x0/0xf8) from [<c0068624>]
> >(rcu_check_callbacks+0x69c/0x6e0)
> >[<c0068624>] (rcu_check_callbacks+0x69c/0x6e0) from [<c0029cbc>]
> >(update_process_times+0x38/0x4c)
> >[<c0029cbc>] (update_process_times+0x38/0x4c) from [<c0055088>]
> >(tick_sched_timer+0x80/0xe4)
> >[<c0055088>] (tick_sched_timer+0x80/0xe4) from [<c003c120>]
> >(__run_hrtimer.isra.18+0x44/0xd0)
> >[<c003c120>] (__run_hrtimer.isra.18+0x44/0xd0) from [<c003cae0>]
> >(hrtimer_interrupt+0x118/0x2b4)
> >[<c003cae0>] (hrtimer_interrupt+0x118/0x2b4) from [<c0013658>]
> >(twd_handler+0x30/0x44)
> >[<c0013658>] (twd_handler+0x30/0x44) from [<c0063834>]
> >(handle_percpu_devid_irq+0x80/0xa0)
> >[<c0063834>] (handle_percpu_devid_irq+0x80/0xa0) from [<c00601ec>]
> >(generic_handle_irq+0x2c/0x40)
> >[<c00601ec>] (generic_handle_irq+0x2c/0x40) from [<c000ef58>]
> >(handle_IRQ+0x4c/0xac)
> >[<c000ef58>] (handle_IRQ+0x4c/0xac) from [<c00084bc>] (gic_handle_irq+0x24/0x58)
> >[<c00084bc>] (gic_handle_irq+0x24/0x58) from [<c000dc80>] (__irq_svc+0x40/0x70)
> >Exception stack(0xcf851f88 to 0xcf851fd0)
> >1f80:                   00000020 c05d5920 00000001 00000000 cf850000 cf850000
> >1fa0: c05f4d48 c02de0b4 c05d8d90 412fc091 cf850000 00000000 01000000 cf851fd0
> >1fc0: c000f234 c000f238 60000013 ffffffff
> >[<c000dc80>] (__irq_svc+0x40/0x70) from [<c000f238>] (default_idle+0x28/0x30)
> >[<c000f238>] (default_idle+0x28/0x30) from [<c000f438>] (cpu_idle+0x98/0xe4)
> >[<c000f438>] (cpu_idle+0x98/0xe4) from [<002d2ef4>] (0x2d2ef4)
> >
> >The hangup has been there in the v3.6-rc series for a while (probably
> >since the merge window).
> >
> >I haven't been able to bisect out why this is happening, because the bug
> >is pretty hazardous to check - you have to boot the system and leave it alone
> >or use it sporadically for a while. Then all of a sudden it happens.
> >
> >So: reproducible, but not deterministically reproducible (I hate this kind
> >of thing...)
> >
> >The code involved seems to be generic kernel code apart from the
> >ARM GIC and TWD timer drivers.
> >
> >Any hints or debug options I should switch on?
> 
> I saw this once as well testing the fix to Daniel's deep idle hang
> issue (also on 32 bit).
> 
> Really briefly looking at the code in rcutree.c, I'm curious if
> we're hitting a false positive on the 5 minute jiffies overflow?

Hmmm...  Might be.  Does the patch below help?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Avoid spurious RCU CPU stall warnings

If a given CPU avoids the idle loop but also avoids starting a new
RCU grace period for a full minute, RCU can issue spurious RCU CPU
stall warnings.  This commit fixes this issue by adding a check for an
ongoing grace period to avoid these spurious stall warnings.

Reported-by: Becky Bruce <bgillbruce@...il.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@...aro.org>
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@...htriplett.org>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 3d63d1c..aea3157 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -819,7 +819,8 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	j = ACCESS_ONCE(jiffies);
 	js = ACCESS_ONCE(rsp->jiffies_stall);
 	rnp = rdp->mynode;
-	if ((ACCESS_ONCE(rnp->qsmask) & rdp->grpmask) && ULONG_CMP_GE(j, js)) {
+	if (rcu_gp_in_progress(rsp) &&
+	    (ACCESS_ONCE(rnp->qsmask) & rdp->grpmask) && ULONG_CMP_GE(j, js)) {
 
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(rsp);
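
For anyone who wants to poke at the condition outside the kernel, here is a minimal userspace sketch of the check before and after the patch above. It is a model under stated assumptions, not the kernel code: struct stall_state and the two stall_suspected_*() helpers are hypothetical names invented for illustration, and ULONG_CMP_GE() is written the way I understand the kernel's wraparound-safe comparison to be defined. The sketch only shows how the added grace-period test gates the warning and how the comparison behaves across a jiffies wrap; it does not show how a real system ends up in that state.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Wraparound-safe "a >= b" for free-running unsigned long counters such
 * as jiffies. Modeled on the kernel macro of the same name (assumption:
 * this matches the in-tree definition).
 */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical stand-in for the fields check_cpu_stall() consults. */
struct stall_state {
	bool gp_in_progress;         /* models rcu_gp_in_progress(rsp) */
	unsigned long qsmask;        /* models rnp->qsmask */
	unsigned long grpmask;       /* models rdp->grpmask */
	unsigned long jiffies_stall; /* models rsp->jiffies_stall */
};

/* Condition as it stood before the patch. */
static bool stall_suspected_old(const struct stall_state *s, unsigned long j)
{
	return (s->qsmask & s->grpmask) &&
	       ULONG_CMP_GE(j, s->jiffies_stall);
}

/* Condition after the patch: also require a grace period in progress. */
static bool stall_suspected_new(const struct stall_state *s, unsigned long j)
{
	return s->gp_in_progress &&
	       (s->qsmask & s->grpmask) &&
	       ULONG_CMP_GE(j, s->jiffies_stall);
}
```

With a made-up state in which no grace period is in progress but the mask bit is set and jiffies_stall lies in the past, stall_suspected_old() reports a stall while stall_suspected_new() does not. Once gp_in_progress is true, the two agree.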
