lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 21 Sep 2012 17:05:37 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Paul Walmsley <paul@...an.com>
Cc:	"Bruce, Becky" <bbruce@...com>,
	"Paul E. McKenney" <paul.mckenney@...aro.org>,
	"<linux-kernel@...r.kernel.org>" <linux-kernel@...r.kernel.org>,
	"<linux-omap@...r.kernel.org>" <linux-omap@...r.kernel.org>,
	"<linux-arm-kernel@...ts.infradead.org>" 
	<linux-arm-kernel@...ts.infradead.org>,
	"Hilman, Kevin" <khilman@...com>,
	"Shilimkar, Santosh" <santosh.shilimkar@...com>,
	"Hunter, Jon" <jon-hunter@...com>,
	"<snijsure@...d-net.com>" <snijsure@...d-net.com>
Subject: Re: rcu self-detected stall messages on OMAP3, 4 boards

On Fri, Sep 21, 2012 at 10:41:14PM +0000, Paul Walmsley wrote:
> On Fri, 21 Sep 2012, Paul E. McKenney wrote:
> 
> > On Fri, Sep 21, 2012 at 05:47:31PM +0000, Paul Walmsley wrote:
> >
> > > I built an OMAP kernel from Linus' commit 
> > > 4651afbbae968772efd6dc4ba461cba9b49bb9d8 ("Merge branch 'for-3.6-fixes' of 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq").  The config used 
> > > was 'omap2plus_defconfig', and enabled CONFIG_CPU_IDLE by hand.  Booted it 
> > > on a Pandaboard (OMAP4430ES2) into a very minimal Debian rootfs.
> > 
> > Did you have the patch at https://lkml.org/lkml/2012/8/30/290 applied?
> 
> No, it's just as described above.
> 
> > If not, could you please try it?  (This patch cleared up a similar
> > problem for Becky, also on OMAP.)
> 
> Did not seem to help, either with or without CONFIG_CPU_IDLE.

I was hoping!  ;-)

And my init=/bin/sh kernel ran idle for more than an hour without
any RCU CPU stall warnings...

I am wondering if your system somehow figured out how to start a grace
period that had no RCU callbacks waiting for it.  If that happened,
then a CONFIG_NO_HZ=y system could in theory get into a state where all
CPUs are in dyntick-idle mode, so that none of them is doing anything
to force the grace period to complete.

That should be easy to diagnose, anyway.  Please see below, which
includes the earlier diagnostic patch.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 307caf1..696f189 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -879,6 +879,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	unsigned long flags;
 	int ndetected = 0;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	long totqlen = 0;
 
 	/* Only let one CPU complain about others per time interval. */
 
@@ -923,8 +924,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 
 	print_cpu_stall_info_end();
-	printk(KERN_CONT "(detected by %d, t=%ld jiffies)\n",
-	       smp_processor_id(), (long)(jiffies - rsp->gp_start));
+	for_each_possible_cpu(cpu)
+		totqlen += per_cpu_ptr(rsp->rda, cpu)->qlen;
+	pr_cont("(detected by %d, t=%ld jiffies, g=%lu, c=%lu, q=%lu)\n",
+	       smp_processor_id(), (long)(jiffies - rsp->gp_start),
+	       rsp->gpnum, rsp->completed, totqlen);
 	if (ndetected == 0)
 		printk(KERN_ERR "INFO: Stall ended before state dump start\n");
 	else if (!trigger_all_cpu_backtrace())
@@ -939,8 +943,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 
 static void print_cpu_stall(struct rcu_state *rsp)
 {
+	int cpu;
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	long totqlen = 0;
 
 	/*
 	 * OK, time to rat on ourselves...
@@ -951,7 +957,10 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	print_cpu_stall_info_begin();
 	print_cpu_stall_info(rsp, smp_processor_id());
 	print_cpu_stall_info_end();
-	printk(KERN_CONT " (t=%lu jiffies)\n", jiffies - rsp->gp_start);
+	for_each_possible_cpu(cpu)
+		totqlen += per_cpu_ptr(rsp->rda, cpu)->qlen;
+	pr_cont(" (t=%lu jiffies g=%lu c=%lu q=%lu)\n",
+		jiffies - rsp->gp_start, rsp->gpnum, rsp->completed, totqlen);
 	if (!trigger_all_cpu_backtrace())
 		dump_stack();
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists