linux-kernel - Re: rcu self-detected stall messages on OMAP3, 4 boards

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20120922201043.GE2934@linux.vnet.ibm.com>
Date:	Sat, 22 Sep 2012 13:10:44 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Paul Walmsley <paul@...an.com>
Cc:	"Bruce, Becky" <bbruce@...com>,
	"Paul E. McKenney" <paul.mckenney@...aro.org>,
	"<linux-kernel@...r.kernel.org>" <linux-kernel@...r.kernel.org>,
	"<linux-omap@...r.kernel.org>" <linux-omap@...r.kernel.org>,
	"<linux-arm-kernel@...ts.infradead.org>" 
	<linux-arm-kernel@...ts.infradead.org>,
	"Hilman, Kevin" <khilman@...com>,
	"Shilimkar, Santosh" <santosh.shilimkar@...com>,
	"Hunter, Jon" <jon-hunter@...com>,
	"<snijsure@...d-net.com>" <snijsure@...d-net.com>,
	fweisbec@...il.com
Subject: Re: rcu self-detected stall messages on OMAP3, 4 boards

On Sat, Sep 22, 2012 at 06:42:08PM +0000, Paul Walmsley wrote:
> On Fri, 21 Sep 2012, Paul E. McKenney wrote:
> 
> > Could you please point me to a recipe for creating a minimal userspace?
> > Just in case it is the userspac erather than the architecture/hardware
> > that makes the difference.
> 
> Tony's suggestion is pretty good.  Note that there may also be differences 
> in kernel timers -- either between x86 and ARM architectures, or loaded 
> device drivers -- that may confound the problem.

For example, there must be at least one RCU callback outstanding after
the boot sequence quiets down.  Of course, the last time I tried Tony's
approach, I was doing it on top of my -rcu stack, so am retrying on
v3.6-rc6.

> > Just to make sure I understand the combinations:
> > 
> > o	All stalls have happened when running a minimal userspace.
> > o	CONFIG_NO_HZ=n suppresses the stalls.
> > o	CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has
> > 	no observable effect on the stalls.
> > 
> > Did I get that right, or am I missing a combination?
> 
> That's correct.
> 
> > Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is 
> > cancelled upon exit from idle, and therefore should (almost) never 
> > actually execute. Its sole purpose is to wake up the CPU.  ;-)
> 
> Right.  Just curious, what would wake up the kernel from idle to handle a 
> grace period expiration when CONFIG_RCU_FAST_NO_HZ=n?  On a very idle 
> system, the time between timer ticks could potentially be several tens of 
> seconds.

If CONFIG_RCU_FAST_NO_HZ=n, then CPUs with RCU callbacks are not permitted
to shut off the scheduling-clock tick, so any CPU with RCU callbacks will
be awakened every jiffy.  The problem is that there appears to be a way
to get an RCU grace period started without any CPU having any callbacks,
which, as you surmise, would result in all the CPUs going to sleep and
the grace period never ending.  So if a CPU is awakened for any reason
after this everlasting grace period has extended for more than a minute,
the first thing that CPU will do is print an RCU CPU stall warning.

I believe that I see how to prevent callback-free grace periods from
ever starting.  (Famous last words...)

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/