linux-kernel - Re: [PATCH tip/core/rcu 11/15] rcu: Avoid spurious RCU CPU stall warnings

Message-ID: <20120906215838.GM2448@linux.vnet.ibm.com>
Date:	Thu, 6 Sep 2012 14:58:38 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
	niv@...ibm.com, tglx@...utronix.de, Valdis.Kletnieks@...edu,
	dhowells@...hat.com, eric.dumazet@...il.com, darren@...art.com,
	fweisbec@...il.com, sbw@....edu, patches@...aro.org,
	"Paul E. McKenney" <paul.mckenney@...aro.org>
Subject: Re: [PATCH tip/core/rcu 11/15] rcu: Avoid spurious RCU CPU stall
 warnings

On Thu, Sep 06, 2012 at 05:41:01PM -0400, Steven Rostedt wrote:
> On Thu, 2012-09-06 at 14:03 -0700, Paul E. McKenney wrote:
> 
> > Here are a few other ways that stalls can happen:
> > 
> > o	A CPU looping in an RCU read-side critical section.
> 
> For a minute? That's a bug.
> 
> > 	
> > o	A CPU looping with interrupts disabled.  This condition can
> > 	result in RCU-sched and RCU-bh stalls.
> 
> Also a bug.
> 
> > 
> > o	A CPU looping with preemption disabled.  This condition can
> > 	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
> > 	stalls.
> 
> Bug as well.
> 
> > 
> > o	A CPU looping with bottom halves disabled.  This condition can
> > 	result in RCU-sched and RCU-bh stalls.
> 
> Bug too.
> 
> > 
> > o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
> > 	without invoking schedule().
> 
> Another bug.
> 
> > 
> > o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
> > 	happen to preempt a low-priority task in the middle of an RCU
> > 	read-side critical section.   This is especially damaging if
> > 	that low-priority task is not permitted to run on any other CPU,
> > 	in which case the next RCU grace period can never complete, which
> > 	will eventually cause the system to run out of memory and hang.
> > 	While the system is in the process of running itself out of
> > 	memory, you might see stall-warning messages.
> 
> Buggy system.
> 
> > 
> > o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
> > 	is running at a higher priority than the RCU softirq threads.
> > 	This will prevent RCU callbacks from ever being invoked,
> > 	and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
> > 	RCU grace periods from ever completing.  Either way, the
> > 	system will eventually run out of memory and hang.  In the
> > 	CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
> > 	messages.
> 
> Not really a bug, but the developers need a spanking.

And RCU does what it can, which is limited to a splat on the console.

> > o	A hardware or software issue shuts off the scheduler-clock
> > 	interrupt on a CPU that is not in dyntick-idle mode.  This
> > 	problem really has happened, and seems to be most likely to
> > 	result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
> 
> Driving the bug.
> 
> > 
> > o	A bug in the RCU implementation.
> 
> Bug in the name.
> 
> > 
> > o	A hardware failure.  This is quite unlikely, but has occurred
> > 	at least once in real life.  A CPU failed in a running system,
> > 	becoming unresponsive, but not causing an immediate crash.
> > 	This resulted in a series of RCU CPU stall warnings, eventually
> > 	leading to the realization that the CPU had failed.
> 
> Hardware bug.
> 
> So, where's the "spurious RCU CPU stall warnings"?

I figured that would count as a bug in the RCU implementation.  ;-)

> All these cases deserve a warning.

Agreed, and that is the whole purpose of the stall warnings.

							Thanx, Paul
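
Every cause in the quoted list comes down to the same thing: some CPU fails to
report a quiescent state for too long, so the grace period cannot end and RCU
can only print a warning. What follows is a minimal user-space sketch of that
check, not the kernel's implementation. The names gp_state, gp_begin(),
report_qs(), check_stall(), NR_CPUS and STALL_TIMEOUT_TICKS are hypothetical
and chosen for illustration. The 60-tick timeout is also an assumption,
loosely modelled on the "for a minute" remark above with one tick taken as
one second.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                 /* hypothetical CPU count for the sketch */
#define STALL_TIMEOUT_TICKS 60UL  /* assumed timeout; not the kernel's value */

struct gp_state {
	unsigned long gp_start;   /* tick at which the grace period began */
	bool qs_seen[NR_CPUS];    /* has each CPU reported a quiescent state? */
};

/* Begin a grace period: no CPU has passed through a quiescent state yet. */
static void gp_begin(struct gp_state *gp, unsigned long now)
{
	gp->gp_start = now;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		gp->qs_seen[cpu] = false;
}

/* A CPU reports a quiescent state, for example after a context switch. */
static void report_qs(struct gp_state *gp, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	gp->qs_seen[cpu] = true;
}

/*
 * Return a bitmask of the CPUs still holding up the grace period once the
 * timeout has elapsed, or 0 if the timeout has not elapsed or every CPU
 * has already reported.
 */
static unsigned int check_stall(const struct gp_state *gp, unsigned long now)
{
	unsigned int stalled = 0;

	if (now - gp->gp_start < STALL_TIMEOUT_TICKS)
		return 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!gp->qs_seen[cpu])
			stalled |= 1u << cpu;
	return stalled;
}
```

A CPU that loops with preemption disabled, or that is starved by a CPU-bound
real-time task, never reaches report_qs() in this model, so its bit stays set
in the mask that check_stall() returns after the timeout. The check can only
say which CPUs are late. It cannot say why, which is why every item in the
list above ends in the same kind of warning.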
