Message-ID: <1346967661.1680.52.camel@gandalf.local.home>
Date: Thu, 06 Sep 2012 17:41:01 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: paulmck@...ux.vnet.ibm.com
Cc: Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
dipankar@...ibm.com, akpm@...ux-foundation.org,
mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
niv@...ibm.com, tglx@...utronix.de, Valdis.Kletnieks@...edu,
dhowells@...hat.com, eric.dumazet@...il.com, darren@...art.com,
fweisbec@...il.com, sbw@....edu, patches@...aro.org,
"Paul E. McKenney" <paul.mckenney@...aro.org>
Subject: Re: [PATCH tip/core/rcu 11/15] rcu: Avoid spurious RCU CPU stall warnings
On Thu, 2012-09-06 at 14:03 -0700, Paul E. McKenney wrote:
> Here are a few other ways that stalls can happen:
>
> o A CPU looping in an RCU read-side critical section.
For a minute? That's a bug.
>
> o A CPU looping with interrupts disabled. This condition can
> result in RCU-sched and RCU-bh stalls.
Also a bug.
>
> o A CPU looping with preemption disabled. This condition can
> result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
> stalls.
Bug as well.
>
> o A CPU looping with bottom halves disabled. This condition can
> result in RCU-sched and RCU-bh stalls.
Bug too.
>
> o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
> without invoking schedule().
Another bug.
>
> o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
> happen to preempt a low-priority task in the middle of an RCU
> read-side critical section. This is especially damaging if
> that low-priority task is not permitted to run on any other CPU,
> in which case the next RCU grace period can never complete, which
> will eventually cause the system to run out of memory and hang.
> While the system is in the process of running itself out of
> memory, you might see stall-warning messages.
Buggy system.
>
> o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
> is running at a higher priority than the RCU softirq threads.
> This will prevent RCU callbacks from ever being invoked,
> and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
> RCU grace periods from ever completing. Either way, the
> system will eventually run out of memory and hang. In the
> CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
> messages.
Not really a bug, but the developers need a spanking.
>
> o A hardware or software issue shuts off the scheduler-clock
> interrupt on a CPU that is not in dyntick-idle mode. This
> problem really has happened, and seems to be most likely to
> result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
Driving the bug.
>
> o A bug in the RCU implementation.
Bug in the name.
>
> o A hardware failure. This is quite unlikely, but has occurred
> at least once in real life. A CPU failed in a running system,
> becoming unresponsive, but not causing an immediate crash.
> This resulted in a series of RCU CPU stall warnings, eventually
> leading to the realization that the CPU had failed.
Hardware bug.
So, where's the "spurious RCU CPU stall warnings"?
All these cases deserve a warning.
-- Steve
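
For illustration only (this is not from the thread and is not the kernel's
implementation): every case in the quoted list looks the same to a stall
detector, namely a CPU that has not passed through a quiescent state since
the grace period began. A minimal userspace sketch of that check follows.
The names `cpu_state`, `last_qs`, `cpu_is_stalled`, and `count_stalled` are
hypothetical, and the 60-second timeout is an assumption taken from the
"minute" mentioned above.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS       4
#define STALL_TIMEOUT 60  /* seconds; assumed value, not a kernel constant */

/* Hypothetical per-CPU record: time (in seconds) of the last quiescent state. */
struct cpu_state {
	long last_qs;
};

/*
 * A CPU is holding up the grace period if it has not reported a quiescent
 * state since the grace period started and the grace period has already
 * lasted at least STALL_TIMEOUT seconds. The check cannot tell *why* the
 * CPU is stuck (interrupts off, preemption off, dead hardware, ...).
 */
static bool cpu_is_stalled(const struct cpu_state *cpu, long gp_start, long now)
{
	return cpu->last_qs < gp_start && now - gp_start >= STALL_TIMEOUT;
}

/* Print a warning for each stalled CPU and return how many were found. */
static int count_stalled(const struct cpu_state *cpus, int n,
			 long gp_start, long now)
{
	int stalled = 0;

	for (int i = 0; i < n; i++) {
		if (cpu_is_stalled(&cpus[i], gp_start, now)) {
			printf("stall on CPU %d (no quiescent state for %ld s)\n",
			       i, now - cpus[i].last_qs);
			stalled++;
		}
	}
	return stalled;
}
```

Under these assumptions, the check reports any CPU whose last quiescent
state predates the grace period once the timeout has elapsed, regardless
of which item in the list caused it.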