linux-kernel - Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160322174011.GM4287@linux.vnet.ibm.com>
Date:	Tue, 22 Mar 2016 10:40:11 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	"Chatre, Reinette" <reinette.chatre@...el.com>
Cc:	Jacob Pan <jacob.jun.pan@...ux.intel.com>,
	Josh Triplett <josh@...htriplett.org>,
	Ross Green <rgkernel@...il.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	"dipankar@...ibm.com" <dipankar@...ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	rostedt <rostedt@...dmis.org>,
	David Howells <dhowells@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	pranith kumar <bobby.prani@...il.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

On Tue, Mar 22, 2016 at 04:35:32PM +0000, Chatre, Reinette wrote:
> Hi Paul,

Hello, Reinette!

> On 2016-03-21, Paul E. McKenney wrote:
> > On Mon, Mar 21, 2016 at 09:22:30AM -0700, Jacob Pan wrote:
> >> On Fri, 18 Mar 2016 16:56:41 -0700
> >> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> >>> On Fri, Mar 18, 2016 at 02:00:11PM -0700, Josh Triplett wrote:
> >>>> On Thu, Feb 25, 2016 at 04:56:38PM -0800, Paul E. McKenney wrote:
> > 
> > [ . . . ]
> > 
> >>>> We're seeing a similar stall (~60 seconds) on an x86 development
> >>>> system here.  Any luck tracking down the cause of this?  If not, any
> >>>> suggestions for traces that might be helpful?
> >>> 
> >>> The dmesg containing the stall, the kernel version, and the .config
> >>> would be helpful!  Working on a torture test specific to this bug...

And thank you for the .config.  Your kenrle version looks to be 4.5.0.

> >> +Reinette, she has the system that can reproduce the issue. I
> >> believe she is having some other problems with it at the moment. But
> >> the .config should be available. Version is v4.5.
> > 
> > A couple of additional questions:
> > 
> > 1.	Is the test running on bare metal or virtualized?  If the
> > 	latter, what is the host?
> 
> Bare metal.

OK, you are ahead of me.  Mine is virtualized.

> > 2.	Does the workload involve CPU hotplug?
> 
> No.

Again, you are ahead of me.  Mine makes extremely heavy use of CPU hotplug.

> > 3.	Are you seeing things like this in dmesg?
> > 
> > 	"rcu_preempt kthread starved for 21033 jiffies"
> > 	"rcu_sched kthread starved for 32103 jiffies"
> > 	"rcu_bh kthread starved for 84031 jiffies"
> > 
> > 	If not, you are probably facing some other bug, and should
> > 	proceed debugging as described in Documentation/RCU/stallwarn.txt.
> 
> Below is a sample of what I see as captured with v4.5. The kernel configuration is attached.
> 
> [  135.456197] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  135.457729]  3-...: (0 ticks this GP) idle=722/0/0 softirq=5532/5532 fqs=0 
> [  135.459604]  (detected by 2, t=60004 jiffies, g=2105, c=2104, q=165)
> [  135.461318] Task dump for CPU 3:
> [  135.461321] swapper/3       R  running task        0     0      1 0x00200000
> [  135.461325]  00000078560040e5 ffff88017846fed0 ffffffff818af2cc ffff880100000000
> [  135.461330]  0000000600000003 ffff880178470000 ffff880072f32200 ffffffff822dcec0
> [  135.461334]  ffff88017846c000 ffff88017846c000 ffff88017846fee0 ffffffff818af517
> [  135.461338] Call Trace:
> [  135.461345]  [<ffffffff818af2cc>] ? cpuidle_enter_state+0xfc/0x310
> [  135.461349]  [<ffffffff818af517>] ? cpuidle_enter+0x17/0x20
> [  135.461353]  [<ffffffff811515aa>] ? call_cpuidle+0x2a/0x40
> [  135.461355]  [<ffffffff8115197d>] ? cpu_startup_entry+0x28d/0x360
> [  135.461360]  [<ffffffff8108c874>] ? start_secondary+0x114/0x140
> [  135.461365] rcu_preempt kthread starved for 60004 jiffies! g2105 c2104 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1

And yes, it looks like you are seeing the same bug that I am tracing.

The kthread is blocked on a schedule_timeout_interruptible().  Given
default configuration, this would have a three-jiffy timeout.

You set CONFIG_RCU_CPU_STALL_TIMEOUT=60, which matches the 60004 jiffies
above.  Is that value due to a distro setting or something?  Mainline
uses CONFIG_RCU_CPU_STALL_TIMEOUT=21.

> [  135.463965] rcu_preempt     S ffff88017844fd68     0     7      2 0x00000000
> [  135.463969]  ffff88017844fd68 ffff88017dd8cc80 ffff880177ff0000 ffff880178443b80
> [  135.463973]  ffff880178450000 ffff88017844fda0 ffff88017dd8cc80 ffff88017dd8cc80
> [  135.463977]  0000000000000003 ffff88017844fd80 ffffffff81ab031f 0000000100031504
> [  135.463981] Call Trace:
> [  135.463986]  [<ffffffff81ab031f>] schedule+0x3f/0xa0
> [  135.463989]  [<ffffffff81ab42d7>] schedule_timeout+0x127/0x270
> [  135.463993]  [<ffffffff81171a50>] ? detach_if_pending+0x120/0x120
> [  135.463997]  [<ffffffff8116da5d>] rcu_gp_kthread+0x6bd/0xa30
> [  135.464000]  [<ffffffff81151390>] ? wake_atomic_t_function+0x70/0x70
> [  135.464003]  [<ffffffff8116d3a0>] ? force_qs_rnp+0x1b0/0x1b0
> [  135.464006]  [<ffffffff8112f846>] kthread+0xe6/0x100
> [  135.464009]  [<ffffffff8112f760>] ? kthread_worker_fn+0x190/0x190
> [  135.464012]  [<ffffffff81ab5c0f>] ret_from_fork+0x3f/0x70
> [  135.464015]  [<ffffffff8112f760>] ? kthread_worker_fn+0x190/0x190

How long does it take to reproduce this?  If it reproduces in minutes
or hours, could you please boot with the following on the kernel command
line and dump the trace buffer shortly after the stall?

ftrace trace_event=sched_waking,sched_wakeup,sched_wake_idle_without_ipi

If dumping manually shortly after the stall is at all non-trivial
(for example, if your reproduction time is many minute or hours),
I can supply some patches that automate this.  Or you can pick
them up from -rcu:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

Branch rcu/dev has these patches (and much else besides).

							Thanx, Paul

PS:  In case you are curious, when I enable those tracepoints, it
     shows me that the timer is firing every three jiffies, as it
     should, but that something happens between the sched_waking
     and the IPI handler that should actually do the wakeup.
     However, adding the traces significantly slows reproduction,
     so I am writing a stress test specific to this bug to try to
     speed things up, hopefully allowing more tracing to be added
     while still retaining non-geologic reproduction times.