linux-kernel - Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

Message-ID: <20160321172616.GU4287@linux.vnet.ibm.com>
Date:	Mon, 21 Mar 2016 10:26:16 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Jacob Pan <jacob.jun.pan@...ux.intel.com>
Cc:	Josh Triplett <josh@...htriplett.org>,
	Ross Green <rgkernel@...il.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>, dipankar@...ibm.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	rostedt <rostedt@...dmis.org>,
	David Howells <dhowells@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	pranith kumar <bobby.prani@...il.com>,
	"Chatre, Reinette" <reinette.chatre@...el.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

On Mon, Mar 21, 2016 at 09:22:30AM -0700, Jacob Pan wrote:
> On Fri, 18 Mar 2016 16:56:41 -0700
> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> > On Fri, Mar 18, 2016 at 02:00:11PM -0700, Josh Triplett wrote:
> > > On Thu, Feb 25, 2016 at 04:56:38PM -0800, Paul E. McKenney wrote:

[ . . . ]

> > > We're seeing a similar stall (~60 seconds) on an x86 development
> > > system here.  Any luck tracking down the cause of this?  If not, any
> > > suggestions for traces that might be helpful?
> > 
> > The dmesg containing the stall, the kernel version, and the .config
> > would be helpful!  Working on a torture test specific to this bug...
> > 
> > 							Thanx, Paul
> > 
> +Reinette, she has the system that can reproduce the issue. I
> believe she is having some other problems with it at the moment. But
> the .config should be available. Version is v4.5.

A couple of additional questions:

1.	Is the test running on bare metal or virtualized?  If the
	latter, what is the host?

2.	Does the workload involve CPU hotplug?

3.	Are you seeing things like this in dmesg?

	"rcu_preempt kthread starved for 21033 jiffies"
	"rcu_sched kthread starved for 32103 jiffies"
	"rcu_bh kthread starved for 84031 jiffies"

	If not, you are probably facing some other bug, and should
	proceed with debugging as described in Documentation/RCU/stallwarn.txt.
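
A minimal sketch of how one might check a saved dmesg log for the messages in question 3. The regular expression is modeled only on the three example lines quoted above; the exact wording emitted by a given kernel version is an assumption, and the names `STARVED_RE` and `find_starved` are hypothetical helpers, not part of any kernel tooling.

```python
import re

# Pattern inferred from the three example messages quoted in the email.
# Assumption: the real messages contain this phrase verbatim.
STARVED_RE = re.compile(r"(rcu_\w+) kthread starved for (\d+) jiffies")


def find_starved(lines):
    """Return a list of (rcu_flavor, jiffies) for each matching log line."""
    hits = []
    for line in lines:
        match = STARVED_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


# The three example lines from the email, plus one unrelated line.
sample_log = [
    "rcu_preempt kthread starved for 21033 jiffies",
    "rcu_sched kthread starved for 32103 jiffies",
    "rcu_bh kthread starved for 84031 jiffies",
    "some unrelated kernel message",
]

for flavor, jiffies in find_starved(sample_log):
    print(f"{flavor}: starved for {jiffies} jiffies")
```

To apply this to a real system, the output of `dmesg` could be saved to a file and its lines passed to `find_starved` in place of `sample_log`. An empty result would suggest, per the email, that a different bug is involved.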

							Thanx, Paul