Message-ID: <20160403081853.GA32220@linux.vnet.ibm.com>
Date:	Sun, 3 Apr 2016 01:18:53 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	"Chatre, Reinette" <reinette.chatre@...el.com>,
	Jacob Pan <jacob.jun.pan@...ux.intel.com>,
	Josh Triplett <josh@...htriplett.org>,
	Ross Green <rgkernel@...il.com>,
	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	lkml <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>, dipankar@...ibm.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	rostedt <rostedt@...dmis.org>,
	David Howells <dhowells@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	pranith kumar <bobby.prani@...il.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

On Thu, Mar 31, 2016 at 08:42:55AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 30, 2016 at 07:55:47AM -0700, Paul E. McKenney wrote:
> > On Tue, Mar 29, 2016 at 06:49:08AM -0700, Paul E. McKenney wrote:
> > > On Mon, Mar 28, 2016 at 05:28:14PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Mar 28, 2016 at 05:25:18PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Mar 28, 2016 at 06:08:41AM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Mar 28, 2016 at 08:25:47AM +0200, Peter Zijlstra wrote:
> > > > > > > On Sun, Mar 27, 2016 at 02:06:41PM -0700, Paul E. McKenney wrote:
> > > > > 
> > > > > [ . . . ]
> > > > > 
> > > > > > > > OK, so I should instrument migration_call() if I get the repro rate up?
> > > > > > > 
> > > > > > > Can do, maybe try the below first. (yes I know how long it all takes :/)
> > > > > > 
> > > > > > OK, will run this today, then run calibration for last night's run this
> > > > > > evening.
> > > 
> > > And of 18 two-hour runs, there were five failures, or about 28%.
> > > That said, I don't have even one significant digit on the failure rate,
> > > as 5 of 18 is within the 95% confidence limits for a failure probability
> > > as low as 12.5% and as high as 47%.
> > 
> > And after last night's run, this is narrowed down to between 23% and 38%,
> > which is close enough.  Average is 30%, 18 failures in 60 runs.
> > 
> > Next step is to test Peter's patch some more.  Might take a couple of
> > nights' worth of runs to get statistical significance.  After which
> > it will be time to rebase to 4.6-rc1.
> 
> And the first night was not so good: 6 failures out of 24 runs.  Adding
> this to the 1-of-10 earlier gets 7 failures of 34.  Here is how things
> stack up given the range of base failure estimates:
> 
> Low 95% bound of 23%:		84% confidence.
> 
> Actual measurement of 30%:	92% confidence.
> 
> High 95% bound of 38%:		98% confidence.
> 
> So there is still some chance that Peter's patch is helping.  I will
> run for one more evening, after which it will be time to move forward
> to 4.6-rc1.

And no luck reducing bounds.  However, moving to 4.6-rc1 did get some
of the trace_printk() calls to print.  The ftrace_dump()s resulted in RCU
CPU stall warnings, and the dumps were truncated due to test timeouts
in my scripting.  (I need to make my scripts more patient when they
see an ftrace dump in progress, I guess.)

Here are the results:

http://www2.rdrop.com/users/paulmck/submission/TREE03.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.1.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.2.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.3.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.4.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.5.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.6.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.7.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.8.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.9.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.11.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.12.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.13.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.14.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.15.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.16.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.17.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.18.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.19.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.20.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.21.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.22.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.23.console.log.tgz
http://www2.rdrop.com/users/paulmck/submission/TREE03.24.console.log.tgz

The config is here:

http://www2.rdrop.com/users/paulmck/submission/config.tgz

More runs to measure 4.6-rc1 base error rate...
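
For anyone wanting to redo the failure-rate arithmetic quoted above, here is
a rough sketch of one way to do it with exact binomial tails.  This is an
assumption about method, not necessarily how the quoted bounds and
confidence figures were computed, so the numbers it prints need not match
them.  The helper names (binom_cdf, exact_interval, confidence_of_improvement)
are made up for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def exact_interval(failures, runs, level=0.95):
    """Two-sided exact (Clopper-Pearson style) interval for the failure rate.

    Found by bisection, using the fact that the binomial CDF at a fixed k
    decreases as p increases.
    """
    alpha = (1 - level) / 2

    def solve(k, target):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k, runs, mid) > target:
                lo = mid  # CDF still too large, so p must be larger
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if failures == 0 else solve(failures - 1, 1 - alpha)
    upper = 1.0 if failures == runs else solve(failures, alpha)
    return lower, upper


def confidence_of_improvement(failures, runs, base_rate):
    """1 - P(this few failures or fewer by chance at the base rate).

    One possible reading of "confidence that the patch helps"; assumed,
    not taken from the thread.
    """
    return 1 - binom_cdf(failures, runs, base_rate)


# Baseline runs mentioned in the thread: 5 of 18, then 18 of 60.
for fails, runs in ((5, 18), (18, 60)):
    lo, hi = exact_interval(fails, runs)
    print(f"{fails}/{runs}: {lo:.1%} .. {hi:.1%}")

# Patched runs: 7 failures of 34, against the three base-rate estimates.
for base in (0.23, 0.30, 0.38):
    print(f"base {base:.0%}: {confidence_of_improvement(7, 34, base):.1%}")
```

The exact interval is conservative, so it will tend to come out wider than
an approximate one for small run counts such as 5 of 18.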

							Thanx, Paul
