lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <686568926.5862.1456259651418.JavaMail.zimbra@efficios.com>
Date:	Tue, 23 Feb 2016 20:34:11 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Ross Green <rgkernel@...il.com>,
	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>, dipankar@...ibm.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Josh Triplett <josh@...htriplett.org>,
	rostedt <rostedt@...dmis.org>,
	David Howells <dhowells@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	pranith kumar <bobby.prani@...il.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

----- On Feb 21, 2016, at 1:15 PM, Ross Green rgkernel@...il.com wrote:

> On Sun, Feb 21, 2016 at 4:04 PM, Ross Green <rgkernel@...il.com> wrote:
>> On Sat, Feb 20, 2016 at 5:32 PM, Paul E. McKenney
>> <paulmck@...ux.vnet.ibm.com> wrote:
>>> On Sat, Feb 20, 2016 at 03:34:30PM +1100, Ross Green wrote:
>>>> On Sat, Feb 20, 2016 at 4:33 AM, Paul E. McKenney
>>>> <paulmck@...ux.vnet.ibm.com> wrote:
>>>> > On Thu, Feb 18, 2016 at 08:13:18PM -0800, John Stultz wrote:
>>>> >> On Thu, Feb 18, 2016 at 7:56 PM, Ross Green <rgkernel@...il.com> wrote:
>>>> >> > Well a bonus extra!
>>>> >> > Kept everything running and there was another stall.
>>>> >> > So i have included the demsg output for perusal.
>>>> >> >
>>>> >> > Just to clear things up there is no hotplug involved in this system.
>>>> >> > It is a standard Pandaboard ES Ti4460 two processor system.
>>>> >> > I use this for testing as a generic armv7 processor, plus can keep it
>>>> >> > just running along for testing for a long time. the system has a total
>>>> >> > of 23-25 process running on average. Mainly standard daemons. There is
>>>> >> > certainly no heavy processing going on. I run a series of benchmarks
>>>> >> > that are cpu intensive for the first 20 miinutes after boot and then
>>>> >> > just leave it idle away. checking every so often to see how it has
>>>> >> > gone.
>>>> >> > As mentioned I have observed these stalls going back to 3.17 kernel.
>>>> >> > It will often take up to a week to record such a stall. I will
>>>> >> > typically test every new release kernel, so the -rc? series will get
>>>> >> > around a weeks testing.
>>>> >>
>>>> >> Sorry. Kind of hopping in a bit late here. Is this always happening
>>>> >> with just the pandaboard? Or are you seeing this on different
>>>> >> machines?
>>>> >>
>>>> >> Have you tried enabling CONFIG_DEBUG_TIMEKEEPING just in case
>>>> >> something is going awry there?
>>>> >
>>>> > Excellent point -- timekeeping issues have caused this sort of issue
>>>> > in the past.
>>>> >
>>>> > Ross, on your next test, could you please enable CONFIG_DEBUG_TIMEKEEPING
>>>> > as John suggests?
>>>> >
>>>> >                                                         Thanx, Paul
>>>> >
>>>> As John has suggested have already enabled CONFIG_DEBUG_TIMEKEEPING.
>>>>
>>>> So far just on 1 day running.
>>>>
>>>> Sigh...!! Nothing to report as yet, only one day on the clock.
>>>> Its like watching grass grow!
>>>
>>> I hear you!  Though I was thinking in terms of watching paint dry...
>>>
>>>                                                         Thanx, Paul
>>>
>> Yes,
>>
>> but with paint drying there is an end point!
>> Grass just keeps on growing ...
>>
>> More like the children in the back of the car ...
>> Are we there yet? ...
>>
>> Well still nothing .. to report. I have just built a 4.5-rc5, but will
>> wait till I get some outcome from the previous test. That can't be too
>> much longer!
>>
>> In hope,
>>
>> Ross Green
> Patience little ones ...
> 
> Well after 2 days plus running pulled another stall.
> This is with 4.5-rc4 and CONFIG_DEBUG_TIMEKEEPING set.
> 
> Can't see anything related to the TIMEKEEPING.
> 
> Anyway here is the dmesg output for people to look at.
> 
> Paul, I was going to move onto 4.5-rc5 kernel, is there something else
> that you want me to test with that, Anyone else have any suggestions
> or ideas?

Starting from kernel 3.17, if we request a e.g. 1000 jiffies schedule_timeout
on a HZ=1000 kernel, is there an upper bound to the number of jiffies it
can actually take before the timeout happens, especially on idle systems ?
I remember a talk from Thomas Gleixner on the new timer wheel which could
add some imprecision to those timeouts. Not sure in which kernel version it
got in though.

My thinking is that it might be a good idea to try using hrtimers rather than
a jiffies-based timeout to awaken the RCU thread if it's really important to
run in a bounded amount of jiffies. Or else the jiffies-based sanity check
that triggers the warning is perhaps too strict.

Thoughts ?

Thanks,

Mathieu


> 
> Regards,
> 
> Ross Green

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Download attachment "dmesg-4.5-rc4-3" of type "application/octet-stream" (25170 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ