Date:	Thu, 25 Feb 2016 16:56:38 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Ross Green <rgkernel@...il.com>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>, dipankar@...ibm.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Josh Triplett <josh@...htriplett.org>,
	rostedt <rostedt@...dmis.org>,
	David Howells <dhowells@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	pranith kumar <bobby.prani@...il.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

On Thu, Feb 25, 2016 at 04:13:11PM +1100, Ross Green wrote:
> On Wed, Feb 24, 2016 at 8:28 AM, Ross Green <rgkernel@...il.com> wrote:
> > On Wed, Feb 24, 2016 at 7:55 AM, Paul E. McKenney
> > <paulmck@...ux.vnet.ibm.com> wrote:

[ . . . ]

> >> Still working on getting decent traces...

And I might have succeeded, see below.

> >>
> >>                                                         Thanx, Paul
> >>
> >
> > G'day all,
> >
> > Here is another dmesg output for 4.5-rc5 showing another rcu_preempt stall.
> > This one appeared after only a day of running. CONFIG_DEBUG_TIMING is
> > turned on, but can't see any output that shows from this.
> >
> > Again testing as before,
> >
> > Boot, run a series of small benchmarks, then just let the system be
> > and idle away.
> >
> > I notice in the stack trace there is mention of hrtimer_run_queues and
> > hrtimer_interrupt.
> >
> > Anyway, leave this for a few more eyes to look at.
> >
> > Open to any other suggestions of things to test.
> >
> > Regards,
> >
> > Ross Green
> 
> 
> G'day Paul,
> 
> I left the pandaboard running and captured another stall.
> 
> the attachment is the dmesg output.
> 
> Again there is no apparent output from any CONFIG_DEBUG_TIMING so I
> assume there is nothing happening there.

I agree, looks like this is not due to time skew.

> I just saw the updates for 4.6 RCU code.
> Is the patch in [PATCH tip/core/rcu 04/13] valid here?

I doubt that it will help, but you never know.

> do you want me try the new patch set with this configuration?

Even better would be to try Daniel Wagner's swait patchset.  I have
attached the patches in UNIX mbox format, or you can get them from the
-tip tree.

And I -finally- got some tracing that -might- be useful.  The dmesg, all
67MB of it, is here:

	http://www.rdrop.com/~paulmck/submission/console.2016.02.23a.log

This failure mode is less likely to happen, and looks a bit different
from the ones that I was seeing before enabling tracing.  Back then, an
additional wakeup would actually wake the task up.  In contrast, with
tracing enabled, the RCU grace-period kthread goes into "teenager mode",
refusing to wake up despite repeated attempts.  However, this might
be a side effect of the ftrace dump.

On line 525,132, we see that the rcu_preempt grace-period kthread has
been starved for 1,188,154 jiffies, or about 20 minutes.  This seems
unlikely...  The kthread is waiting for no more than a three-jiffy
timeout ("RCU_GP_WAIT_FQS(3)") and is in TASK_INTERRUPTIBLE state
("0x1").  I have replaced the old event-waiting with a straight
schedule_timeout_interruptible() as an (ineffective) desperation move.
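
For concreteness, here is a rough hand-written sketch of the two waiting
styles in question -- not the actual kernel/rcu/tree.c code, and the
wait-queue and wake-condition names below are just illustrative stand-ins:

	/* Illustrative sketch only; gp_wq and fqs_wake_needed() are stand-ins. */
	extern bool fqs_wake_needed(void);	/* stand-in wake condition */

	static void gp_fqs_wait_sketch(wait_queue_head_t *gp_wq, unsigned long j)
	{
		/* Old style: wait on the GP wait queue, bounded by j jiffies. */
		wait_event_interruptible_timeout(*gp_wq, fqs_wake_needed(), j);

		/* Desperation move: skip the wait queue, just sleep j jiffies. */
		schedule_timeout_interruptible(j);
	}

Either way, the kthread should run again within roughly j (here three)
jiffies, so a 1,188,154-jiffy starvation points at a lost timer expiry
or lost wakeup rather than at the wait condition itself.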

But this dmesg also contains a trace with the sched_waking and sched_wakeup
events enabled.  Here are the entries corresponding to the grace-period
kthread, in time order:

[ 1205.567414] rcu_pree-7       5d..2 1175375682us : sched_waking: comm=rcub/9 pid=34 prio=97 target_cpu=005
[ 1205.567414] rcu_pree-7       5dN.3 1175375685us : sched_wakeup: comm=rcub/9 pid=34 prio=97 target_cpu=005

	This is the last sched_wakeup trace entry, 29 seconds before
	the end of the trace.  After this, we see a sched_waking event
	about every three milliseconds.

[ 1205.567414] rcu_pree-7       4d..2 1175381643us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175384644us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175387662us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175390643us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175393644us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175396651us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175399665us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175402646us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175405647us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175408650us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175411669us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175414659us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175417654us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175420661us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175423679us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175426657us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175429660us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175432660us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175435681us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175460700us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175464687us : sched_waking: comm=rcub/14 pid=89 prio=97 target_cpu=014
[ 1205.567414] rcu_pree-7       4d..2 1175467728us : sched_waking: comm=rcub/14 pid=89 prio=97 target_cpu=014
[ 1205.567414] rcu_pree-7       4d..2 1175477679us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175477696us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175493687us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175506686us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175516696us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175516714us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175529700us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175536704us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175539737us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175542707us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175545704us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175548709us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175551740us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175554709us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175566714us : sched_waking: comm=rcub/7 pid=11 prio=97 target_cpu=000
[ 1205.567414] rcu_pree-7       4d..2 1175566732us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175569724us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175572719us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175575750us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175578717us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175581718us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175584721us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175587748us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175590728us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175593724us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175596724us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175599761us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175602726us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175605728us : sched_waking: comm=rcub/10 pid=45 prio=97 target_cpu=006
[ 1205.567414] rcu_pree-7       4d..2 1175612737us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175615753us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175618737us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175621736us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175624738us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175627757us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175630741us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175633741us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175636745us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175639765us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175642744us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175645745us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175648750us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175651771us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175654751us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175657750us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175660752us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175663779us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175666752us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175669753us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175672757us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175675782us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175678756us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175681761us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175684761us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175687778us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175690762us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175693763us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010
[ 1205.567414] rcu_pree-7       4d..2 1175696766us : sched_waking: comm=rcub/12 pid=67 prio=97 target_cpu=010

We might have lost some events, but losing that many in that pattern
seems unlikely.

OK, so what wakeup path omits the sched_wakeup event?

The sched_waking event appears to be emitted once in try_to_wake_up()
and once in try_to_wake_up_local().  Starting with try_to_wake_up()
(a simplified sketch follows the walkthrough below):

o	If the task is ->on_rq, ttwu_remote() is invoked:

	o	This acquires the runqueue lock, then, if
		task_on_rq_queued() returns true, invokes
		ttwu_do_wakeup().  This unconditionally does sched_wakeup,
		so we didn't go that way.  (And this path skips the bulk
		of try_to_wake_up() on return.)

	o	Otherwise, we release the runqueue lock and return zero.

o	There is some ordering checking, runqueue selection, and then
	p->state is set to TASK_WAKING.  And we apparently are not getting
	here, either.  But I don't see any other way out.

	Ignoring this for the moment...

	We eventually reach the call to ttwu_queue().

	o	Here the TTWU_QUEUE path seems to avoid doing a
		sched_wakeup event -- and since we are trying to wake
		CPU 0 from CPU 4, the two CPUs don't share cache (x86),
		so this is the path we take.

	o	This invokes ttwu_queue_remote(), which sends an IPI
		unless polling is in effect.  I would need to enable
		trace_sched_wake_idle_without_ipi() to see whether or
		not the IPI was actually sent.

		If the target CPU was offline, we should have seen the
		cpu_is_offline() WARN_ON().  I suppose that the CPU might
		go offline between the check and the ->send_IPI_mask(),
		but only once.  And we are trying to wake up on CPU 0
		quite a few times.

	Any thoughts on what to look for?
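
In case it helps anyone follow along, here is a heavily simplified
paraphrase (from memory, not verbatim kernel code) of the v4.5
try_to_wake_up()/ttwu_queue() flow walked through above, marking where
the two trace events fire:

	/* Simplified paraphrase, not verbatim kernel/sched/core.c. */
	static int try_to_wake_up(struct task_struct *p, unsigned int state,
				  int wake_flags)
	{
		int cpu;

		/* ... acquire p->pi_lock, check p->state against "state" ... */
		trace_sched_waking(p);			/* the event we DO see */

		if (p->on_rq && ttwu_remote(p, wake_flags))
			goto stat;	/* ttwu_do_wakeup() -> sched_wakeup */

		/* ... memory barriers, wait for p->on_cpu to clear ... */
		p->state = TASK_WAKING;
		cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);

		/*
		 * With TTWU_QUEUE set and !cpus_share_cache(), ttwu_queue()
		 * calls ttwu_queue_remote(), which puts p on the target
		 * CPU's wake_list and sends an IPI (unless the remote idle
		 * task is polling).  The sched_wakeup event then fires
		 * later, on the target CPU, only if that IPI is processed.
		 */
		ttwu_queue(p, cpu);
	stat:
		/* ... ttwu_stat(), release locks ... */
		return 1;
	}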

Next, try_to_wake_up_local():

o	After doing several checks, it does the sched_waking event.

o	If the task is not already queued, it calls ttwu_activate()
	to enqueue it.

o	It then invokes ttwu_do_wakeup(), which unconditionally
	does the sched_wakeup event.

	So this path looks unlikely, even ignoring the fact that
	the waking CPU in the traces above is always different from
	the CPU to be awakened on.
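
And the corresponding sketch for try_to_wake_up_local(), again a
simplified paraphrase rather than verbatim kernel code:

	/* Simplified paraphrase, not verbatim kernel/sched/core.c. */
	static void try_to_wake_up_local(struct task_struct *p)
	{
		struct rq *rq = task_rq(p);	/* caller already holds rq->lock */

		/* ... sanity checks, p->pi_lock, p->state & TASK_NORMAL ... */
		trace_sched_waking(p);

		if (!task_on_rq_queued(p))	/* not yet queued: enqueue it */
			ttwu_activate(rq, p, ENQUEUE_WAKEUP);

		ttwu_do_wakeup(rq, p, 0);	/* always emits sched_wakeup */
	}

Note that this path only ever wakes a task on the current runqueue, so
the waker and the wakee would be on the same CPU, which again does not
match the traces above.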

Any thoughts?

							Thanx, Paul
