lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 19 Dec 2014 12:09:48 +0100 (CET)
From:	Miroslav Benes <mbenes@...e.cz>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: rcu_sched stall detected, but no state dump

On Mon, 15 Dec 2014, Miroslav Benes wrote:

> On Fri, 12 Dec 2014, Paul E. McKenney wrote:
> 
> > On Fri, Dec 12, 2014 at 03:06:20PM +0100, Miroslav Benes wrote:
> > 

[...]

> > > One last thing. Is there a reason that there is not any similar INFO in 
> > > the console for tiny implementation? I enabled RCU_TRACE and still got 
> > > nothing. I have only shallow knowledge of RCU, though.
> > 
> > It is supposed to work given CONFIG_RCU_TRACE=y, and it looks like
> > it should, at least as long as scheduling-clock interrupts are being
> > delivered.  One thing to try for diagnosis is to have every (say) 1000th
> > call to check_cpu_stall() (in kernel/rcu/tiny_plugin.h) do a printk()
> > and same for rcu_check_callbacks() (in kernel/rcu/tiny.c).
> 
> Unfortunately nothing is there. I'll look into it when I have more time 
> later this week and let you know.

So I looked into it. Scheduling-clock interrupts are delivered and 
check_cpu_stall() in tiny_plugin.h is called twice each tick, once for 
rcu_bh and once for rcu_sched. What happens is the following (part of the 
printk output in check_cpu_stall)...

...
[  276.049053] name: rcu_bh, jiffies: 4294943345, jiffies_stall: 4294964342, rcp->curtail:           (null)
[  276.050046] name: rcu_sched, jiffies: 4294943345, jiffies_stall: 4294943357, rcp->curtail:           (null)
[  276.051312] name: rcu_bh, jiffies: 4294943347, jiffies_stall: 4294964345, rcp->curtail:           (null)
[  276.052305] name: rcu_sched, jiffies: 4294943347, jiffies_stall: 4294943357, rcp->curtail:           (null)
[  276.054634] name: rcu_bh, jiffies: 4294943356, jiffies_stall: 4294964347, rcp->curtail:           (null)
[  276.054634] name: rcu_sched, jiffies: 4294943356, jiffies_stall: 4294943357, rcp->curtail:           (null)
[  276.062989] name: rcu_bh, jiffies: 4294943358, jiffies_stall: 4294964356, rcp->curtail:           (null)
[  276.063982] name: rcu_sched, jiffies: 4294943358, jiffies_stall: 4294943357, rcp->curtail:           (null)
[  276.065285] name: rcu_bh, jiffies: 4294943361, jiffies_stall: 4294964358, rcp->curtail:           (null)
[  276.066278] name: rcu_sched, jiffies: 4294943361, jiffies_stall: 4294964358, rcp->curtail:           (null)
[  276.068566] name: rcu_bh, jiffies: 4294943370, jiffies_stall: 4294964361, rcp->curtail:           (null)
[  276.068566] name: rcu_sched, jiffies: 4294943370, jiffies_stall: 4294964358, rcp->curtail:           (null)
[  276.076867] name: rcu_bh, jiffies: 4294943372, jiffies_stall: 4294964370, rcp->curtail:           (null)
[  276.077861] name: rcu_sched, jiffies: 4294943372, jiffies_stall: 4294964358, rcp->curtail:           (null)
[  276.079199] name: rcu_bh, jiffies: 4294943375, jiffies_stall: 4294964372, rcp->curtail:           (null)
[  276.080193] name: rcu_sched, jiffies: 4294943375, jiffies_stall: 4294964358, rcp->curtail:           (null)
[  276.082565] name: rcu_bh, jiffies: 4294943384, jiffies_stall: 4294964375, rcp->curtail:           (null)
[  276.082565] name: rcu_sched, jiffies: 4294943384, jiffies_stall: 4294964358, rcp->curtail:           (null)
[  276.090984] name: rcu_bh, jiffies: 4294943386, jiffies_stall: 4294964384, rcp->curtail:           (null)
[  276.091976] name: rcu_sched, jiffies: 4294943386, jiffies_stall: 4294964358, rcp->curtail:           (null)
[  276.093296] name: rcu_bh, jiffies: 4294943389, jiffies_stall: 4294964386, rcp->curtail:           (null)
[  276.094289] name: rcu_sched, jiffies: 4294943389, jiffies_stall: 4294964358, rcp->curtail:           (null)
...

In case of rcu_bh jiffies_stall is regularly updated and no stall is thus 
detected (if I understand the code correctly). In case of rcu_sched the 
situation is different. jiffies_stall is updated every 21 seconds (my 
RCU_CPU_STALL_TIMEOUT). So I would say the stall should be detected there. 
Unfortunately it is bound to *rcp->curtail being not NULL, and it is in my 
case (in printk output there is rcp->curtail, but it is *rcp->curtail in 
fact). The patch below did the trick for me and cpu stall was detected and 
info printed.

...
[  138.534004] INFO: rcu_sched stall on CPU (41980 ticks this GP) idle=140000000000001 (t=42000 jiffies q=64)
[  138.534004] CPU: 0 PID: 4140 Comm: test_thread Tainted: G           O   3.18.0-next-20141218-rcu_tiny_patched+ #1
[  138.534004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20141015_183547-build31 04/01/2014
[  138.534004]  0000000000000000 ffffffff81e2ce28 ffffffff81881b77 ffffffff81e2ce48
[  138.534004]  ffffffff810925c1 ffffffff81e2ce68 0000000000000000 ffffffff81e2ce68
[  138.534004]  ffffffff810932c2 0000000000000000 ffff8801395bc650 ffffffff81e2ce88
[  138.534004] Call Trace:
[  138.534004]  <IRQ>  [<ffffffff81881b77>] dump_stack+0x19/0x1b
[  138.534004]  [<ffffffff810925c1>] check_cpu_stall.part.6+0x61/0xb0
[  138.534004]  [<ffffffff810932c2>] rcu_check_callbacks+0x82/0x90
[  138.534004]  [<ffffffff81097258>] update_process_times+0x38/0x60
[  138.534004]  [<ffffffff810a6635>] tick_sched_handle.isra.22+0x35/0x50
[  138.534004]  [<ffffffff810a6ab3>] tick_sched_timer+0x43/0x70
[  138.534004]  [<ffffffff81097e1a>] __run_hrtimer+0xaa/0x240
[  138.534004]  [<ffffffff810981b6>] ? hrtimer_interrupt+0x66/0x250
[  138.534004]  [<ffffffff810a6a70>] ? tick_sched_do_timer+0x30/0x30
[  138.534004]  [<ffffffff8109822f>] hrtimer_interrupt+0xdf/0x250
[  138.534004]  [<ffffffffa0000000>] ? 0xffffffffa0000000
[  138.534004]  [<ffffffff8102d986>] local_apic_timer_interrupt+0x26/0x50
[  138.534004]  [<ffffffff8189026f>] smp_apic_timer_interrupt+0x3f/0x60
[  138.534004]  [<ffffffff8188e91d>] apic_timer_interrupt+0x6d/0x80
[  138.534004]  <EOI>  [<ffffffff8188e75c>] ? retint_restore_args+0xe/0xe
[  138.534004]  [<ffffffff81062c9d>] ? kthread_should_stop+0x1d/0x30
[  138.534004]  [<ffffffffa0000015>] ? test_thread_fn+0x15/0x20 [kthread_mod]
[  138.534004]  [<ffffffff81063139>] kthread+0xf9/0x110
[  138.534004]  [<ffffffff81063040>] ? kthread_create_on_node+0x230/0x230
[  138.534004]  [<ffffffff8188da2a>] ret_from_fork+0x7a/0xb0
[  138.534004]  [<ffffffff81063040>] ? kthread_create_on_node+0x230/0x230
...

But I'm not sure whether it is possible that the list of pending callback 
is empty. Maybe the bug is elsewhere. My understanding of RCU is limited 
but I'll try to look around. Also the commit log might be imprecise :/

Best regards,
Miroslav

-- >8 --
>From be3495519513356daa366517bb458bc4f05ecf9b Mon Sep 17 00:00:00 2001
From: Miroslav Benes <mbenes@...e.cz>
Date: Fri, 19 Dec 2014 11:11:04 +0100
Subject: [PATCH] rcu: fix rcu stall detection in tiny implementation

The current tiny RCU stall detection depends on presence of pending callbacks in
the list (*rcp->curtail et al.). But they are none even in case of real CPU
stalls prohibiting the detection. When the check for callback presence is
removed, INFO about the stall is correctly printed.

Signed-off-by: Miroslav Benes <mbenes@...e.cz>
---
 kernel/rcu/tiny_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tiny_plugin.h b/kernel/rcu/tiny_plugin.h
index 858c565..3f743fa 100644
--- a/kernel/rcu/tiny_plugin.h
+++ b/kernel/rcu/tiny_plugin.h
@@ -145,7 +145,7 @@ static void check_cpu_stall(struct rcu_ctrlblk *rcp)
 	rcp->ticks_this_gp++;
 	j = jiffies;
 	js = ACCESS_ONCE(rcp->jiffies_stall);
-	if (*rcp->curtail && ULONG_CMP_GE(j, js)) {
+	if (ULONG_CMP_GE(j, js)) {
 		pr_err("INFO: %s stall on CPU (%lu ticks this GP) idle=%llx (t=%lu jiffies q=%ld)\n",
 		       rcp->name, rcp->ticks_this_gp, rcu_dynticks_nesting,
 		       jiffies - rcp->gp_start, rcp->qlen);
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ