lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 11 Nov 2016 10:28:56 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: RCU related tracebacks in various boot tests in -next

On Fri, Nov 11, 2016 at 06:32:43AM -0800, Guenter Roeck wrote:
> Hi,
> 
> I see a number of RCU related tracebacks in recent versions of -next.
> The problem was first seen in next-20161109.
> 
> I didn't bisect, but my wild guess is to blame Paul's related changes.
> Please let me know if a bisect is needed.
> 
> Detailed logs are at http://kerneltests.org/builders.
> Build/test scripts are in the various subdirectories of
> https://github.com/groeck/linux-build-test/tree/master/rootfs/. 
> 
> Thanks,
> Guenter
> 
> --------

This is a failure on my part to convert fully to the new scheme, will fix.
(The third trace makes the situation clear, this one at first glance
looks like a failure to invoke irq_enter(), but it is most likely another
symptom of the problem in the third trace.)

> arm:
> 
> [    1.699096] ===============================
> [    1.699188] suspicious RCU usage. ]
> [    1.699340] 4.9.0-rc4-next-20161111 #4 Not tainted
> [    1.699462] -------------------------------
> [    1.699554] ./include/trace/events/irq.h:69 suspicious rcu_dereference_check() usage!
> [    1.699707] 
> [    1.699707] other info that might help us debug this:
> [    1.699707] 
> [    1.699890] 
> [    1.699890] RCU used illegally from idle CPU!
> [    1.699890] rcu_scheduler_active = 1, debug_locks = 0
> [    1.700103] RCU used illegally from extended quiescent state!
> [    1.700256] no locks held by swapper/0/0.
> [    1.700317] 
> [    1.700317] stack backtrace:
> [    1.700500] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc4-next-20161111 #4
> [    1.700653] Hardware name: Generic OMAP3 (Flattened Device Tree)
> [    1.700988] [<c01100ac>] (unwind_backtrace) from [<c010c290>] (show_stack+0x10/0x14)
> [    1.701171] [<c010c290>] (show_stack) from [<c048f688>] (dump_stack+0xb0/0xe4)
> [    1.701354] [<c048f688>] (dump_stack) from [<c01a319c>] (__handle_irq_event_percpu+0x2fc/0x3d0)
> [    1.701538] [<c01a319c>] (__handle_irq_event_percpu) from [<c01a328c>] (handle_irq_event_percpu+0x1c/0x58)
> [    1.701751] [<c01a328c>] (handle_irq_event_percpu) from [<c01a3300>] (handle_irq_event+0x38/0x5c)
> [    1.701934] [<c01a3300>] (handle_irq_event) from [<c01a65d0>] (handle_level_irq+0xb8/0x14c)
> [    1.702087] [<c01a65d0>] (handle_level_irq) from [<c01a2208>] (generic_handle_irq+0x20/0x34)
> [    1.702270] [<c01a2208>] (generic_handle_irq) from [<c01a2748>] (__handle_domain_irq+0x64/0xe0)
> [    1.702453] [<c01a2748>] (__handle_domain_irq) from [<c07dc7b0>] (__irq_svc+0x70/0x98)
> [    1.702606] [<c07dc7b0>] (__irq_svc) from [<c0108360>] (arch_cpu_idle+0x20/0x3c)
> [    1.702789] [<c0108360>] (arch_cpu_idle) from [<c01882ec>] (cpu_startup_entry+0x180/0x224)
> [    1.702941] [<c01882ec>] (cpu_startup_entry) from [<c0b00c10>] (start_kernel+0x344/0x3b8)
> [    1.703186] [<c0b00c10>] (start_kernel) from [<8000807c>] (0x8000807c)
> 
> arm64:

For this second one....

> [    1.247359] ------------[ cut here ]------------
> [    1.247761] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:380 rcu_momentary_dyntick_idle+0xc0/0xc8
> [    1.247884] Modules linked in:
> [    1.247987] 
> [    1.248150] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc4-next-20161111 #1
> [    1.248241] Hardware name: ZynqMP EP108 (DT)
> [    1.248350] task: ffff80007c4d1900 task.stack: ffff80007c4e0000
> [    1.248445] PC is at rcu_momentary_dyntick_idle+0xc0/0xc8
> [    1.248523] LR is at rcu_note_context_switch+0x1b4/0x2c0
> [    1.248595] pc : [<ffff000008114df0>] lr : [<ffff000008119a0c>] pstate: 600000c5
> [    1.248676] sp : ffff80007c4e3ea0
> [    1.248738] x29: ffff80007c4e3ea0 x28: ffff000008d88000 
> [    1.248842] x27: ffff80007c4e0000 x26: ffff000008da4803 
> [    1.248919] x25: 0000000000000000 x24: ffff80007c4e0000 
> [    1.248992] x23: ffff0000088b3544 x22: ffff80007c4d1900 
> [    1.249065] x21: ffff80007efb1880 x20: ffff000008d88000 
> [    1.249138] x19: ffff80007c4d1900 x18: 000000000000000e 
> [    1.249211] x17: 0000000000000007 x16: 0000000000000001 
> [    1.249285] x15: 000000000000004c x14: 0000000000000000 
> [    1.249362] x13: 000000000000000b x12: 0000000000000006 
> [    1.249435] x11: ffff0000088e0ad8 x10: ffff80007efb18b0 
> [    1.249586] x9 : ffff0000088e0ad4 x8 : 00000000029f6300 
> [    1.249715] x7 : 7fffffffffffffff x6 : ffff80007efabbd0 
> [    1.249839] x5 : 0000000000000004 x4 : ffff000008da0688 
> [    1.249963] x3 : 0000000000000004 x2 : ffff80007efabbdc 
> [    1.250088] x1 : 0000000000000001 x0 : ffff000008da4811 
> [    1.250213] 
> [    1.250325] ---[ end trace 2930a57464ba65da ]---
> [    1.250551] Call trace:
> [    1.251134] Exception stack(0xffff80007c4e3cd0 to 0xffff80007c4e3e00)
> [    1.251396] 3cc0:                                   ffff80007c4d1900 0001000000000000
> [    1.251605] 3ce0: ffff80007c4e3ea0 ffff000008114df0 ffff000008e6c388 ffff80007c4e3f80
> [    1.251790] 3d00: ffff80007c4e3d90 ffff00000838b804 ffff000008e6c3ac ffff80007c4e3fd0
> [    1.251972] 3d20: 00000000ffffffd8 ffff000008910b8c 00000000000003e0 ffff000008b4c1c1
> [    1.252151] 3d40: ffff000008e6c388 ffff80007c4e3fd0 ffff000008b4c1bd ffff000008107264
> [    1.252331] 3d60: ffff80007c4e3d90 ffff00000838b4e0 ffff000008da4811 0000000000000001
> [    1.252509] 3d80: ffff80007efabbdc 0000000000000004 ffff000008da0688 0000000000000004
> [    1.252688] 3da0: ffff80007efabbd0 7fffffffffffffff 00000000029f6300 ffff0000088e0ad4
> [    1.252866] 3dc0: ffff80007efb18b0 ffff0000088e0ad8 0000000000000006 000000000000000b
> [    1.253043] 3de0: 0000000000000000 000000000000004c 0000000000000001 0000000000000007
> [    1.253269] [<ffff000008114df0>] rcu_momentary_dyntick_idle+0xc0/0xc8
> [    1.253442] [<ffff0000088b3000>] __schedule+0x68/0x570
> [    1.253598] [<ffff0000088b3544>] schedule+0x3c/0xa8
> [    1.253726] [<ffff0000088b3998>] schedule_preempt_disabled+0x20/0x38
> [    1.253884] [<ffff0000080f893c>] cpu_startup_entry+0xd4/0x1d8
> [    1.254026] [<ffff00000808e820>] secondary_start_kernel+0x158/0x198
> [    1.254268] [<00000000008bd1a4>] 0x8bd1a4

Looks like I got too excited about adding WARN_ON_ONCE(), will fix.

> s390:
> 
> ===============================
> 
> suspicious RCU usage. ]
> 4.9.0-rc4-next-20161111 #1 Not tainted
> -------------------------------
> ./include/trace/events/rcu.h:454 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> no locks held by swapper/0/0.
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc4-next-20161111 #1
> Hardware name: QEMU             QEMU                  QEMU             (KVM)
> Stack:
> 000000001fffbcb0 000000001fffbd40 0000000000000003 0000000000000000
> 000000001fffbde0 000000001fffbd58 000000001fffbd58 0000000000000020
> 0000000001e20b70 ffffffffffffffff 0000000000000000 0404c0010000000a
> 000000000000000c 000000001fffbda8 0000000000000000 0000000000000000
> 0000000000000000 00000000001168f4 000000001fffbd40 000000001fffbd88
> 
> Call Trace:
> ([<00000000001167b0>] show_trace+0x100/0x104)
> [<0000000000116896>] show_stack+0x6e/0xec 
> [<00000000007985e0>] dump_stack+0x94/0xd4 
> [<000000000021882e>] rcu_eqs_exit_common.isra.49+0x1ee/0x21c 
> [<000000000022172a>] rcu_irq_enter+0x1ca/0x284 
> [<0000000000182178>] irq_enter+0x40/0xc0 
> [<000000000010dcca>] do_IRQ+0x5a/0x94 [<0000000000ad234e>] ext_int_handler+0x11e/0x124 
> [<0000000000103c50>] enabled_wait+0x64/0x110 
> ([<0000000000103c3c>] enabled_wait+0x50/0x110)
> [<0000000000103eca>] arch_cpu_idle+0x42/0x54 
> [<0000000000ad0576>] default_idle_call+0x4e/0x64 
> [<00000000001ddcb0>] cpuidle_idle_call+0x54/0xa4 
> [<00000000001dde62>] cpu_startup_entry+0xea/0x12c 
> [<0000000000eaec82>] start_kernel+0x3ee/0x3fc 
> [<0000000000100020>] _stext+0x20/0x80 

And here I failed to convert one of the accesses to the ->dynticks
counter to the new scheme.  Again, will fix, patch below for the
-next you were using.

And thank you for your testing efforts!

							Thanx, Paul

------------------------------------------------------------------------

commit ee17957e96a196d3ea70a4b1c20ca09f8394ce66
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date:   Fri Nov 11 10:24:56 2016 -0800

    fixup! rcu: Abstract the dynticks snapshot operation

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1e3d4a333b2..03f7bcbc2e99 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1139,7 +1139,9 @@ void rcu_nmi_exit(void)
  */
 bool notrace __rcu_is_watching(void)
 {
-	return atomic_read(this_cpu_ptr(&rcu_dynticks.dynticks)) & 0x1;
+	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
+
+	return rcu_dynticks_in_eqs(rcu_dynticks_snap(rdtp));
 }
 
 /**

Powered by blists - more mailing lists