Date:	Fri, 15 Apr 2011 17:52:21 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Valdis.Kletnieks@...edu
Cc:	akpm@...ux-foundation.org, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org
Subject: Re: mmotm 2011-04-14 - lockdep splats in sched.c during boot

On Fri, 2011-04-15 at 10:57 -0400, Valdis.Kletnieks@...edu wrote:
> On Thu, 14 Apr 2011 15:08:47 PDT, akpm@...ux-foundation.org said:
> > The mm-of-the-moment snapshot 2011-04-14-15-08 has been uploaded to
> > 
> >    http://userweb.kernel.org/~akpm/mmotm/
> 
> This throws at least two complaints about lockdep on the way up.  I've had
> several complete hangs as well last night during boot following a WARN in
> sched.c, but didn't have netconsole or a camera handy at the time.  Will follow up if I
> catch one. 

That would be most appreciated; I just merged two large series of
scheduler patches.

> Both whinges point at a 'for_each_domain()'. Not sure why I
> haven't seen mention on lkml before - what am I doing different?

Probably because you're running a very fresh kernel.

> Splat number 1:
> [    0.044382] smpboot cpu 1: start_ip = 99000
> [    0.002999] calibrate_delay_direct() timer_rate_max=2526877 timer_rate_min=2526840 pre_start=520283431585 pre_end=520308700132
> [    0.002999] calibrate_delay_direct() timer_rate_max=2526857 timer_rate_min=2526829 pre_start=520313753438 pre_end=520339021871
> [    0.002999] calibrate_delay_direct() timer_rate_max=2526851 timer_rate_min=2526824 pre_start=520344075709 pre_end=520369344094
> [    0.002999] calibrate_delay_direct() timer_rate_max=2526862 timer_rate_min=2526834 pre_start=520374397819 pre_end=520399666308
> [    0.002999] calibrate_delay_direct() timer_rate_max=2526864 timer_rate_min=2526836 pre_start=520404719957 pre_end=520429988465
> [    0.116010] 
> [    0.116011] ===================================================
> [    0.116989] [ INFO: suspicious rcu_dereference_check() usage. ]
> [    0.116989] ---------------------------------------------------
> [    0.116989] kernel/sched.c:2426 invoked rcu_dereference_check() without protection!
> [    0.116989] 
> [    0.116989] other info that might help us debug this:
> [    0.116989] 
> [    0.116989] 
> [    0.116989] rcu_scheduler_active = 1, debug_locks = 1
> [    0.116989] 2 locks held by swapper/1:
> [    0.116989]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810394d2>] cpu_maps_update_begin+0x12/0x14
> [    0.116989]  #1:  (&p->pi_lock){-.....}, at: [<ffffffff81032959>] try_to_wake_up+0x29/0x1aa
> [    0.116989] 
> [    0.116989] stack backtrace:
> [    0.116989] Pid: 1, comm: swapper Not tainted 2.6.39-rc3-mmotm0414 #1
> [    0.116989] Call Trace:
> [    0.116989]  [<ffffffff81065bfc>] lockdep_rcu_dereference+0x9b/0xa4
> [    0.116989]  [<ffffffff8102acd0>] ttwu_stat+0xcc/0xf5
> [    0.116989]  [<ffffffff81032ab5>] try_to_wake_up+0x185/0x1aa
> [    0.116989]  [<ffffffff81b5540a>] ? migration_call+0x9e/0xd0
> [    0.116989]  [<ffffffff81564643>] ? _raw_spin_unlock_irqrestore+0x46/0x80
> [    0.116989]  [<ffffffff81032b06>] wake_up_process+0x10/0x12
> [    0.116989]  [<ffffffff81b56207>] cpu_stop_cpu_callback+0xe5/0x11b
> [    0.116989]  [<ffffffff81567abe>] notifier_call_chain+0x54/0x81
> [    0.116989]  [<ffffffff810596bc>] __raw_notifier_call_chain+0x9/0xb
> [    0.116989]  [<ffffffff815434d1>] __cpu_notify+0x1b/0x2d
> [    0.116989]  [<ffffffff81b55709>] _cpu_up.constprop.0+0xd1/0xe5
> [    0.116989]  [<ffffffff81b55757>] cpu_up+0x3a/0x47
> [    0.116989]  [<ffffffff81b2f3d2>] smp_init+0x41/0x93
> [    0.116989]  [<ffffffff81b1dbc5>] kernel_init+0x9d/0x15b
> [    0.116989]  [<ffffffff8156bb94>] kernel_thread_helper+0x4/0x10
> [    0.116989]  [<ffffffff81564d84>] ? retint_restore_args+0xe/0xe
> [    0.116989]  [<ffffffff81b1db28>] ? start_kernel+0x394/0x394
> [    0.116989]  [<ffffffff8156bb90>] ? gs_change+0xb/0xb
> [    0.117089] NMI watchdog enabled, takes one hw-pmu counter.
> [    0.119006] Brought up 2 CPUs
> 
> Splat number 2:
> [    1.179319] netconsole: remote ethernet address 00:b0:d0:c3:bd:a7
> [    1.179430] netconsole: device eth0 not up yet, forcing it
> [    1.247705] e1000e 0000:00:19.0: irq 46 for MSI/MSI-X
> [    1.298111] e1000e 0000:00:19.0: irq 46 for MSI/MSI-X
> [    1.298312] 
> [    1.298313] ===================================================
> [    1.298516] [ INFO: suspicious rcu_dereference_check() usage. ]
> [    1.298623] ---------------------------------------------------
> [    1.298731] kernel/sched.c:1211 invoked rcu_dereference_check() without protection!
> [    1.298858] 
> [    1.298858] other info that might help us debug this:
> [    1.298859] 
> [    1.299152] 
> [    1.299152] rcu_scheduler_active = 1, debug_locks = 1
> [    1.299294] 1 lock held by swapper/0:
> [    1.299294]  #0:  (&(&base->lock)->rlock){-.-.-.}, at: [<ffffffff810443fd>] lock_timer_base+0x49/0x92
> [    1.299294] 
> [    1.299294] stack backtrace:
> [    1.299294] Pid: 0, comm: swapper Not tainted 2.6.39-rc3-mmotm0414 #1
> [    1.299294] Call Trace:
> [    1.299294]  <IRQ>  [<ffffffff81065bfc>] lockdep_rcu_dereference+0x9b/0xa4
> [    1.299294]  [<ffffffff810337a7>] get_nohz_timer_target+0x79/0xbe
> [    1.299294]  [<ffffffff810452ec>] __mod_timer+0xc7/0x16d
> [    1.299294]  [<ffffffff810454bf>] mod_timer+0x87/0x8e
> [    1.299294]  [<ffffffff8130814c>] e1000_intr_msi+0xa2/0xef
> [    1.299294]  [<ffffffff8108acab>] handle_irq_event_percpu+0xba/0x29f
> [    1.299294]  [<ffffffff8108aecc>] handle_irq_event+0x3c/0x5c
> [    1.299294]  [<ffffffff810193c6>] ? ack_APIC_irq+0x10/0x12
> [    1.299294]  [<ffffffff8108d197>] handle_edge_irq+0xf4/0x121
> [    1.299294]  [<ffffffff810031aa>] handle_irq+0x122/0x133
> [    1.299294]  [<ffffffff81002fdf>] do_IRQ+0x48/0xa0
> [    1.299294]  [<ffffffff81564cd3>] common_interrupt+0x13/0x13
> [    1.299294]  <EOI>  [<ffffffff81008009>] ? default_idle+0x52/0x89
> [    1.299294]  [<ffffffff81008007>] ? default_idle+0x50/0x89
> [    1.299294]  [<ffffffff8100084c>] cpu_idle+0x87/0x102
> [    1.299294]  [<ffffffff81535587>] rest_init+0xcb/0xd2
> [    1.299294]  [<ffffffff815354bc>] ? csum_partial_copy_generic+0x16c/0x16c
> [    1.299294]  [<ffffffff81b1db1d>] start_kernel+0x389/0x394
> [    1.299294]  [<ffffffff81b1d29f>] x86_64_start_reservations+0xaf/0xb3
> [    1.299294]  [<ffffffff81b1d393>] x86_64_start_kernel+0xf0/0xf7
> [    1.309814] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 


The patch below should cure those two, I think.

---
 kernel/sched.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0cfe031..cd06b53 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1208,11 +1208,13 @@ int get_nohz_timer_target(void)
 	int i;
 	struct sched_domain *sd;
 
+	rcu_read_lock();
 	for_each_domain(cpu, sd) {
 		for_each_cpu(i, sched_domain_span(sd))
 			if (!idle_cpu(i))
 				return i;
 	}
+	rcu_read_unlock();
 	return cpu;
 }
 /*
@@ -2415,12 +2417,14 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 		struct sched_domain *sd;
 
 		schedstat_inc(p, se.statistics.nr_wakeups_remote);
+		rcu_read_lock();
 		for_each_domain(this_cpu, sd) {
 			if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
 				schedstat_inc(sd, ttwu_wake_remote);
 				break;
 			}
 		}
+		rcu_read_unlock();
 	}
 #endif /* CONFIG_SMP */
 

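One thing to watch in the get_nohz_timer_target() hunk above: the
"return i;" inside the loop leaves the function without reaching
rcu_read_unlock(), so that path would exit with the read-side section
still open. A minimal userspace sketch of a single-exit shape that keeps
the lock and unlock paired follows. It is not kernel code: rcu_depth,
cpu_is_idle[] and pick_timer_target() are made-up stand-ins, and the flat
loop only approximates the nested for_each_domain()/for_each_cpu() walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel primitives. */
static int rcu_depth;	/* read-side nesting depth, 0 means unlocked */

static void rcu_read_lock(void)
{
	rcu_depth++;
}

static void rcu_read_unlock(void)
{
	assert(rcu_depth > 0);	/* catch an unlock without a matching lock */
	rcu_depth--;
}

#define NR_CPUS 4
static bool cpu_is_idle[NR_CPUS];	/* mock backing store for idle_cpu() */

static bool idle_cpu(int cpu)
{
	return cpu_is_idle[cpu];
}

/*
 * Same shape as get_nohz_timer_target(): return the first busy CPU,
 * or fall back to this_cpu if every CPU is idle. Both outcomes leave
 * through the "unlock" label, so the read-side section is always closed.
 */
static int pick_timer_target(int this_cpu)
{
	int cpu = this_cpu;
	int i;

	rcu_read_lock();
	for (i = 0; i < NR_CPUS; i++) {	/* stands in for the domain walk */
		if (!idle_cpu(i)) {
			cpu = i;
			goto unlock;
		}
	}
unlock:
	rcu_read_unlock();
	return cpu;
}
```

With this shape rcu_depth is back at zero after every call, whether or
not a busy CPU was found. The ttwu_stat() hunk does not have the problem,
since its "break" falls through to the rcu_read_unlock().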