linux-kernel - Re: mmotm 2010-08-11 - RCU whinge during very early boot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20101006230400.GR2433@linux.vnet.ibm.com>
Date:	Wed, 6 Oct 2010 16:04:00 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Zdenek Kabelac <zdenek.kabelac@...il.com>
Cc:	Valdis.Kletnieks@...edu, Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org
Subject: Re: mmotm 2010-08-11 - RCU whinge during very early boot

On Tue, Oct 05, 2010 at 12:05:13PM +0200, Zdenek Kabelac wrote:
> 2010/8/12  <Valdis.Kletnieks@...edu>:
> > On Wed, 11 Aug 2010 16:10:49 PDT, akpm@...ux-foundation.org said:
> >> The mm-of-the-moment snapshot 2010-08-11-16-10 has been uploaded to
> >>
> >>    http://userweb.kernel.org/~akpm/mmotm/
> >
> > Throws a RCU complaint.  Hopefully somebody on the cc: list knows what it is about...
> >
> > [    0.026136] CPU0: Intel(R) Core(TM)2 Duo CPU     P8700  @ 2.53GHz stepping 0a
> > [    0.028399] NMI watchdog enabled, takes one hw-pmu counter.
> > [    0.030019] lockdep: fixing up alternatives.
> > [    0.031178]
> > [    0.031179] ===================================================
> > [    0.031182] [ INFO: suspicious rcu_dereference_check() usage. ]
> > [    0.031184] ---------------------------------------------------
> > [    0.031187] kernel/sched.c:618 invoked rcu_dereference_check() without protection!
> > [    0.031189]
> > [    0.031189] other info that might help us debug this:
> > [    0.031190]
> > [    0.031192]
> > [    0.031193] rcu_scheduler_active = 1, debug_locks = 1
> > [    0.031195] 3 locks held by kworker/0:0/4:
> > [    0.031197]  #0:  (events){+.+.+.}, at: [<ffffffff810504ca>] process_one_work+0x1b6/0x37d
> > [    0.031210]  #1:  ((&c_idle.work)){+.+.+.}, at: [<ffffffff810504ca>] process_one_work+0x1b6/0x37d
> > [    0.031217]  #2:  (&rq->lock){-.-...}, at: [<ffffffff81b5f9b8>] init_idle+0x2b/0x114
> > [    0.031225]
> > [    0.031226] stack backtrace:
> > [    0.031229] Pid: 4, comm: kworker/0:0 Not tainted 2.6.35-mmotm0811 #1
> > [    0.031232] Call Trace:
> > [    0.031237]  [<ffffffff810661eb>] lockdep_rcu_dereference+0x9d/0xa5
> > [    0.031242]  [<ffffffff8102b751>] task_group+0x7b/0x8a
> > [    0.031246]  [<ffffffff81b5f9b8>] ? init_idle+0x2b/0x114
> > [    0.031250]  [<ffffffff8102b775>] set_task_rq+0x15/0x6e
> > [    0.031253]  [<ffffffff81b5fa5e>] init_idle+0xd1/0x114
> > [    0.031257]  [<ffffffff81b5fb44>] fork_idle+0x8e/0x9d
> > [    0.031261]  [<ffffffff81b5de6f>] do_fork_idle+0x17/0x28
> > [    0.031265]  [<ffffffff8105052b>] process_one_work+0x217/0x37d
> > [    0.031269]  [<ffffffff810504ca>] ? process_one_work+0x1b6/0x37d
> > [    0.031273]  [<ffffffff81b5de58>] ? do_fork_idle+0x0/0x28
> > [    0.031277]  [<ffffffff81051775>] worker_thread+0x17e/0x251
> > [    0.031281]  [<ffffffff810515f7>] ? worker_thread+0x0/0x251
> > [    0.031285]  [<ffffffff8105544a>] kthread+0x7d/0x85
> > [    0.031290]  [<ffffffff81003554>] kernel_thread_helper+0x4/0x10
> > [    0.031295]  [<ffffffff81558d80>] ? restore_args+0x0/0x30
> > [    0.031299]  [<ffffffff810553cd>] ? kthread+0x0/0x85
> > [    0.031303]  [<ffffffff81003550>] ? kernel_thread_helper+0x0/0x10
> > [    0.031333] Booting Node   0, Processors  #1 Ok.
> > [    0.103111] NMI watchdog enabled, takes one hw-pmu counter.
> > [    0.104013] Brought up 2 CPUs
> >
> 
> I'm still seeing this INFO message on my vanilla 2.6.36-rc kernel.
> 
> ----------------------
> 
> ftrace: converting mcount calls to 0f 1f 44 00 00
> ftrace: allocating 16045 entries in 63 pages
> Setting APIC routing to flat
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> CPU0: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz stepping 0a
> NMI watchdog enabled, takes one hw-pmu counter.
> lockdep: fixing up alternatives.
> 
> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> kernel/sched.c:618 invoked rcu_dereference_check() without protection!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 0
> 3 locks held by kworker/0:0/4:
>  #0:  (events){+.+.+.}, at: [<ffffffff8106e78e>] process_one_work+0x12e/0x560
>  #1:  ((&c_idle.work)){+.+.+.}, at: [<ffffffff8106e78e>]
> process_one_work+0x12e/0x560
>  #2:  (&rq->lock){......}, at: [<ffffffff814772c3>] init_idle+0x30/0x12c
> 
> stack backtrace:
> Pid: 4, comm: kworker/0:0 Not tainted 2.6.36-rc6-00085-g6e34025 #1
> Call Trace:
>  [<ffffffff81089c1b>] lockdep_rcu_dereference+0xbb/0xc0
>  [<ffffffff8103e7d5>] set_task_rq+0x2f5/0x300
>  [<ffffffff81477374>] init_idle+0xe1/0x12c
>  [<ffffffff81477769>] fork_idle+0x90/0x9f
>  [<ffffffff81048dfa>] ? enqueue_entity+0x13a/0x430
>  [<ffffffff81482789>] ? sub_preempt_count+0x59/0x60
>  [<ffffffff814752f5>] do_fork_idle+0x1c/0x2d
>  [<ffffffff8106e7fa>] process_one_work+0x19a/0x560
>  [<ffffffff8106e78e>] ? process_one_work+0x12e/0x560
>  [<ffffffff814752d9>] ? do_fork_idle+0x0/0x2d
>  [<ffffffff81070189>] worker_thread+0x169/0x340
>  [<ffffffff81070020>] ? worker_thread+0x0/0x340
>  [<ffffffff810752e6>] kthread+0xa6/0xb0
>  [<ffffffff81004014>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8147efcb>] ? _raw_spin_unlock_irq+0x3b/0x60
>  [<ffffffff8147f600>] ? restore_args+0x0/0x30
>  [<ffffffff81075240>] ? kthread+0x0/0xb0
>  [<ffffffff81004010>] ? kernel_thread_helper+0x0/0x10
> Booting Node   0, Processors  #1 Ok.
> TSC synchronization [CPU#0 -> CPU#1]:
> Measured 399476 cycles TSC warp between CPUs, turning off TSC clock.
> Marking TSC unstable due to check_tsc_sync_source failed
> NMI watchdog enabled, takes one hw-pmu counter.
> Brought up 2 CPUs
> Total of 2 processors activated (8781.86 BogoMIPS).
> regulator: core version 0.5
> regulator: dummy:
> Time:  7:44:09  Date: 10/05/10

Hello, Zdenek,

I believe that the following patch from Peter Z. should address this.

							Thanx, Paul

------------------------------------------------------------------------

commit e3dd67d97b3c2aad366b845c797745a78efaf90d
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Thu Sep 16 17:50:31 2010 +0200

    sched: fix RCU lockdep splat from task_group()
    
    This addresses the following RCU lockdep splat:
    
    [0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
    [0.052999] lockdep: fixing up alternatives.
    [0.054105]
    [0.054106] ===================================================
    [0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
    [0.054999] ---------------------------------------------------
    [0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
    [0.054999]
    [0.054999] other info that might help us debug this:
    [0.054999]
    [0.054999]
    [0.054999] rcu_scheduler_active = 1, debug_locks = 1
    [0.054999] 3 locks held by swapper/1:
    [0.054999]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
    [0.054999]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
    [0.054999]  #2:  (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
    [0.054999]
    [0.054999] stack backtrace:
    [0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
    [0.054999] Call Trace:
    [0.054999]  [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
    [0.054999]  [<ffffffff810325c3>] task_group+0x7b/0x8a
    [0.054999]  [<ffffffff810325e5>] set_task_rq+0x13/0x40
    [0.054999]  [<ffffffff814be39a>] init_idle+0xd2/0x113
    [0.054999]  [<ffffffff814be78a>] fork_idle+0xb8/0xc7
    [0.054999]  [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
    [0.054999]  [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
    [0.054999]  [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
    [0.054999]  [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
    [0.054999]  [<ffffffff814be876>] _cpu_up+0xac/0x127
    [0.054999]  [<ffffffff814be946>] cpu_up+0x55/0x6a
    [0.054999]  [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
    [0.054999]  [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
    [0.054999]  [<ffffffff814c353c>] ? restore_args+0x0/0x30
    [0.054999]  [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
    [0.054999]  [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
    [0.056074] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
    [0.130045]  #2lockdep: fixing up alternatives.
    [0.203089]  #3 Ok.
    [0.275286] Brought up 4 CPUs
    [0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
    
    The cgroup_subsys_state structures referenced by idle tasks are never
    freed, because the idle tasks should be part of the root cgroup,
    which is not removable.
    
    The problem is that while we do in-fact hold rq->lock, the newly spawned
    idle thread's cpu is not yet set to the correct cpu so the lockdep check
    in task_group():
    
      lockdep_is_held(&task_rq(p)->lock)
    
    will fail.
    
    But this is a chicken and egg problem.  Setting the CPU's runqueue requires
    that the CPU's runqueue already be set.  ;-)
    
    So insert an RCU read-side critical section to avoid the complaint.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/kernel/sched.c b/kernel/sched.c
index 09b574e..40e065e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5331,7 +5331,19 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	idle->se.exec_start = sched_clock();
 
 	cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
+	/*
+	 * We're having a chicken and egg problem, even though we are
+	 * holding rq->lock, the cpu isn't yet set to this cpu so the
+	 * lockdep check in task_group() will fail.
+	 *
+	 * Similar case to sched_fork(). / Alternatively we could
+	 * use task_rq_lock() here and obtain the other rq->lock.
+	 *
+	 * Silence PROVE_RCU
+	 */
+	rcu_read_lock();
 	__set_task_cpu(idle, cpu);
+	rcu_read_unlock();
 
 	rq->curr = rq->idle = idle;
 #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/