[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150311230133.GN5412@linux.vnet.ibm.com>
Date: Wed, 11 Mar 2015 16:01:34 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Sasha Levin <sasha.levin@...cle.com>
Cc: Josh Triplett <josh@...htriplett.org>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
LKML <linux-kernel@...r.kernel.org>,
Dave Jones <davej@...hat.com>
Subject: Re: rcu: frequent rcu lockups
On Wed, Mar 11, 2015 at 06:18:57PM -0400, Sasha Levin wrote:
> On 03/11/2015 04:41 PM, Paul E. McKenney wrote:
> > Aha, I forgot to update rcu/next. I have now updated it, so it should
> > make it there today or tomorrow. In the meantime, does the following
> > commit help?
> >
> > Also, how quickly does your test setup reproduce this?
>
> I usually hit it within an hour.
I clearly need to work more on rcutorture -- took tens of hours to hit
it there.
> With the commit I didn't hit it yet, but I do see 4 different WARNings:
I wish that I could say that I am surprised, but the sad fact is that
I am still shaking the bugs out.
> [ 1002.577249] WARNING: CPU: 19 PID: 22311 at kernel/rcu/tree.c:2190 rcu_report_qs_rnp+0x41e/0x590()
I haven't seen this particular one, but it is another likely symptom.
> [ 1002.582438] Modules linked in:
> [ 1002.582438] CPU: 19 PID: 22311 Comm: trinity-c58 Not tainted 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051
> [ 1002.582438] ffff88047f200000 00000000727b89c5 ffff880672807cc8 ffffffff9ea97ec9
> [ 1002.582438] 0000000000000000 0000000000000000 ffff880672807d18 ffffffff941e242a
> [ 1002.582438] 0000000000000282 ffffffff943339be dffffc0000000000 ffffffffa1721000
> [ 1002.582438] Call Trace:
> [ 1002.582438] <IRQ> dump_stack (lib/dump_stack.c:52)
> [ 1002.582438] warn_slowpath_common (kernel/panic.c:447)
> [ 1002.582438] ? rcu_report_qs_rnp (kernel/rcu/tree.c:2190 (discriminator 3))
> [ 1002.582438] warn_slowpath_null (kernel/panic.c:481)
> [ 1002.582438] rcu_report_qs_rnp (kernel/rcu/tree.c:2190 (discriminator 3))
> [ 1002.582438] rcu_process_callbacks (kernel/rcu/tree.c:2288 kernel/rcu/tree.c:2324 kernel/rcu/tree.c:2816 kernel/rcu/tree.c:2849)
> [ 1002.582438] ? __lock_is_held (kernel/locking/lockdep.c:3518)
> [ 1002.582438] __do_softirq (kernel/softirq.c:273 include/linux/jump_label.h:114 include/trace/events/irq.h:126 kernel/softirq.c:274)
> [ 1002.582438] irq_exit (kernel/softirq.c:350 kernel/softirq.c:391)
> [ 1002.582438] smp_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:918)
> [ 1002.582438] apic_timer_interrupt (arch/x86/kernel/entry_64.S:920)
> [ 1002.582438] <EOI> ? __lock_is_held (kernel/locking/lockdep.c:3518)
> [ 1002.582438] ? lock_acquire (./arch/x86/include/asm/paravirt.h:809 kernel/locking/lockdep.c:3605)
> [ 1002.582438] ? rmap_walk (mm/rmap.c:1569 mm/rmap.c:1594)
> [ 1002.582438] down_read (./arch/x86/include/asm/rwsem.h:65 kernel/locking/rwsem.c:24)
> [ 1002.582438] ? rmap_walk (mm/rmap.c:1569 mm/rmap.c:1594)
> [ 1002.582438] ? up_read (./arch/x86/include/asm/rwsem.h:156 kernel/locking/rwsem.c:81)
> [ 1002.582438] ? page_remove_rmap (mm/rmap.c:1209)
> [ 1002.582438] rmap_walk (mm/rmap.c:1569 mm/rmap.c:1594)
> [ 1002.582438] try_to_munlock (mm/rmap.c:1446)
> [ 1002.582438] ? try_to_unmap (mm/rmap.c:1446)
> [ 1002.582438] ? page_remove_rmap (mm/rmap.c:1209)
> [ 1002.582438] ? invalid_migration_vma (mm/rmap.c:1382)
> [ 1002.582438] ? anon_vma_prepare (mm/rmap.c:497)
> [ 1002.582438] __munlock_isolated_page (mm/mlock.c:132)
> [ 1002.582438] __munlock_pagevec (mm/mlock.c:332)
> [ 1002.582438] ? __munlock_isolate_lru_page (mm/mlock.c:277)
> [ 1002.582438] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:154 kernel/locking/spinlock.c:183)
> [ 1002.582438] ? __get_locked_pte (mm/memory.c:1432)
> [ 1002.582438] munlock_vma_pages_range (mm/mlock.c:477)
> [ 1002.582438] ? munlock_vma_page (mm/mlock.c:424)
> [ 1002.582438] ? __lock_acquire (kernel/locking/lockdep.c:2019 kernel/locking/lockdep.c:3184)
> [ 1002.582438] ? kfree (mm/slub.c:1300 mm/slub.c:2707 mm/slub.c:3416)
> [ 1002.582438] ? __lock_acquire (kernel/locking/lockdep.c:2019 kernel/locking/lockdep.c:3184)
> [ 1002.582438] ? vma_merge (mm/mmap.c:1085)
> [ 1002.582438] mlock_fixup (mm/mlock.c:549)
> [ 1002.582438] do_mlockall (mm/mlock.c:677)
> [ 1002.582438] ? do_mlock (mm/mlock.c:658)
> [ 1002.582438] sys_munlockall (mm/mlock.c:720)
> [ 1002.582438] ia32_do_call (arch/x86/ia32/ia32entry.S:486)
>
> [ 1034.418016] WARNING: CPU: 0 PID: 8 at kernel/rcu/tree.c:1936 rcu_gp_kthread+0x2a7c/0x4dd0()
I have seen this one, tracking it down.
> [ 1034.419933] Modules linked in:
> [ 1034.420802] CPU: 0 PID: 8 Comm: rcu_preempt Tainted: G W 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051
> [ 1034.423329] ffff8800496f8000 000000005fe03e34 ffff880049ac7b28 ffffffff9ea97ec9
> [ 1034.426460] 0000000000000000 0000000000000000 ffff880049ac7b78 ffffffff941e242a
> [ 1034.426460] 0000000000022000 ffffffff9433cafc ffffffffa1725040 00000001000110de
> [ 1034.426460] Call Trace:
> [ 1034.426460] dump_stack (lib/dump_stack.c:52)
> [ 1034.426460] warn_slowpath_common (kernel/panic.c:447)
> [ 1034.426460] ? rcu_gp_kthread (kernel/rcu/tree.c:1936 kernel/rcu/tree.c:2061)
> [ 1034.426460] warn_slowpath_null (kernel/panic.c:481)
> [ 1034.426460] rcu_gp_kthread (kernel/rcu/tree.c:1936 kernel/rcu/tree.c:2061)
> [ 1034.426460] ? plist_check_head (lib/plist.c:62)
> [ 1034.426460] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972)
> [ 1034.426460] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601)
> [ 1034.426460] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:171 kernel/locking/spinlock.c:199)
> [ 1034.426460] ? __schedule (kernel/sched/core.c:2803)
> [ 1034.426460] ? ___preempt_schedule (arch/x86/lib/thunk_64.S:51)
> [ 1034.426460] ? abort_exclusive_wait (kernel/sched/wait.c:292)
> [ 1034.426460] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972)
> [ 1034.426460] kthread (kernel/kthread.c:207)
> [ 1034.426460] ? flush_kthread_work (kernel/kthread.c:176)
> [ 1034.426460] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:170 kernel/locking/spinlock.c:199)
> [ 1034.426460] ? flush_kthread_work (kernel/kthread.c:176)
> [ 1034.426460] ret_from_fork (arch/x86/kernel/entry_64.S:565)
> [ 1034.426460] ? flush_kthread_work (kernel/kthread.c:176)
>
> [ 1034.501448] WARNING: CPU: 0 PID: 8 at kernel/rcu/tree_plugin.h:479 rcu_gp_kthread+0x1e58/0x4dd0()
And this one as well.
> [ 1034.505431] Modules linked in:
> [ 1034.507242] CPU: 0 PID: 8 Comm: rcu_preempt Tainted: G W 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051
> [ 1034.509348] ffff8800496f8000 000000005fe03e34 ffff880049ac7b28 ffffffff9ea97ec9
> [ 1034.509348] 0000000000000000 0000000000000000 ffff880049ac7b78 ffffffff941e242a
> [ 1034.509348] ffffffffa1723038 ffffffff9433bed8 ffffffffa1723040 ffff880050622000
> [ 1034.509348] Call Trace:
> [ 1034.509348] dump_stack (lib/dump_stack.c:52)
> [ 1034.509348] warn_slowpath_common (kernel/panic.c:447)
> [ 1034.509348] ? rcu_gp_kthread (kernel/rcu/tree_plugin.h:479 kernel/rcu/tree.c:1834 kernel/rcu/tree.c:1993)
> [ 1034.509348] warn_slowpath_null (kernel/panic.c:481)
> [ 1034.509348] rcu_gp_kthread (kernel/rcu/tree_plugin.h:479 kernel/rcu/tree.c:1834 kernel/rcu/tree.c:1993)
> [ 1034.509348] ? plist_check_head (lib/plist.c:62)
> [ 1034.509348] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972)
> [ 1034.509348] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601)
> [ 1034.509348] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:171 kernel/locking/spinlock.c:199)
> [ 1034.509348] ? __schedule (kernel/sched/core.c:2803)
> [ 1034.509348] ? ___preempt_schedule (arch/x86/lib/thunk_64.S:51)
> [ 1034.509348] ? abort_exclusive_wait (kernel/sched/wait.c:292)
> [ 1034.509348] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972)
> [ 1034.509348] kthread (kernel/kthread.c:207)
> [ 1034.509348] ? flush_kthread_work (kernel/kthread.c:176)
> [ 1034.509348] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:170 kernel/locking/spinlock.c:199)
> [ 1034.509348] ? flush_kthread_work (kernel/kthread.c:176)
> [ 1034.509348] ret_from_fork (arch/x86/kernel/entry_64.S:565)
> [ 1034.509348] ? flush_kthread_work (kernel/kthread.c:176)
>
> [ 2733.830899] WARNING: CPU: 0 PID: 8 at kernel/rcu/tree.c:1935 rcu_gp_kthread+0x2aa6/0x4dd0()
And this one, too.
> [ 2733.839509] Modules linked in:
> [ 2733.839509] CPU: 0 PID: 8 Comm: rcu_preempt Tainted: G W 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051
> [ 2733.839509] ffff8800496f8000 000000005fe03e34 ffff880049ac7b28 ffffffff9ea97ec9
> [ 2733.839509] 0000000000000000 0000000000000000 ffff880049ac7b78 ffffffff941e242a
> [ 2733.839509] 0000000000022000 ffffffff9433cb26 ffffffffa1723040 0000000100038724
> [ 2733.839509] Call Trace:
> [ 2733.839509] dump_stack (lib/dump_stack.c:52)
> [ 2733.839509] warn_slowpath_common (kernel/panic.c:447)
> [ 2733.839509] ? rcu_gp_kthread (kernel/rcu/tree.c:1935 kernel/rcu/tree.c:2061)
> [ 2733.839509] warn_slowpath_null (kernel/panic.c:481)
> [ 2733.839509] rcu_gp_kthread (kernel/rcu/tree.c:1935 kernel/rcu/tree.c:2061)
> [ 2733.839509] ? plist_check_head (lib/plist.c:62)
> [ 2733.839509] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972)
> [ 2733.839509] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601)
> [ 2733.839509] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:171 kernel/locking/spinlock.c:199)
> [ 2733.839509] ? __schedule (kernel/sched/core.c:2803)
> [ 2733.839509] ? ___preempt_schedule (arch/x86/lib/thunk_64.S:51)
> [ 2733.839509] ? abort_exclusive_wait (kernel/sched/wait.c:292)
> [ 2733.839509] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972)
> [ 2733.839509] kthread (kernel/kthread.c:207)
> [ 2733.839509] ? flush_kthread_work (kernel/kthread.c:176)
> [ 2733.839509] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:170 kernel/locking/spinlock.c:199)
> [ 2733.839509] ? flush_kthread_work (kernel/kthread.c:176)
> [ 2733.839509] ret_from_fork (arch/x86/kernel/entry_64.S:565)
> [ 2733.839509] ? flush_kthread_work (kernel/kthread.c:176)
I did locate a similar bug in the expedited RCU code path, and perhaps
that will provide a clue to the other problems as well.
Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists