Date:	Tue, 8 Mar 2016 10:18:07 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	linux-kernel@...r.kernel.org, yang.shi@...aro.org, tj@...nel.org,
	paul.gortmaker@...driver.com, boqun.feng@...il.com,
	tglx@...utronix.de, gang.chen.5i5j@...il.com, sj38.park@...il.com
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.6

On Tue, Mar 08, 2016 at 07:21:09AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 08, 2016 at 09:53:42AM +0100, Ingo Molnar wrote:
> > * Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:

[ . . . ]

> > Pulled, thanks a lot Paul!
> > 
> > So I've done the conflict resolutions with tip:smp/hotplug and tip:sched/core 
> > myself, and came up with a mostly identical resolution, except for this 
> > difference from your resolution in wagi.2016.03.01a:
> > 
> > --- linux-next/kernel/rcu/tree.c
> > +++ tip/kernel/rcu/tree.c
> > @@ -2046,8 +2046,8 @@ static void rcu_gp_cleanup(struct rcu_st
> >  		/* smp_mb() provided by prior unlock-lock pair. */
> >  		nocb += rcu_future_gp_cleanup(rsp, rnp);
> >  		sq = rcu_nocb_gp_get(rnp);
> > -		raw_spin_unlock_irq_rcu_node(rnp);
> >  		rcu_nocb_gp_cleanup(sq);
> > +		raw_spin_unlock_irq_rcu_node(rnp);
> >  		cond_resched_rcu_qs();
> >  		WRITE_ONCE(rsp->gp_activity, jiffies);
> >  		rcu_gp_slow(rsp, gp_cleanup_delay);
> > 
> > but your resolution is better: rcu_nocb_gp_cleanup() can (and should) be done 
> > outside of the rcu_node lock.
> > 
> > So we have the same resolution now, which is good! ;-)
> 
> Glad we were close!
> 
> Just to satisfy my curiosity, I am running rcutorture on your
> version.  ;-)
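
Just so the difference is concrete: the two resolutions differ only in
where the wakeup runs relative to dropping the rcu_node ->lock.  A rough
sketch of the two orderings (simplified from rcu_gp_cleanup(), not the
actual code):

	/* tip's resolution: the wakeup runs with rnp->lock still held. */
	raw_spin_lock_irq_rcu_node(rnp);
	nocb += rcu_future_gp_cleanup(rsp, rnp);
	sq = rcu_nocb_gp_get(rnp);
	rcu_nocb_gp_cleanup(sq);	/* ends up in swake_up_all(sq) */
	raw_spin_unlock_irq_rcu_node(rnp);

	/* wagi.2016.03.01a: drop rnp->lock first, then do the wakeup. */
	raw_spin_lock_irq_rcu_node(rnp);
	nocb += rcu_future_gp_cleanup(rsp, rnp);
	sq = rcu_nocb_gp_get(rnp);
	raw_spin_unlock_irq_rcu_node(rnp);
	rcu_nocb_gp_cleanup(sq);	/* no rcu_node lock held here */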

And for whatever it is worth, in one of the sixteen rcutorture scenarios
lockdep complained as shown below.

On the other hand, your version quite possibly makes a lost-wakeup bug
happen more frequently.  If my current quest to create a torture test
specific to this bug fails, I will revisit your patch as a way of
reproducing it.  So despite the lockdep splat, it is quite possible
that I will be thanking you for it at some point.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

[    0.546319] =================================
[    0.547000] [ INFO: inconsistent lock state ]
[    0.547000] 4.5.0-rc6+ #1 Not tainted
[    0.547000] ---------------------------------
[    0.547000] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[    0.547000] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[    0.547000]  (rcu_node_2){+.?...}, at: [<ffffffff810bd9e4>] rcu_process_callbacks+0xf4/0x860
[    0.547000] {SOFTIRQ-ON-W} state was registered at:
[    0.547000]   [<ffffffff810a22c6>] mark_held_locks+0x66/0x90
[    0.547000]   [<ffffffff810a23e4>] trace_hardirqs_on_caller+0xf4/0x1c0
[    0.547000]   [<ffffffff810a24bd>] trace_hardirqs_on+0xd/0x10
[    0.547000]   [<ffffffff8197e987>] _raw_spin_unlock_irq+0x27/0x50
[    0.547000]   [<ffffffff8109cd36>] swake_up_all+0xb6/0xd0
[    0.547000]   [<ffffffff810bd375>] rcu_gp_kthread+0x835/0xaf0
[    0.547000]   [<ffffffff8107b04f>] kthread+0xdf/0x100
[    0.547000]   [<ffffffff8197f4ff>] ret_from_fork+0x3f/0x70
[    0.547000] irq event stamp: 34721
[    0.547000] hardirqs last  enabled at (34720): [<ffffffff810baf33>] note_gp_changes+0x43/0xa0
[    0.547000] hardirqs last disabled at (34721): [<ffffffff8197e767>] _raw_spin_lock_irqsave+0x17/0x60
[    0.547000] softirqs last  enabled at (34712): [<ffffffff8105d84c>] _local_bh_enable+0x1c/0x50
[    0.547000] softirqs last disabled at (34713): [<ffffffff8105edb5>] irq_exit+0xa5/0xb0
[    0.547000] 
[    0.547000] other info that might help us debug this:
[    0.547000]  Possible unsafe locking scenario:
[    0.547000] 
[    0.547000]        CPU0
[    0.547000]        ----
[    0.547000]   lock(rcu_node_2);
[    0.547000]   <Interrupt>
[    0.547000]     lock(rcu_node_2);
[    0.547000] 
[    0.547000]  *** DEADLOCK ***
[    0.547000] 
[    0.547000] no locks held by swapper/0/0.
[    0.547000] 
[    0.547000] stack backtrace:
[    0.547000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.5.0-rc6+ #1
[    0.547000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    0.547000]  0000000000000000 ffff88001fc03cc0 ffffffff813730ee ffffffff81e1d500
[    0.547000]  ffffffff83874a20 ffff88001fc03d10 ffffffff8113cf77 0000000000000001
[    0.547000]  ffffffff00000000 ffff880000000000 0000000000000006 ffffffff81e1d500
[    0.547000] Call Trace:
[    0.547000]  <IRQ>  [<ffffffff813730ee>] dump_stack+0x67/0x99
[    0.547000]  [<ffffffff8113cf77>] print_usage_bug+0x1f2/0x203
[    0.547000]  [<ffffffff810a1840>] ? check_usage_backwards+0x120/0x120
[    0.547000]  [<ffffffff810a21d2>] mark_lock+0x212/0x2a0
[    0.547000]  [<ffffffff810a2b17>] __lock_acquire+0x397/0x1b50
[    0.547000]  [<ffffffff8108ebce>] ? update_blocked_averages+0x3e/0x4a0
[    0.547000]  [<ffffffff8197e945>] ? _raw_spin_unlock_irqrestore+0x55/0x70
[    0.547000]  [<ffffffff810a2394>] ? trace_hardirqs_on_caller+0xa4/0x1c0
[    0.547000]  [<ffffffff8109625a>] ? rebalance_domains+0x10a/0x3b0
[    0.547000]  [<ffffffff810a4ae5>] lock_acquire+0xc5/0x1e0
[    0.547000]  [<ffffffff810bd9e4>] ? rcu_process_callbacks+0xf4/0x860
[    0.547000]  [<ffffffff8197e791>] _raw_spin_lock_irqsave+0x41/0x60
[    0.547000]  [<ffffffff810bd9e4>] ? rcu_process_callbacks+0xf4/0x860
[    0.547000]  [<ffffffff810bd9e4>] rcu_process_callbacks+0xf4/0x860
[    0.547000]  [<ffffffff810966c8>] ? run_rebalance_domains+0x1c8/0x1f0
[    0.547000]  [<ffffffff8105e7b9>] __do_softirq+0x139/0x490
[    0.547000]  [<ffffffff8105edb5>] irq_exit+0xa5/0xb0
[    0.547000]  [<ffffffff8103d2cd>] smp_apic_timer_interrupt+0x3d/0x50
[    0.547000]  [<ffffffff8197ff29>] apic_timer_interrupt+0x89/0x90
[    0.547000]  <EOI>  [<ffffffff8100e738>] ? default_idle+0x18/0x1a0
[    0.547000]  [<ffffffff8100e736>] ? default_idle+0x16/0x1a0
[    0.547000]  [<ffffffff8100f11a>] arch_cpu_idle+0xa/0x10
[    0.547000]  [<ffffffff8109d035>] default_idle_call+0x25/0x40
[    0.547000]  [<ffffffff8109d2e8>] cpu_startup_entry+0x298/0x3c0
[    0.547000]  [<ffffffff8197768f>] rest_init+0x12f/0x140
[    0.547000]  [<ffffffff81977560>] ? csum_partial_copy_generic+0x170/0x170
[    0.547000]  [<ffffffff81f6ffd5>] start_kernel+0x435/0x442
[    0.547000]  [<ffffffff81f6f98e>] ? set_init_arg+0x55/0x55
[    0.547000]  [<ffffffff81f6f5ad>] x86_64_start_reservations+0x2a/0x2c
[    0.547000]  [<ffffffff81f6f699>] x86_64_start_kernel+0xea/0xed
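
For the record, what lockdep is objecting to: swake_up_all() uses the
irq-enabling raw_spin_lock_irq()/raw_spin_unlock_irq() variants on the
swait-queue lock (hence the _raw_spin_unlock_irq() frame under
swake_up_all() above), so running it with rnp->lock still held
re-enables interrupts inside the rcu_node critical section.  Lockdep
records rcu_node_2 as SOFTIRQ-ON-W at that point, and the later
acquisition from rcu_process_callbacks() in softirq context is the
inconsistent IN-SOFTIRQ-W use.  In miniature, the pattern looks roughly
like this (hypothetical illustration, not actual kernel code):

	static DEFINE_SPINLOCK(l);	/* stands in for rcu_node_2's lock */

	void gp_kthread_path(void)	/* process context */
	{
		spin_lock_irq(&l);
		local_irq_enable();	/* what the nested unlock_irq in
					 * swake_up_all() effectively does */
		/* An interrupt here can raise RCU_SOFTIRQ on irq_exit;
		 * the handler below then spins on 'l' against the
		 * interrupted context: deadlock. */
		local_irq_disable();
		spin_unlock_irq(&l);
	}

	void softirq_path(void)		/* softirq context */
	{
		spin_lock(&l);		/* the IN-SOFTIRQ-W use */
		spin_unlock(&l);
	}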
