lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180308034627.n2xw32zzujnmy2gb@tardis>
Date:   Thu, 8 Mar 2018 11:46:27 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [PATCH] rcu: exp: Fix "must hold exp_mutex" comments for QS
 reporting functions

On Wed, Mar 07, 2018 at 07:48:29AM -0800, Paul E. McKenney wrote:
> On Wed, Mar 07, 2018 at 04:49:39PM +0800, Boqun Feng wrote:
> > Since commit d9a3da0699b2 ("rcu: Add expedited grace-period support for
> > preemptible RCU"), there are comments for some funtions in
> > rcu_report_exp_rnp()'s call-chain saying that exp_mutex or its
> > predecessors needs to be held.
> > 
> > However, exp_mutex and its predecessors are merely used for synchronize
> > between GPs, and it's clearly that all variables visited by those
> > functions are under the protection of rcu_node's ->lock. Moreover, those
> > functions are currently called without held exp_mutex, and seems that
> > doesn't introduce any trouble.
> > 
> > So this patch fix this problem by correcting the comments
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@...il.com>
> > Fixes: d9a3da0699b2 ("rcu: Add expedited grace-period support for preemptible RCU")
> 
> Good catch, applied!
> 
> These have been around for almost a decade!  ;-)  They happened because
> I ended up rewriting expedited support several times before it was
> accepted, and apparently failed to update the comments towards the end.
> 
> So thank you for catching this one!
> 
> But wouldn't it be better to use lockdep instead of comments in this case?
> 

Agreed. That's a good idea. And I might find something about this, so I
add this for sync_rcu_preempt_exp_done(), 

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 2fd882b08b7c..1fa394fe2f8a 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -20,6 +20,8 @@
  * Authors: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
  */
 
+#include <linux/lockdep.h>
+
 /*
  * Record the start of an expedited grace period.
  */
@@ -158,6 +160,8 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
  */
 static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
+	BUG_ON(!lockdep_is_held(&rnp->lock));
+
 	return rnp->exp_tasks == NULL &&
 	       READ_ONCE(rnp->expmask) == 0;
 }

And I could got this with rcutorture (--configs TREE03 --bootargs
"rcutorture.gp_exp=1" --duration 10 --kconfig "CONFIG_PROVE_LOCKING=y"):

[    0.013629] ------------[ cut here ]------------
[    0.014000] kernel BUG at /home/fixme/linux-rcu/kernel/rcu/tree_exp.h:163!
[    0.014009] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[    0.014697] Modules linked in:
[    0.014992] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-rc1+ #1
[    0.015000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[    0.015000] RIP: 0010:sync_rcu_preempt_exp_done+0x30/0x40
[    0.015000] RSP: 0000:ffffffffb7003d00 EFLAGS: 00010246
[    0.015000] RAX: 0000000000000000 RBX: ffffffffb7064d00 RCX: 0000000000000003
[    0.015000] RDX: 0000000080000003 RSI: ffffffffb7064d18 RDI: ffffffffb7024da8
[    0.015000] RBP: ffffffffb7064d00 R08: 0000000000000000 R09: 0000000000000000
[    0.015000] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb70686d0
[    0.015000] R13: ffffffffb765e2e0 R14: ffffffffb7064d00 R15: 0000000000000004
[    0.015000] FS:  0000000000000000(0000) GS:ffff94fb9fc00000(0000) knlGS:0000000000000000
[    0.015000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.015000] CR2: ffff94fb8e57f000 CR3: 000000000d01e000 CR4: 00000000000006f0
[    0.015000] Call Trace:
[    0.015000]  rcu_exp_wait_wake+0x6d/0x9e0
[    0.015000]  ? rcu_cleanup_dead_rnp+0x90/0x90
[    0.015000]  ? rcu_report_exp_cpu_mult+0x60/0x60
[    0.015000]  _synchronize_rcu_expedited+0x680/0x6f0
[    0.015000]  ? rcu_cleanup_dead_rnp+0x90/0x90
[    0.015000]  ? acpi_hw_read_port+0x4c/0xc2
[    0.015000]  ? acpi_hw_read+0xf4/0x153
[    0.015000]  synchronize_rcu.part.79+0x53/0x60
[    0.015000]  ? acpi_read_bit_register+0x38/0x69
[    0.015000]  rcu_test_sync_prims+0x5/0x20
[    0.015000]  rest_init+0x6/0xc0
[    0.015000]  start_kernel+0x447/0x467
[    0.015000]  secondary_startup_64+0xa5/0xb0
[    0.015000] Code: ff 48 89 fb 48 83 c7 18 e8 9e 52 fd ff 85 c0 74 1a 31 c0 48 83 bb c0 00 00 00 00 74 02 5b c3 48 8b 43 68 5b 48 85 c0 0f 94 c0 c3 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41 56 49 89 fe 
[    0.015000] RIP: sync_rcu_preempt_exp_done+0x30/0x40 RSP: ffffffffb7003d00
[    0.015007] ---[ end trace 09539f6b90637d6c ]---


And I found in rcu_exp_wait_wake(), some of sync_rcu_preempt_exp_done()
are not protected by rcu_node's lock. I have a straight-forward fix
(along with the debug changes I mention above).

With this fix, rcutorture doesn't complain about the lock missing
anymore. I will continue to investigate whether missing lock is a
problem, but in the meanwhile, looking forward to your insight ;-)

Regards,
Boqun
---------------------------------->8
Subject: [PATCH] rcu: exp: Protect sync_rcu_preempt_exp_done() with rcu_node
 lock

Signed-off-by: Boqun Feng <boqun.feng@...il.com>
---
 kernel/rcu/tree_exp.h | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 2fd882b08b7c..0001ac0ce6a7 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -20,6 +20,8 @@
  * Authors: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
  */
 
+#include <linux/lockdep.h>
+
 /*
  * Record the start of an expedited grace period.
  */
@@ -158,10 +160,30 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
  */
 static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
+	BUG_ON(!lockdep_is_held(&rnp->lock));
+
 	return rnp->exp_tasks == NULL &&
 	       READ_ONCE(rnp->expmask) == 0;
 }
 
+/*
+ * Like sync_rcu_preempt_exp_done(), but this function assumes the caller
+ * doesn't hold the rcu_node's ->lock, and will acquire and release the lock
+ * itself
+ */
+static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
+{
+	unsigned long flags;
+	bool ret;
+
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	ret = sync_rcu_preempt_exp_done(rnp);
+	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
+	return ret;
+}
+
+
 /*
  * Report the exit from RCU read-side critical section for the last task
  * that queued itself during or before the current expedited preemptible-RCU
@@ -490,6 +512,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root(rsp);
 	int ret;
+	unsigned long flags;
 
 	trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("startwait"));
 	jiffies_stall = rcu_jiffies_till_stall_check();
@@ -498,9 +521,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	for (;;) {
 		ret = swait_event_timeout(
 				rsp->expedited_wq,
-				sync_rcu_preempt_exp_done(rnp_root),
+				sync_rcu_preempt_exp_done_unlocked(rnp_root),
 				jiffies_stall);
-		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
+		if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
 			return;
 		WARN_ON(ret < 0);  /* workqueues should not be signaled. */
 		if (rcu_cpu_stall_suppress)
@@ -533,8 +556,14 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 			rcu_for_each_node_breadth_first(rsp, rnp) {
 				if (rnp == rnp_root)
 					continue; /* printed unconditionally */
-				if (sync_rcu_preempt_exp_done(rnp))
+
+				raw_spin_lock_irqsave_rcu_node(rnp, flags);
+				if (sync_rcu_preempt_exp_done(rnp)) {
+					raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 					continue;
+				}
+				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
 				pr_cont(" l=%u:%d-%d:%#lx/%c",
 					rnp->level, rnp->grplo, rnp->grphi,
 					rnp->expmask,
-- 
2.16.2


Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ