Message-Id: <20240308224439.281349-2-joel@joelfernandes.org>
Date: Fri,  8 Mar 2024 17:44:38 -0500
From: "Joel Fernandes (Google)" <joel@...lfernandes.org>
To: linux-kernel@...r.kernel.org,
	frederic@...nel.org,
	boqun.feng@...il.com,
	urezki@...il.com,
	neeraj.iitr10@...il.com,
	joel@...lfernandes.org,
	rcu@...r.kernel.org,
	rostedt@...dmis.org,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
	Josh Triplett <josh@...htriplett.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Zqiang <qiang.zhang1211@...il.com>
Subject: [PATCH v2 rcu/dev 2/2] rcu/tree: Add comments explaining now-offline-CPU QS reports

This is a confusing piece of code (rightfully so, as the issue it deals
with is complex). Recent discussions brought up a question -- what prevents
rcu_implicit_dyntick_qs() from warning about QS reports for offline CPUs?

QS reporting for now-offline CPUs should only happen from:
- rcu_gp_init()
- rcutree_report_cpu_dead()

Add comments to this code explaining how QS reporting is not missed when
these functions run concurrently; an illustrative userspace sketch of the
locking rule follows after the --- separator below.

Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
---
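
As an illustrative aside (not part of the diff below), here is a minimal
userspace C sketch of the locking rule the new comments describe: both the
rcu_gp_init()-like path and the rcutree_report_cpu_dead()-like path do their
check-and-report while holding the same node lock, so the quiescent state of
an outgoing CPU is reported exactly once and never missed. The names
model_gp_init(), model_report_cpu_dead(), dead_thread(), gp_thread() and
node_lock are made-up stand-ins for rcu_gp_init(), rcutree_report_cpu_dead()
and rnp->lock; plain uint64_t variables stand in for rnp->qsmask and
rnp->qsmaskinitnext, and the release/reacquire subtlety around clearing
->qsmaskinitnext is deliberately left out.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t qsmask;          /* CPUs the current GP still waits on   */
static uint64_t qsmaskinitnext;  /* CPUs that will be online for next GP */

/* Model of the rcu_gp_init() path: report QS for any now-offline CPUs. */
static void model_gp_init(void)
{
	uint64_t offline;

	pthread_mutex_lock(&node_lock);
	offline = qsmask & ~qsmaskinitnext;
	if (offline) {
		qsmask &= ~offline;	/* report QS on their behalf */
		printf("gp_init: reported QS for mask %#llx\n",
		       (unsigned long long)offline);
	}
	pthread_mutex_unlock(&node_lock);
}

/* Model of the rcutree_report_cpu_dead() path for one outgoing CPU. */
static void model_report_cpu_dead(int cpu)
{
	uint64_t mask = 1ULL << cpu;

	pthread_mutex_lock(&node_lock);
	if (qsmask & mask) {		/* GP still waiting on this CPU? */
		qsmask &= ~mask;	/* report its quiescent state */
		printf("report_cpu_dead: reported QS for CPU %d\n", cpu);
	}
	qsmaskinitnext &= ~mask;	/* CPU is gone for future GPs */
	pthread_mutex_unlock(&node_lock);
}

static void *dead_thread(void *arg)
{
	(void)arg;
	model_report_cpu_dead(2);	/* CPU 2 goes offline */
	return NULL;
}

static void *gp_thread(void *arg)
{
	(void)arg;
	model_gp_init();		/* a new grace period initializes */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	qsmask = qsmaskinitnext = 0xf;	/* four CPUs, all blocking the GP */
	pthread_create(&a, NULL, dead_thread, NULL);
	pthread_create(&b, NULL, gp_thread, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("final: qsmask=%#llx qsmaskinitnext=%#llx\n",
	       (unsigned long long)qsmask, (unsigned long long)qsmaskinitnext);
	return 0;
}

Built with e.g. "cc -pthread model.c" (the file name is arbitrary), whichever
thread takes node_lock first performs the report and the other then finds
nothing left to do, so the offline CPU's bit never lingers in qsmask -- the
situation the rcu_implicit_dyntick_qs() warning mentioned in the changelog
guards against.
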
 kernel/rcu/tree.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bd29fe3c76bf..f3582f843a05 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1917,7 +1917,22 @@ static noinline_for_stack bool rcu_gp_init(void)
 		trace_rcu_grace_period_init(rcu_state.name, rnp->gp_seq,
 					    rnp->level, rnp->grplo,
 					    rnp->grphi, rnp->qsmask);
-		/* Quiescent states for tasks on any now-offline CPUs. */
+		/*
+		 * === Quiescent states for tasks on any now-offline CPUs. ===
+		 *
+		 * QS reporting for now-offline CPUs should only be performed
+		 * from either here, i.e. rcu_gp_init(), or from
+		 * rcutree_report_cpu_dead().
+		 *
+		 * Note that, when reporting quiescent states for now-offline CPUs,
+		 * the sequence of code doing those reports while also accessing
+		 * ->qsmask and ->qsmaskinitnext has to be atomic, so that no QS
+		 * report is missed; otherwise it is possible that
+		 * rcu_implicit_dyntick_qs() screams. This is ensured by holding
+		 * rnp->lock throughout these QS-reporting sequences. That lock is
+		 * also acquired by rcutree_report_cpu_dead(), so acquiring ofl_lock
+		 * is not necessary here to synchronize with that function.
+		 */
 		mask = rnp->qsmask & ~rnp->qsmaskinitnext;
 		rnp->rcu_gp_init_mask = mask;
 		if ((mask || rnp->wait_blkd_tasks) && rcu_is_leaf_node(rnp))
@@ -5116,6 +5131,25 @@ void rcutree_report_cpu_dead(void)
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
 	rdp->rcu_ofl_gp_state = READ_ONCE(rcu_state.gp_state);
+
+	/*
+	 * === Quiescent state reporting for now-offline CPUs ===
+	 *
+	 * QS reporting for now-offline CPUs should only be performed from
+	 * either here, i.e. rcutree_report_cpu_dead(), or from rcu_gp_init().
+	 *
+	 * Note that, when reporting quiescent states for now-offline CPUs,
+	 * the sequence of code doing those reports while also accessing
+	 * ->qsmask and ->qsmaskinitnext has to be atomic, so that no QS
+	 * report is missed; otherwise it is possible that
+	 * rcu_implicit_dyntick_qs() screams. This is ensured by holding
+	 * rnp->lock throughout these QS-reporting sequences. That lock is
+	 * also acquired in rcu_gp_init().
+	 * The one exception is below, where we release and reacquire the
+	 * lock after a QS report but before clearing the ->qsmaskinitnext
+	 * bit. That is OK, because rcu_gp_init() will report the QS again
+	 * if it acquires rnp->lock before we reacquire it below.
+	 */
 	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
 		/* Report quiescent state -before- changing ->qsmaskinitnext! */
 		rcu_disable_urgency_upon_qs(rdp);
-- 
2.34.1

