Message-ID: <20250211223807.1834978-1-joelagnelf@nvidia.com>
Date: Tue, 11 Feb 2025 17:38:07 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: linux-kernel@...r.kernel.org,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Frederic Weisbecker <frederic@...nel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
	Joel Fernandes <joel@...lfernandes.org>,
	Josh Triplett <josh@...htriplett.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Uladzislau Rezki <urezki@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Zqiang <qiang.zhang1211@...il.com>
Cc: "Joel Fernandes" <joelagnelf@...dia.com>,
	rcu@...r.kernel.org
Subject: [PATCH v2] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done()

From: "Joel Fernandes" <joelagnelf@...dia.com>

The rcu_seq_done() API has a large "false-negative" window of size
ULONG_MAX/2: after a wrap around, if the delta between the current
sequence and the snapshot is large, it can incorrectly report that a
GP has not completed.
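
To make the window concrete, here is a minimal user-space sketch (not
kernel code; only the ULONG_CMP_GE() definition is copied from
kernel/rcu/rcu.h, the counter values are made up):

        #include <limits.h>
        #include <stdio.h>

        /* Wrap-safe comparison as defined in kernel/rcu/rcu.h: true iff
         * (a - b) is within the ULONG_MAX/2 guard band. */
        #define ULONG_CMP_GE(a, b)  (ULONG_MAX / 2 >= (a) - (b))

        int main(void)
        {
                unsigned long snap = 100;                      /* snapshot of gp_seq */
                unsigned long cur  = snap + ULONG_MAX / 2 + 8; /* counter ran far past snap */

                /* Many GPs have elapsed, yet the comparison claims "not done". */
                printf("done? %d\n", ULONG_CMP_GE(cur, snap)); /* prints "done? 0" */
                return 0;
        }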

One place this might cause a problem is SRCU:

poll_state_synchronize_srcu() uses rcu_seq_done(), unlike
poll_state_synchronize_rcu() which uses rcu_seq_done_exact().

rcu_seq_done_exact() makes more sense for the polling API, as there is
a higher chance of a significant delay between the get_state..() and
poll_state..() calls.
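
For context, that polling pattern looks roughly like this (a sketch of
typical usage; my_srcu and do_other_work() are placeholders made up for
illustration):

        #include <linux/srcu.h>

        DEFINE_SRCU(my_srcu);

        static void poll_example(void)
        {
                unsigned long cookie;

                cookie = get_state_synchronize_srcu(&my_srcu);

                /* Arbitrarily long delay: the sequence counter can advance
                 * a long way (and even wrap) before the poll below. */
                do_other_work();

                if (poll_state_synchronize_srcu(&my_srcu, cookie))
                        pr_info("SRCU grace period has completed\n");
        }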

Another place where this seems scary is where a false negative makes
the wakeup condition evaluate to false, causing missed wakeups, for
example in tree-nocb:

        swait_event_interruptible_exclusive(
            rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
            rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
            !READ_ONCE(my_rdp->nocb_gp_sleep));

The shorter false-negative window of rcu_seq_done_exact() improves
robustness: it shrinks the window of false negatives to only ~2-3 GPs
versus ULONG_MAX/2. The change also results in a negative code delta
and could avoid future issues where rcu_seq_done() reports false
negatives for too long.
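
For reference, the check being promoted into rcu_seq_done() is along
these lines (a paraphrased sketch; RCU_SEQ_DONE_EXACT_GUARD is a
made-up name standing in for the small guard band of a few counter
increments used by the real helper):

        static inline bool rcu_seq_done_sketch(unsigned long *sp, unsigned long s)
        {
                unsigned long cur_s = READ_ONCE(*sp);

                /* Done if the counter is at or after the snapshot, OR so far
                 * "before" it that only a wrap can explain the gap, leaving a
                 * false-negative window of only a few GPs. */
                return ULONG_CMP_GE(cur_s, s) ||
                       ULONG_CMP_LT(cur_s, s - RCU_SEQ_DONE_EXACT_GUARD);
        }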

One downside of this change is slightly higher computational cost, but
the computation is trivial and I think it is worth it.

rcutorture runs of all scenarios for 15 minutes passed. All users were
thoroughly inspected to confirm that the change is safe. Further
inspection shows it is also more robust, so this is more than a
cleanup.

Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
---
v1->v2:
    Updated commit message with more analysis and points.

 kernel/rcu/rcu.h  | 13 ++-----------
 kernel/rcu/tree.c |  6 +++---
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index eed2951a4962..c2ca196907cb 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
 
 /*
  * Given a snapshot from rcu_seq_snap(), determine whether or not a
- * full update-side operation has occurred.
+ * full update-side operation has occurred while also handling
+ * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band.
  */
 static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
-{
-	return ULONG_CMP_GE(READ_ONCE(*sp), s);
-}
-
-/*
- * Given a snapshot from rcu_seq_snap(), determine whether or not a
- * full update-side operation has occurred, but do not allow the
- * (ULONG_MAX / 2) safety-factor/guard-band.
- */
-static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
 {
 	unsigned long cur_s = READ_ONCE(*sp);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b77ccc55557b..835600cec9ba 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full);
 bool poll_state_synchronize_rcu(unsigned long oldstate)
 {
 	if (oldstate == RCU_GET_STATE_COMPLETED ||
-	    rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) {
+	    rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) {
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */
 		return true;
 	}
@@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
 
 	smp_mb(); // Order against root rcu_node structure grace-period cleanup.
 	if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED ||
-	    rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) ||
+	    rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) ||
 	    rgosp->rgos_exp == RCU_GET_STATE_COMPLETED ||
-	    rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) {
+	    rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) {
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */
 		return true;
 	}
-- 
2.34.1

