[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250706171922.235086-2-joelagnelf@nvidia.com>
Date: Sun, 6 Jul 2025 13:19:21 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: linux-kernel@...r.kernel.org,
"Paul E. McKenney" <paulmck@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
Josh Triplett <josh@...htriplett.org>,
Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang@...ux.dev>,
Jonathan Corbet <corbet@....net>
Cc: rcu@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: [PATCH 2/2] rcu: Document separation of rcu_state and rnp's gp_seq
The details of this are subtle and was discussed recently. Add a
quick-quiz about this and refer to it from the code, for more clarity.
Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
---
.../Data-Structures/Data-Structures.rst | 32 +++++++++++++++++++
kernel/rcu/tree.c | 4 +++
2 files changed, 36 insertions(+)
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
index 04e16775c752..930535f076b4 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
@@ -286,6 +286,38 @@ in order to detect the beginnings and ends of grace periods in a
distributed fashion. The values flow from ``rcu_state`` to ``rcu_node``
(down the tree from the root to the leaves) to ``rcu_data``.
++-----------------------------------------------------------------------+
+| **Quick Quiz**: |
++-----------------------------------------------------------------------+
+| Given that the root rcu_node structure has a gp_seq field, |
+| why does RCU maintain a separate gp_seq in the rcu_state structure? |
+| Why not just use the root rcu_node's gp_seq as the official record |
+| and update it directly when starting a new grace period? |
++-----------------------------------------------------------------------+
+| **Answer**: |
++-----------------------------------------------------------------------+
+| On single-node RCU trees (where the root node is also a leaf), |
+| updating the root node's gp_seq immediately would create unnecessary |
+| lock contention. Here's why: |
+| |
+| If we did rcu_seq_start() directly on the root node's gp_seq: |
+| 1. All CPUs would immediately see their node's gp_seq from their rdp's|
+| gp_seq, in rcu_pending(). They would all then invoke the RCU-core. |
+| 2. Which calls note_gp_changes() and try to acquire the node lock. |
+| 3. But rnp->qsmask isn't initialized yet (happens later in |
+| rcu_gp_init()) |
+| 4. So each CPU would acquire the lock, find it can't determine if it |
+| needs to report quiescent state (no qsmask), update rdp->gp_seq, |
+| and release the lock. |
+| 5. Result: Lots of lock acquisitions with no grace period progress |
+| |
+| By having a separate rcu_state.gp_seq, we can increment the official |
+| grace period counter without immediately affecting what CPUs see in |
+| their nodes. The hierarchical propagation in rcu_gp_init() then |
+| updates the root node's gp_seq and qsmask together under the same lock|
+| acquisition, avoiding this useless contention. |
++-----------------------------------------------------------------------+
+
Miscellaneous
'''''''''''''
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 32fdb66e9c9f..c31b85e62310 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1842,6 +1842,10 @@ static noinline_for_stack bool rcu_gp_init(void)
* use-after-free errors. For a detailed explanation of this race, see
* Documentation/RCU/Design/Requirements/Requirements.rst in the
* "Hotplug CPU" section.
+ *
+ * Also note that the root rnp's gp_seq is kept separate from, and lags,
+ * the rcu_state's gp_seq, for a reason. See the Quick-Quiz on
+ * Single-node systems for more details (in Data-Structures.rst).
*/
rcu_seq_start(&rcu_state.gp_seq);
/* Ensure that rcu_seq_done_exact() guardband doesn't give false positives. */
--
2.43.0
Powered by blists - more mailing lists