linux-kernel - [PATCH tip/core/rcu 13/14] documentation: Explain how RCU's combining tree fights contention

Message-Id: <1456290047-16654-13-git-send-email-paulmck@linux.vnet.ibm.com>
Date:	Tue, 23 Feb 2016 21:00:46 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	linux-kernel@...r.kernel.org
Cc:	mingo@...nel.org, jiangshanlai@...il.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, tglx@...utronix.de, peterz@...radead.org,
	rostedt@...dmis.org, dhowells@...hat.com, edumazet@...gle.com,
	dvhart@...ux.intel.com, fweisbec@...il.com, oleg@...hat.com,
	bobby.prani@...il.com,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: [PATCH tip/core/rcu 13/14] documentation: Explain how RCU's combining tree fights contention

This commit adds a couple of paragraphs to the description of RCU's
combining tree explaining how the combining tree keeps lock contention
acceptably low, despite RCU grace periods being global operations.

Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
---
 .../Design/Data-Structures/Data-Structures.html    | 23 ++++++++++++++++++++++
 .../Design/Data-Structures/Data-Structures.htmlx   | 23 ++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
index ba9fbb5177f6..d15744b87b99 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
@@ -100,6 +100,29 @@ On the other hand, you can set <tt>CONFIG_RCU_FANOUT</tt> to be
 as small as 2 if you wish, which would permit only 16 CPUs, which
 is useful for testing.
 
+</p><p>This multi-level combining tree allows us to get most of the
+performance and scalability
+benefits of partitioning, even though RCU grace-period detection is
+inherently a global operation.
+The trick here is that only the last CPU to report a quiescent state
+into a given <tt>rcu_node</tt> structure need advance to the <tt>rcu_node</tt>
+structure at the next level up the tree.
+This means that at the leaf-level <tt>rcu_node</tt> structure, only
+one access out of sixteen will progress up the tree.
+For the internal <tt>rcu_node</tt> structures, the situation is even
+more extreme:  Only one access out of sixty-four will progress up
+the tree.
+Because the vast majority of the CPUs do not progress up the tree,
+the lock contention remains roughly constant up the tree.
+No matter how many CPUs there are in the system, at most 64 quiescent-state
+reports per grace period will progress all the way to the root
+<tt>rcu_node</tt> structure, thus ensuring that the lock contention
+on that root <tt>rcu_node</tt> structure remains acceptably low.
+
+</p><p>In effect, the combining tree acts like a big shock absorber,
+keeping lock contention under control at all tree levels regardless
+of the level of loading on the system.
+
 </p><p>The Linux kernel actually supports multiple flavors of RCU
 running concurrently, so RCU builds separate data structures for each
 flavor.
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.htmlx b/Documentation/RCU/Design/Data-Structures/Data-Structures.htmlx
index c08fd8e9574a..8e88e3e7e2ef 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.htmlx
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.htmlx
@@ -121,6 +121,29 @@ On the other hand, you can set <tt>CONFIG_RCU_FANOUT</tt> to be
 as small as 2 if you wish, which would permit only 16 CPUs, which
 is useful for testing.
 
+</p><p>This multi-level combining tree allows us to get most of the
+performance and scalability
+benefits of partitioning, even though RCU grace-period detection is
+inherently a global operation.
+The trick here is that only the last CPU to report a quiescent state
+into a given <tt>rcu_node</tt> structure need advance to the <tt>rcu_node</tt>
+structure at the next level up the tree.
+This means that at the leaf-level <tt>rcu_node</tt> structure, only
+one access out of sixteen will progress up the tree.
+For the internal <tt>rcu_node</tt> structures, the situation is even
+more extreme:  Only one access out of sixty-four will progress up
+the tree.
+Because the vast majority of the CPUs do not progress up the tree,
+the lock contention remains roughly constant up the tree.
+No matter how many CPUs there are in the system, at most 64 quiescent-state
+reports per grace period will progress all the way to the root
+<tt>rcu_node</tt> structure, thus ensuring that the lock contention
+on that root <tt>rcu_node</tt> structure remains acceptably low.
+
+</p><p>In effect, the combining tree acts like a big shock absorber,
+keeping lock contention under control at all tree levels regardless
+of the level of loading on the system.
+
 </p><p>The Linux kernel actually supports multiple flavors of RCU
 running concurrently, so RCU builds separate data structures for each
 flavor.
-- 
2.5.2