linux-kernel - Re: [PATCH] rcu: fix an infinite loop in rcu_gp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191215201646.GK2889@paulmck-ThinkPad-P72>
Date:   Sun, 15 Dec 2019 12:16:46 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Qian Cai <cai@....pw>
Cc:     josh@...htriplett.org, rostedt@...dmis.org,
        mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
        joel@...lfernandes.org, tj@...nel.org, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu: fix an infinite loop in rcu_gp_cleanup()

On Sun, Dec 15, 2019 at 01:52:42AM -0500, Qian Cai wrote:
> The commit 82150cb53dcb ("rcu: React to callback overload by
> aggressively seeking quiescent states") introduced an infinite loop
> during boot here,
> 
> // Reset overload indication for CPUs no longer overloaded
> for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> 	rdp = per_cpu_ptr(&rcu_data, cpu);
> 	check_cb_ovld_locked(rdp, rnp);
> }
> 
> because on an affected machine,
> 
> rnp->cbovldmask = 0
> rnp->grphi = 127
> rnp->grplo = 0

Your powerpc.config file from your first email shows:

	rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=128

So this should be the root rcu_node structure (as opposed to one of the
eight leaf rcu_node structures, each of which should have the difference
between rnp->grphi and rnp->grplo equal to 15).  And RCU should not be
invoking for_each_leaf_node_cpu_mask() on this rcu_node structure.

> It ends up with "cpu" is always 64 and never be able to get out of the
> loop due to "cpu <= rnp->grphi". It is pointless to enter the loop when
> the cpumask is 0 as there is no CPU would be able to match it.
> 
> Fixes: 82150cb53dcb ("rcu: React to callback overload by aggressively seeking quiescent states")
> Signed-off-by: Qian Cai <cai@....pw>
> ---
>  kernel/rcu/rcu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index ab504fbc76ca..fb691ec86df4 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -363,7 +363,7 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
>  	((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
>  #define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
>  	for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
> -	     (cpu) <= rnp->grphi; \
> +	     (cpu) <= rnp->grphi && (mask); \
>  	     (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))

This change cannot be right, but has to be one of the better bug reports
I have ever received, so thank you very much, greatly appreciated!!!

So if I say the above change cannot be right, what change might work?

Again, it would certainly be a bug to invoke for_each_leaf_node_cpu_mask()
on anything but one of the leaf rcu_node structures, as you might guess
from the name.  And as noted above, your dump of the rcu_node fields
above looks like it is exactly that bug that happened.  So let's review
the uses of this macro:

kernel/rcu/tree.c:

o	rcu_gp_cleanup() invokes for_each_leaf_node_cpu_mask() within a
	srcu_for_each_node_breadth_first() loop, which includes non-leaf
	rcu_node structures.  This is a bug!  Please see patch below.

o	force_qs_rnp() is OK because for_each_leaf_node_cpu_mask() is
	invoked within a rcu_for_each_leaf_node() loop.

kernel/rcu/tree_exp.h:

o	rcu_report_exp_cpu_mult() invokes it on whatever rcu_node structure
	was passed in:

	o	sync_rcu_exp_select_node_cpus() also relies on its
		caller (via workqueues) to do the right thing.

		o	sync_rcu_exp_select_cpus() invokes
			sync_rcu_exp_select_node_cpus() from within a
			rcu_for_each_leaf_node() loop, so is OK.

		o	sync_rcu_exp_select_cpus() also invokes
			sync_rcu_exp_select_node_cpus() indirectly
			via workqueues, but also from within the same
			rcu_for_each_leaf_node() loop, so is OK.

	o	rcu_report_exp_rdp() invokes rcu_report_exp_cpu_mult()
		on the rcu_node structure corresponding to some
		specific CPU, which will always be a leaf rcu_node
		structure.

Again, thank you very much for your testing and debugging efforts!
I have queued the three (almost untested) patches below, the first of
which I will fold into the buggy "rcu: React to callback overload by
aggressively seeking quiescent states" patch, the second of which I will
apply to prevent future bugs of this sort, even when running on small
systems, and the third of which will allow me to easily run rcutorture
tests on the larger systems that I have recently gained easy access to.

And the big question...  Does the first patch clear up your hangs?  ;-)
Either way, please let me know!

							Thanx, Paul

------------------------------------------------------------------------

commit e8d6182b015bdd8221164477f4ab1c307bd2fbe9
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Sun Dec 15 10:59:06 2019 -0800

    squash! rcu: React to callback overload by aggressively seeking quiescent states
    
    [ paulmck: Fix bug located by Qian Cai. ]
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1d0935e..48fba22 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1760,10 +1760,11 @@ static void rcu_gp_cleanup(void)
 		/* smp_mb() provided by prior unlock-lock pair. */
 		needgp = rcu_future_gp_cleanup(rnp) || needgp;
 		// Reset overload indication for CPUs no longer overloaded
-		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
-			rdp = per_cpu_ptr(&rcu_data, cpu);
-			check_cb_ovld_locked(rdp, rnp);
-		}
+		if (rcu_is_leaf_node(rnp))
+			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
+				rdp = per_cpu_ptr(&rcu_data, cpu);
+				check_cb_ovld_locked(rdp, rnp);
+			}
 		sq = rcu_nocb_gp_get(rnp);
 		raw_spin_unlock_irq_rcu_node(rnp);
 		rcu_nocb_gp_cleanup(sq);

------------------------------------------------------------------------

commit 793de75e086a51464e31d74746d4804816541d6b
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Sun Dec 15 11:38:57 2019 -0800

    rcu: Warn on for_each_leaf_node_cpu_mask() from non-leaf
    
    The for_each_leaf_node_cpu_mask() and for_each_leaf_node_possible_cpu()
    macros must be invoked only on leaf rcu_node structures.  Failing to
    abide by this restriction can result in infinite loops on systems with
    more than 64 CPUs (or for more than 32 CPUs on 32-bit systems).  This
    commit therefore adds WARN_ON_ONCE() calls to make misuse of these two
    macros easier to debug.
    
    Reported-by: Qian Cai <cai@....pw>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 1779cbf..00ddc92 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -342,7 +342,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
  * Iterate over all possible CPUs in a leaf RCU node.
  */
 #define for_each_leaf_node_possible_cpu(rnp, cpu) \
-	for ((cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
+	for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
+	     (cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
 	     (cpu) <= rnp->grphi; \
 	     (cpu) = cpumask_next((cpu), cpu_possible_mask))
 
@@ -352,7 +353,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
 #define rcu_find_next_bit(rnp, cpu, mask) \
 	((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
 #define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
-	for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
+	for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
+	     (cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
 	     (cpu) <= rnp->grphi; \
 	     (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
 
------------------------------------------------------------------------

commit 6ce2513f745a7b3d5f0a2ae20d1b7fedcf918a3b
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Sun Dec 15 12:11:56 2019 -0800

    rcutorture: Add 100-CPU configuration
    
    The small-system rcutorture configurations have served us well for a great
    many years, but it is now time to add a larger one.  This commit does
    just that, but does not add it to the defaults in CFLIST.  This allows
    the kvm.sh argument '--configs "4*CFLIST TREE10" to run four instances
    of each of the default configurations concurrently with one instance of
    the large configuration.
    
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE10 b/tools/testing/selftests/rcutorture/configs/rcu/TREE10
new file mode 100644
index 0000000..2debe78
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE10
@@ -0,0 +1,18 @@
+CONFIG_SMP=y
+CONFIG_NR_CPUS=100
+CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=n
+#CHECK#CONFIG_TREE_RCU=y
+CONFIG_HZ_PERIODIC=n
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_FULL=n
+CONFIG_RCU_FAST_NO_HZ=n
+CONFIG_RCU_TRACE=n
+CONFIG_RCU_NOCB_CPU=n
+CONFIG_DEBUG_LOCK_ALLOC=n
+CONFIG_PROVE_LOCKING=n
+#CHECK#CONFIG_PROVE_RCU=n
+CONFIG_DEBUG_OBJECTS=n
+CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
+CONFIG_RCU_EXPERT=n