linux-kernel - Re: [BUG almost bisected] Splat in dequeue_rt

Message-ID: <20241008111150.GD17263@noisy.programming.kicks-ass.net>
Date: Tue, 8 Oct 2024 13:11:50 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: vschneid@...hat.com, linux-kernel@...r.kernel.org, sfr@...b.auug.org.au,
	linux-next@...r.kernel.org, kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

On Sun, Oct 06, 2024 at 01:44:53PM -0700, Paul E. McKenney wrote:

> With your patch, I got 24 failures out of 100 TREE03 runs of 18 hours
> each.  The failures were different, though, mostly involving boost
> failures in which RCU priority boosting didn't actually result in the
> low-priority readers getting boosted.  

Somehow I feel this is progress, albeit very minor :/

> There were also a number of "sched: DL replenish lagged too much"
> messages, but it looks like this was a symptom of the ftrace dump.
> 
> Given that this now involves priority boosting, I am trying 400*TREE03
> with each guest OS restricted to four CPUs to see if that makes things
> happen more quickly, and will let you know how this goes.
> 
> Any other debug I should apply?

The sched_pi_setprio tracepoint perhaps?

I've read all the RCU_BOOST and rtmutex code (once again), and I've been
running pi_stress with --sched id=low,policy=other to ensure the code
paths in question are taken. But so far so very nothing :/

(Noting that both RCU_BOOST and PI futexes use the same rt_mutex / PI API)

You know RCU_BOOST better than I do.. then again, it is utterly weird
that this is apparently affected. I've gotta ask: does a kernel with my
patch applied, and additionally with
kernel/sched/features.h:SCHED_FEAT(DELAY_DEQUEUE, false) flipped,
function as expected?


One very minor thing I noticed while I read the code, do with as you
think best...

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1c7cbd145d5e..95061119653d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1071,10 +1071,6 @@ static int rcu_boost(struct rcu_node *rnp)
 	 * Recheck under the lock: all tasks in need of boosting
 	 * might exit their RCU read-side critical sections on their own.
 	 */
-	if (rnp->exp_tasks == NULL && rnp->boost_tasks == NULL) {
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-		return 0;
-	}
 
 	/*
 	 * Preferentially boost tasks blocking expedited grace periods.
@@ -1082,10 +1078,13 @@ static int rcu_boost(struct rcu_node *rnp)
 	 * expedited grace period must boost all blocked tasks, including
 	 * those blocking the pre-existing normal grace period.
 	 */
-	if (rnp->exp_tasks != NULL)
-		tb = rnp->exp_tasks;
-	else
+	tb = rnp->exp_tasks;
+	if (!tb)
 		tb = rnp->boost_tasks;
+	if (!tb) {
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		return 0;
+	}
 
 	/*
 	 * We boost task t by manufacturing an rt_mutex that appears to