Message-ID: <xhsmhzflkmvt0.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Wed, 27 Nov 2024 18:27:23 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>, Chenbo Lu
 <chenbo.lu@...yaviation.com>
Cc: stable@...r.kernel.org, regressions@...ts.linux.dev, mingo@...hat.com,
 juri.lelli@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: Performance Degradation After Upgrading to Kernel 6.8

Damn, I wrote a reply on the 20th but seems like I never sent it. At least
I found it saved somewhere, so I don't have to rewrite it...

On 20/11/24 10:03, Peter Zijlstra wrote:
> On Tue, Nov 19, 2024 at 04:30:02PM -0800, Chenbo Lu wrote:
>> Hello,
>> 
>> I am experiencing a significant performance degradation after
>> upgrading my kernel from version 6.6 to 6.8 and would appreciate any
>> insights or suggestions.
>> 
>> I am running a high-load simulation system that spawns more than 1000
>> threads, with overall CPU usage above 30%. Most of the threads use
>> real-time scheduling (SCHED_RR), and the threads of one model use
>> SCHED_DEADLINE. After upgrading the kernel, I noticed that the
>> execution time of my model increased from 4.5ms to 6ms.
>> 
>> What I Have Done So Far:
>> 1. I found this [bug
>> report](https://bugzilla.kernel.org/show_bug.cgi?id=219366#c7) and
>> reverted the commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 mentioned
>> in the post. Unfortunately, this did not resolve the issue.
>> 2. I performed a git bisect and found that the problem appeared after
>> these two scheduling-related commits (RT and deadline) were merged:
>> 612f769edd06a6e42f7cd72425488e68ddaeef0a,
>> 5fe7765997b139e2d922b58359dea181efe618f9
>
> And yet you failed to Cc Valentin, the author of said commits :/
>
>> After reverting these two commits, the model execution time improved
>> to around 5 ms.
>> 3. I reverted two more commits, and the execution time came back to 4.7ms:
>> 63ba8422f876e32ee564ea95da9a7313b13ff0a1,
>> efa7df3e3bb5da8e6abbe37727417f32a37fba47
>> 
>> My questions are:
>> 1. Has anyone else experienced similar performance degradation after
>> upgrading to kernel 6.8?
>
> This is 4 kernel releases back; my memory isn't that long.
>
>> 2. Can anyone explain why these two commits are causing the problem? I
>> am not very familiar with the kernel code and would appreciate any
>> insights.
>
> There might be a race window between setting the rto flag and sending
> the IPI, such that previously the extra IPIs would sooner find the newly
> pushable task.
>
> Valentin, would it make sense to set rto before enqueueing the pushable
> task, instead of after it?

Urgh, those cachelines are beyond cold...

/me goes reading

Ok yeah I guess we could have this race vs

rto_push_irq_work_func()
`\
  rto_next_cpu()

Not sure if that applies to DL too since it doesn't have the PUSH_IPI
thing, but anyway - Chenbo, could you please try the below?
---
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d9d5a702f1a61..270a25335c4bc 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -602,16 +602,16 @@ static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 
 	WARN_ON_ONCE(!RB_EMPTY_NODE(&p->pushable_dl_tasks));
 
+	if (!rq->dl.overloaded) {
+		dl_set_overload(rq);
+		rq->dl.overloaded = 1;
+	}
+
 	leftmost = rb_add_cached(&p->pushable_dl_tasks,
 				 &rq->dl.pushable_dl_tasks_root,
 				 __pushable_less);
 	if (leftmost)
 		rq->dl.earliest_dl.next = p->dl.deadline;
-
-	if (!rq->dl.overloaded) {
-		dl_set_overload(rq);
-		rq->dl.overloaded = 1;
-	}
 }
 
 static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index bd66a46b06aca..1ea45a45ed657 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -385,6 +385,15 @@ static inline void rt_queue_pull_task(struct rq *rq)
 
 static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
 {
+	/*
+	 * Set RTO first so rto_push_irq_work_func() can see @rq as a push
+	 * candidate as early as possible.
+	 */
+	if (!rq->rt.overloaded) {
+		rt_set_overload(rq);
+		rq->rt.overloaded = 1;
+	}
+
 	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
 	plist_node_init(&p->pushable_tasks, p->prio);
 	plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);
@@ -392,15 +401,15 @@ static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
 	/* Update the highest prio pushable task */
 	if (p->prio < rq->rt.highest_prio.next)
 		rq->rt.highest_prio.next = p->prio;
-
-	if (!rq->rt.overloaded) {
-		rt_set_overload(rq);
-		rq->rt.overloaded = 1;
-	}
 }
 
 static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
 {
+	/*
+	 * To match enqueue we should check/unset RTO first, but for that we
+	 * need to pop @p first. This makes this asymmetric wrt enqueue, but
+	 * the worst we can get out of this is an extra useless IPI.
+	 */
 	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
 
 	/* Update the new highest prio pushable task */

