Message-ID: <176060722706.709179.4726867528904925275.tip-bot2@tip-bot2>
Date: Thu, 16 Oct 2025 09:33:47 -0000
From: "tip-bot2 for Fernand Sieber" <tip-bot2@...utronix.de>
To: linux-tip-commits@...r.kernel.org
Cc: Fernand Sieber <sieberf@...zon.com>,
 "Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org,
 linux-kernel@...r.kernel.org
Subject: [tip: sched/core] sched/fair: Forfeit vruntime on yield

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     79104becf42baeeb4a3f2b106f954b9fc7c10a3c
Gitweb:        https://git.kernel.org/tip/79104becf42baeeb4a3f2b106f954b9fc7c10a3c
Author:        Fernand Sieber <sieberf@...zon.com>
AuthorDate:    Thu, 18 Sep 2025 17:05:28 +02:00
Committer:     Peter Zijlstra <peterz@...radead.org>
CommitterDate: Thu, 16 Oct 2025 11:13:49 +02:00

sched/fair: Forfeit vruntime on yield

If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.

If there is another runnable task at this point, the deadline is increased
by one slice on each iteration of the loop. This can make the deadline run
away pretty quickly and cause elevated run delays later on, because the
task does not get picked again. The reason the scheduler can keep picking
the same task despite its growing deadline is that it may be the only
eligible task at that point.

Fix this by making the yielding task forfeit its remaining vruntime and by
pushing its deadline one slice ahead. This implements yield behavior more
faithfully.

Limit the forfeiting to eligible tasks: core scheduling prefers running an
ineligible task over force idling, so without this condition we can end up
in a yield loop that makes the vruntime increase rapidly, leading to
anomalous run delays further down the line.
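As an illustrative sketch of the patched yield path (again not kernel code):
toy_se and toy_yield() are hypothetical names, eligibility is reduced to
"vruntime not ahead of the queue average", which is a simplification of what
entity_eligible() checks, and calc_delta_fair() is once more assumed to be a
plain addition for a nice-0 task.

#include <stdio.h>

struct toy_se {
	unsigned long long vruntime;
	unsigned long long deadline;
	unsigned long long slice;
};

static void toy_yield(struct toy_se *se, unsigned long long avg_vruntime)
{
	if (se->vruntime > avg_vruntime)
		return;			/* ineligible: leave the deadline alone */

	/* Forfeit whatever virtual time was left before the deadline ... */
	se->vruntime = se->deadline;
	/* ... and push the deadline one slice ahead, as before. */
	se->deadline += se->slice;
}

int main(void)
{
	struct toy_se se = { .vruntime = 0, .deadline = 3000000, .slice = 3000000 };

	/* First yield: eligible, so vruntime jumps to the old deadline. */
	toy_yield(&se, 1000000);
	printf("after yield 1: vruntime=%llu deadline=%llu\n", se.vruntime, se.deadline);

	/* Immediate re-yield: now ineligible, nothing is inflated further. */
	toy_yield(&se, 1000000);
	printf("after yield 2: vruntime=%llu deadline=%llu\n", se.vruntime, se.deadline);

	return 0;
}

In this toy model, once the vruntime has caught up with the old deadline the
task is typically no longer eligible, so repeated immediate yields stop
inflating the deadline (and, under core scheduling, stop inflating the
vruntime).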

Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Fernand Sieber <sieberf@...zon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250916140228.452231-1-sieberf@amazon.com
---
 kernel/sched/fair.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc0b7ce..00f9d6c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9007,7 +9007,19 @@ static void yield_task_fair(struct rq *rq)
 	 */
 	rq_clock_skip_update(rq);
 
-	se->deadline += calc_delta_fair(se->slice, se);
+	/*
+	 * Forfeit the remaining vruntime, only if the entity is eligible. This
+	 * condition is necessary because in core scheduling we prefer to run
+	 * ineligible tasks rather than force idling. If this happens we may
+	 * end up in a loop where the core scheduler picks the yielding task,
+	 * which yields immediately again; without the condition the vruntime
+	 * ends up quickly running away.
+	 */
+	if (entity_eligible(cfs_rq, se)) {
+		se->vruntime = se->deadline;
+		se->deadline += calc_delta_fair(se->slice, se);
+		update_min_vruntime(cfs_rq);
+	}
 }
 
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
