Message-ID: <20251128081118.20025-1-tao.wangtao@honor.com>
Date: Fri, 28 Nov 2025 16:11:18 +0800
From: wangtao <tao.wangtao@...or.com>
To: <mingo@...hat.com>, <peterz@...radead.org>, <juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>
CC: <dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
	<liulu.liu@...or.com>, <bintian.wang@...or.com>, wangtao
	<tao.wangtao@...or.com>
Subject: [PATCH] sched/fair: Make V move forward only

V is the load-weighted average vruntime of the queued entities. Adding a
task with positive lag, or removing a task with negative lag, can move V
backward. The result is unfair scheduling: previously eligible tasks
become ineligible, tasks get shorter runtimes, and the number of task
switches increases.

For example, when tasks a, x and b are added in that order, where a and b
have zero lag and x has positive lag, task b (added last) can end up being
scheduled before task a.
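
As a rough illustration, here is a minimal user-space sketch (not kernel
code; equal weights and made-up vruntime numbers assumed) of how placing
the positive-lag task x below V drags the average backward, so that b is
then enqueued to the left of a:

	#include <stdio.h>

	/* Simplified model: all weights are 1, so V is a plain average. */
	static double avg_v(const double *v, int n)
	{
		double sum = 0;

		for (int i = 0; i < n; i++)
			sum += v[i];
		return sum / n;
	}

	int main(void)
	{
		double v[3], V = 100;	/* hypothetical starting V */

		v[0] = V;		/* a: zero lag, placed at V          */
		V = avg_v(v, 1);	/* V = 100                           */

		v[1] = V - 60;		/* x: vlag = +60, placed below V     */
		V = avg_v(v, 2);	/* V = 70, moved backward            */

		v[2] = V;		/* b: zero lag, placed at the new V  */
		V = avg_v(v, 3);	/* V = 70; b (70) sits left of a (100) */

		printf("V=%.0f a=%.0f x=%.0f b=%.0f\n", V, v[0], v[1], v[2]);
		return 0;
	}

With V pulled back to 70, a (vruntime 100) is no longer eligible and b,
placed at 70, can be picked first even though it arrived later.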

Making V move forward only (monotonically non-decreasing) resolves these
issues and also simplifies the placement code for tasks with positive lag.
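
In isolation, the forward-only rule is just a monotonic clamp on the
cached average. A minimal user-space sketch of the idea follows; the
actual kernel change to avg_vruntime() is in the diff below:

	#include <stdint.h>

	/* Return the cached average, letting it advance but never retreat. */
	static uint64_t forward_only(uint64_t *cached, uint64_t avg)
	{
		if ((int64_t)(*cached - avg) < 0)	/* avg moved past the cache */
			*cached = avg;
		return *cached;
	}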

hackbench tests show that with this patch, execution time is significantly
reduced due to fewer task switches.

-------------------------------------------------
hackbench test              base    patch   change
-------------------------------------------------
process 1 group:            0.141   0.100   -29.3%
process 4 group:            0.375   0.295   -21.2%
process 16 group:           1.495   1.204   -19.5%
thread 1 group:             0.090   0.068   -25.1%
thread 4 group:             0.244   0.211   -13.4%
thread 16 group:            0.860   0.795    -7.6%
pipe process 1 group:       0.124   0.090   -27.8%
pipe process 4 group:       0.340   0.289   -15.2%
pipe process 16 group:      1.401   1.144   -18.3%
pipe thread 1 group:        0.081   0.071   -11.7%
pipe thread 4 group:        0.241   0.181   -24.7%
pipe thread 16 group:       0.787   0.706   -10.2%
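
base and patch are hackbench run times (lower is better), covering process
vs. thread mode and socket vs. pipe transport at 1, 4 and 16 groups;
assuming the stock rt-tests hackbench, the last row would correspond to an
invocation along the lines of "hackbench --pipe -T -g 16" (exact command
line assumed, not stated here).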

Signed-off-by: wangtao <tao.wangtao@...or.com>
---
 kernel/sched/fair.c  | 16 ++++++++++++----
 kernel/sched/sched.h |  1 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b752324270b..889ee8d4c9bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -671,7 +671,11 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
 		avg = div_s64(avg, load);
 	}
 
-	return cfs_rq->min_vruntime + avg;
+	avg += cfs_rq->min_vruntime;
+	if ((s64)(cfs_rq->forward_avg_vruntime - avg) < 0)
+		cfs_rq->forward_avg_vruntime = avg;
+
+	return cfs_rq->forward_avg_vruntime;
 }
 
 /*
@@ -725,6 +729,9 @@ static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
 	s64 avg = cfs_rq->avg_vruntime;
 	long load = cfs_rq->avg_load;
 
+	if ((s64)(cfs_rq->forward_avg_vruntime - vruntime) >= 0)
+		return 1;
+
 	if (curr && curr->on_rq) {
 		unsigned long weight = scale_load_down(curr->load.weight);
 
@@ -5139,12 +5146,13 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 *
 	 * EEVDF: placement strategy #1 / #2
 	 */
-	if (sched_feat(PLACE_LAG) && cfs_rq->nr_queued && se->vlag) {
+	if (sched_feat(PLACE_LAG) && cfs_rq->nr_queued && se->vlag)
+		lag = se->vlag;
+	/* positive lag does not evaporate with forward_avg_vruntime */
+	if (lag < 0) {
 		struct sched_entity *curr = cfs_rq->curr;
 		unsigned long load;
 
-		lag = se->vlag;
-
 		/*
 		 * If we want to place a task and preserve lag, we have to
 		 * consider the effect of the new entity on the weighted
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index adfb6e3409d7..2691d5e8a0ab 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -681,6 +681,7 @@ struct cfs_rq {
 
 	s64			avg_vruntime;
 	u64			avg_load;
+	u64			forward_avg_vruntime;
 
 	u64			min_vruntime;
 #ifdef CONFIG_SCHED_CORE
-- 
2.17.1

