Message-ID: <20250418151225.3006867-1-vincent.guittot@linaro.org>
Date: Fri, 18 Apr 2025 17:12:25 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: mingo@...hat.com,
	peterz@...radead.org,
	juri.lelli@...hat.com,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	linux-kernel@...r.kernel.org
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Subject: [PATCH] sched/fair: Increase max lag clamping

sched_entity lag is currently clamped to the maximum of the tick and twice
the task's slice. This is too short compared to the maximum custom slice
that another task can set and accumulate.

Clamp the lag to the maximum slice that a task can set instead. A task A
can accumulate up to its slice of negative lag while running to parity,
and the other runnable tasks can accumulate the same amount of positive
lag while waiting to run. This positive lag can be lost at dequeue when
it is clamped to twice the dequeued task's slice, e.g. when task A's
slice is 100ms while the others use a smaller value such as the default
2.8ms.

Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
---
 kernel/sched/fair.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a0c4cd26ee07..1c2c70decb20 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -683,15 +683,17 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
  * is possible -- by addition/removal/reweight to the tree -- to move V around
  * and end up with a larger lag than we started with.
  *
- * Limit this to either double the slice length with a minimum of TICK_NSEC
- * since that is the timing granularity.
- *
- * EEVDF gives the following limit for a steady state system:
+ * Limit this to the max allowed custom slice length which is higher than the
+ * timing granularity (the tick) and EEVDF gives the following limit for
+ * a steady state system:
  *
  *   -r_max < lag < max(r_max, q)
  *
  * XXX could add max_slice to the augmented data to track this.
  */
+#define SCHED_SLICE_MIN		(NSEC_PER_MSEC/10)  /* HZ=1000 * 10 */
+#define SCHED_SLICE_MAX		(NSEC_PER_MSEC*100) /* HZ=100  / 10 */
+
 static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	s64 vlag, limit;
@@ -699,7 +701,7 @@ static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	WARN_ON_ONCE(!se->on_rq);
 
 	vlag = avg_vruntime(cfs_rq) - se->vruntime;
-	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
+	limit = calc_delta_fair(SCHED_SLICE_MAX, se);
 
 	se->vlag = clamp(vlag, -limit, limit);
 }
@@ -5189,8 +5191,8 @@ void __setparam_fair(struct task_struct *p, const struct sched_attr *attr)
 	if (attr->sched_runtime) {
 		se->custom_slice = 1;
 		se->slice = clamp_t(u64, attr->sched_runtime,
-				      NSEC_PER_MSEC/10,   /* HZ=1000 * 10 */
-				      NSEC_PER_MSEC*100); /* HZ=100  / 10 */
+				      SCHED_SLICE_MIN,
+				      SCHED_SLICE_MAX);
 	} else {
 		se->custom_slice = 0;
 		se->slice = sysctl_sched_base_slice;
-- 
2.43.0

