Message-ID: <20250418151225.3006867-1-vincent.guittot@linaro.org>
Date: Fri, 18 Apr 2025 17:12:25 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: mingo@...hat.com,
	peterz@...radead.org,
	juri.lelli@...hat.com,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	linux-kernel@...r.kernel.org
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Subject: [PATCH] sched/fair: Increase max lag clamping

sched_entity lag is currently clamped to the maximum of the tick and twice
the task's slice. This is too short compared to the maximum custom slice
that another task can set and accumulate.

Clamp the lag to the maximum slice that a task can set instead. A task A
can accumulate up to its slice of negative lag while running to parity,
and the other runnable tasks can accumulate the same amount of positive
lag while waiting to run. This positive lag can be lost at dequeue when
it is clamped to twice the dequeued task's slice, e.g. when task A's
slice is 100ms while the others use a smaller value such as the default
2.8ms.

Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
---
 kernel/sched/fair.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a0c4cd26ee07..1c2c70decb20 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -683,15 +683,17 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
  * is possible -- by addition/removal/reweight to the tree -- to move V around
  * and end up with a larger lag than we started with.
  *
- * Limit this to either double the slice length with a minimum of TICK_NSEC
- * since that is the timing granularity.
- *
- * EEVDF gives the following limit for a steady state system:
+ * Limit this to the max allowed custom slice length which is higher than the
+ * timing granularity (the tick) and EEVDF gives the following limit for
+ * a steady state system:
  *
  *   -r_max < lag < max(r_max, q)
  *
  * XXX could add max_slice to the augmented data to track this.
  */
+#define SCHED_SLICE_MIN		(NSEC_PER_MSEC/10)  /* HZ=1000 * 10 */
+#define SCHED_SLICE_MAX		(NSEC_PER_MSEC*100) /* HZ=100  / 10 */
+
 static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	s64 vlag, limit;
@@ -699,7 +701,7 @@ static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	WARN_ON_ONCE(!se->on_rq);
 
 	vlag = avg_vruntime(cfs_rq) - se->vruntime;
-	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
+	limit = calc_delta_fair(SCHED_SLICE_MAX, se);
 
 	se->vlag = clamp(vlag, -limit, limit);
 }
@@ -5189,8 +5191,8 @@ void __setparam_fair(struct task_struct *p, const struct sched_attr *attr)
 	if (attr->sched_runtime) {
 		se->custom_slice = 1;
 		se->slice = clamp_t(u64, attr->sched_runtime,
-				      NSEC_PER_MSEC/10,   /* HZ=1000 * 10 */
-				      NSEC_PER_MSEC*100); /* HZ=100  / 10 */
+				      SCHED_SLICE_MIN,
+				      SCHED_SLICE_MAX);
 	} else {
 		se->custom_slice = 0;
 		se->slice = sysctl_sched_base_slice;
-- 
2.43.0

