linux-kernel - [tip: sched/core] sched/pelt: Avoid underestimation of task utilization

Message-ID: <170073542912.398.8772998793614415245.tip-bot2@tip-bot2>
Date:   Thu, 23 Nov 2023 10:30:29 -0000
From:   "tip-bot2 for Vincent Guittot" <tip-bot2@...utronix.de>
To:     linux-tip-commits@...r.kernel.org
Cc:     Lukasz Luba <lukasz.luba@....com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...nel.org>,
        Hongyan Xia <hongyan.xia2@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: [tip: sched/core] sched/pelt: Avoid underestimation of task utilization

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     50181c0cff31281b9f1071575ffba8a102375ece
Gitweb:        https://git.kernel.org/tip/50181c0cff31281b9f1071575ffba8a102375ece
Author:        Vincent Guittot <vincent.guittot@...aro.org>
AuthorDate:    Wed, 22 Nov 2023 15:01:19 +01:00
Committer:     Ingo Molnar <mingo@...nel.org>
CommitterDate: Thu, 23 Nov 2023 11:24:28 +01:00

sched/pelt: Avoid underestimation of task utilization

Lukasz Luba reported that a thread's util_est can significantly decrease as
a result of sharing the CPU with other threads.

The use case can be easily reproduced with a periodic task TA that runs
for 1ms and sleeps for 100us. When the task is alone on the CPU, its
maximum utilization and its util_est are around 888. If another similar
task starts to run on the same CPU, TA has to share the CPU runtime and its
maximum utilization decreases to around half the CPU capacity (512). TA's
util_est then follows this new maximum, which is only the result of sharing
the CPU with other tasks.

Such a situation can be detected with runnable_avg, which is close or
equal to util_avg when TA is alone, but increases above util_avg when TA
shares the CPU with other threads and waits on the runqueue.
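
As a rough illustration of this detection, the comparison can be sketched
as a stand-alone C helper. This is not the kernel code itself: the helper
name util_est_sample_is_contended() is hypothetical, and the values of
SCHED_CAPACITY_SCALE and UTIL_EST_MARGIN (1% of the capacity scale) are
assumptions made here so the snippet compiles outside the kernel tree.

```c
#include <stdbool.h>

/* Assumed for illustration; the kernel defines these in its own headers. */
#define SCHED_CAPACITY_SCALE 1024UL
#define UTIL_EST_MARGIN      (SCHED_CAPACITY_SCALE / 100)

/*
 * Hypothetical helper mirroring the comparison described above.
 *
 * enqueued:     the task's utilization sample at dequeue time
 * runnable_avg: the task's runnable average (running + waiting time)
 *
 * Returns true when runnable_avg exceeds the utilization sample by more
 * than the margin, i.e. the task spent noticeable time waiting on the
 * runqueue and the sample likely underestimates what it wanted.
 */
static bool util_est_sample_is_contended(unsigned long enqueued,
					 unsigned long runnable_avg)
{
	return (enqueued + UTIL_EST_MARGIN) < runnable_avg;
}
```

With these assumed values, a sample of 888 against a runnable_avg of 890
is within the margin and would be accepted, whereas a sample of 512
against a runnable_avg of 900 would be treated as contended and skipped.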

[ We prefer a util_est that overestimates rather than underestimates:
  with an underestimate we will not provide enough performance to the
  task, which will remain under-provisioned, whereas with an overestimate
  we will create some idle time, which reduces contention and in turn
  reduces util_est. The overestimate is therefore transient, whereas the
  underestimate would persist. ]

[ mingo: Refined the changelog, added comments from the LKML discussion. ]

Reported-by: Lukasz Luba <lukasz.luba@....com>
Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
Signed-off-by: Ingo Molnar <mingo@...nel.org>
Link: https://lore.kernel.org/lkml/CAKfTPtDd-HhF-YiNTtL9i5k0PfJbF819Yxu4YquzfXgwi7voyw@mail.gmail.com/#t
Link: https://lore.kernel.org/r/20231122140119.472110-1-vincent.guittot@linaro.org
Cc: Hongyan Xia <hongyan.xia2@....com>
---
 kernel/sched/fair.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 07f5558..53dea95 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4774,6 +4774,11 @@ static inline unsigned long task_util(struct task_struct *p)
 	return READ_ONCE(p->se.avg.util_avg);
 }

+static inline unsigned long task_runnable(struct task_struct *p)
+{
+	return READ_ONCE(p->se.avg.runnable_avg);
+}
+
 static inline unsigned long _task_util_est(struct task_struct *p)
 {
 	struct util_est ue = READ_ONCE(p->se.avg.util_est);
@@ -4893,6 +4898,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 		return;

 	/*
+	 * To avoid underestimate of task utilization, skip updates of EWMA if
+	 * we cannot grant that thread got all CPU time it wanted.
+	 */
+	if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p))
+		goto done;
+
+
+	/*
 	 * Update Task's estimated utilization
 	 *
 	 * When *p completes an activation we can consolidate another sample