lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 16 Sep 2022 15:15:38 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org
Cc:     zhangqiao22@...wei.com,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: [PATCH v2] sched/fair: limit sched slice duration

In presence of a lot of small weight tasks like sched_idle tasks, normal
or high weight tasks can see their ideal runtime (sched_slice) to increase
to hundreds ms whereas it normally stays below sysctl_sched_latency.

2 normal tasks running on a CPU will have a max sched_slice of 12ms
(half of the sched_period). This means that they will make progress
every sysctl_sched_latency period.

If we now add 1000 idle tasks on the CPU, the sched_period becomes
3006 ms and the ideal runtime of the normal tasks becomes 609 ms.
It will even become 1500ms if the idle tasks belongs to an idle cgroup.
This means that the scheduler will look for picking another waiting task
after 609ms running time (1500ms respectively). The idle tasks change
significantly the way the 2 normal tasks interleave their running time
slot whereas they should have a small impact.

Such long sched_slice can delay significantly the release of resources
as the tasks can wait hundreds of ms before the next running slot just
because of idle tasks queued on the rq.

Cap the ideal_runtime to the weighted version of sysctl_sched_latency when
comparing with the vruntime of the next waiting task to make sure that
tasks will regularly make progress and will not be significantly impacted
by idle/background tasks queued on the rq.

Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
---

I have kept the test if (delta < 0) as calc_delta_fair() can't handle negative
value.

Change since v1:
  - the first 3 patches have been already queued
  - use the weight of curr to scale sysctl_sched_latency before capping
    the ideal_runtime so we can compare vruntime values.

 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5ffec4370602..ba451bb25929 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4610,6 +4610,8 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	if (delta < 0)
 		return;
 
+	ideal_runtime = min_t(u64, ideal_runtime,
+				   calc_delta_fair(sysctl_sched_latency, curr));
 	if (delta > ideal_runtime)
 		resched_curr(rq_of(cfs_rq));
 }
-- 
2.17.1

Powered by blists - more mailing lists