[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <ec049fd9b635f76a9e1d1ad380fd9184ebeeca53.1671158588.git.yu.c.chen@intel.com>
Date: Fri, 16 Dec 2022 14:11:17 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Tim Chen <tim.c.chen@...el.com>,
Mel Gorman <mgorman@...hsingularity.net>
Cc: Juri Lelli <juri.lelli@...hat.com>,
Rik van Riel <riel@...riel.com>,
Aaron Lu <aaron.lu@...el.com>,
Abel Wu <wuyun.abel@...edance.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Yicong Yang <yangyicong@...ilicon.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Ingo Molnar <mingo@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Hillf Danton <hdanton@...a.com>,
Honglei Wang <wanghonglei@...ichuxing.com>,
Len Brown <len.brown@...el.com>,
Chen Yu <yu.chen.surf@...il.com>,
Tianchen Ding <dtcccc@...ux.alibaba.com>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Don <joshdon@...gle.com>, linux-kernel@...r.kernel.org,
Chen Yu <yu.c.chen@...el.com>
Subject: [RFC PATCH v4 1/2] sched/fair: Introduce short duration task check
Introduce short-duration task checks, as there is a requirement
to leverage this attribute for better task placement.
There are several choices of metrics that could be used to
indicate if a task is a short-duration task.
At first thought the (p->se.sum_exec_runtime / p->nvcsw)
can be used to measure the task duration. However, the
history long past was factored too heavily in such a formula.
Ideally, the old activity should decay and not affect
the current status too much.
Although something based on PELT can be used, se.util_avg might
not be appropriate to describe the task duration:
1. Task p1 and task p2 are doing frequent ping-pong scheduling on
one CPU, both p1 and p2 have a short duration, but the util_avg
can be up to 50%.
2. Suppose a task lasting less than 4ms is regarded as a short task.
If task p3 runs for 6 ms and sleeps for 32 ms, p3 should not be a
short-duration task. However, PELT would decay p3's accumulated
running time from 6 ms to 3 ms, because 32 ms is the half-life
in PELT. As a result, p3 would be incorrectly treated as a short
task.
It was found that there was once a similar feature to track the
duration of a task, which is in Commit ad4b78bbcbab ("sched: Add
new wakeup preemption mode: WAKEUP_RUNNING"). Unfortunately, it
was reverted because it was an experiment. So pick the patch up
again, by recording the average duration when a task voluntarily
switches out.
The threshold of short duration is sysctl_sched_min_granularity / 8,
so it can be tuned by the user. By default, the threshold is 375 us.
The reason to reuse sysctl_sched_min_granularity is that it reflects
how long the user would like the task to run. So the criteria of a
short task have a connection to it.
Josh is not in favor of tying the threshold to
sysctl_sched_min_granularity, ideally there should be a dedicated
parameter for the threshold, but that introduces complexity for
maintenance and the user.
Introduce SIS_SHORT to enable the short duration check.
Suggested-by: Tim Chen <tim.c.chen@...el.com>
Suggested-by: Vincent Guittot <vincent.guittot@...aro.org>
Signed-off-by: Chen Yu <yu.c.chen@...el.com>
---
include/linux/sched.h | 3 +++
kernel/sched/core.c | 2 ++
kernel/sched/debug.c | 1 +
kernel/sched/fair.c | 23 +++++++++++++++++++++++
kernel/sched/features.h | 1 +
5 files changed, 30 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb55cd13..26f4768e63f3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -557,6 +557,9 @@ struct sched_entity {
u64 prev_sum_exec_runtime;
u64 nr_migrations;
+ u64 prev_sum_exec_runtime_vol;
+ /* average duration of a task */
+ u64 dur_avg;
#ifdef CONFIG_FAIR_GROUP_SCHED
int depth;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index daff72f00385..c5202f1be3f7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4348,6 +4348,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
p->se.prev_sum_exec_runtime = 0;
p->se.nr_migrations = 0;
p->se.vruntime = 0;
+ p->se.dur_avg = 0;
+ p->se.prev_sum_exec_runtime_vol = 0;
INIT_LIST_HEAD(&p->se.group_node);
#ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1637b65ba07a..8d64fba16cfe 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1024,6 +1024,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
__PS("nr_involuntary_switches", p->nivcsw);
P(se.load.weight);
+ P(se.dur_avg);
#ifdef CONFIG_SMP
P(se.avg.load_sum);
P(se.avg.runnable_sum);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4a0b8bd941c..abdb7a442052 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4824,6 +4824,16 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
static int
wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se);
+/*
+ * If a task switches in and then voluntarily relinquishes the
+ * CPU quickly, it is regarded as a short duration task.
+ */
+static inline int is_short_task(struct task_struct *p)
+{
+ return sched_feat(SIS_SHORT) && p->se.dur_avg &&
+ ((p->se.dur_avg * 8) <= sysctl_sched_min_granularity);
+}
+
/*
* Pick the next process, keeping these things in mind, in this order:
* 1) keep things fair between processes/task groups
@@ -5995,6 +6005,18 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
static void set_next_buddy(struct sched_entity *se);
+static inline void dur_avg_update(struct task_struct *p, bool task_sleep)
+{
+ u64 dur;
+
+ if (!task_sleep)
+ return;
+
+ dur = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime_vol;
+ p->se.prev_sum_exec_runtime_vol = p->se.sum_exec_runtime;
+ update_avg(&p->se.dur_avg, dur);
+}
+
/*
* The dequeue_task method is called before nr_running is
* decreased. We remove the task from the rbtree and
@@ -6067,6 +6089,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
dequeue_throttle:
util_est_update(&rq->cfs, p, task_sleep);
+ dur_avg_update(p, task_sleep);
hrtick_update(rq);
}
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..efdc29c42161 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -62,6 +62,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
*/
SCHED_FEAT(SIS_PROP, false)
SCHED_FEAT(SIS_UTIL, true)
+SCHED_FEAT(SIS_SHORT, true)
/*
* Issue a WARN when we do multiple update_rq_clock() calls
--
2.25.1
Powered by blists - more mailing lists