[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190810141556.GA73644@aaronlu>
Date: Sat, 10 Aug 2019 22:15:56 +0800
From: Aaron Lu <aaron.lu@...ux.alibaba.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Julien Desfossez <jdesfossez@...italocean.com>,
"Li, Aubrey" <aubrey.li@...ux.intel.com>,
Aubrey Li <aubrey.intel@...il.com>,
Subhra Mazumdar <subhra.mazumdar@...cle.com>,
Vineeth Remanan Pillai <vpillai@...italocean.com>,
Nishanth Aravamudan <naravamudan@...italocean.com>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Paul Turner <pjt@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Frédéric Weisbecker <fweisbec@...il.com>,
Kees Cook <keescook@...omium.org>,
Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
On Thu, Aug 08, 2019 at 02:42:57PM -0700, Tim Chen wrote:
> On 8/8/19 10:27 AM, Tim Chen wrote:
> > On 8/7/19 11:47 PM, Aaron Lu wrote:
> >> On Tue, Aug 06, 2019 at 02:19:57PM -0700, Tim Chen wrote:
> >>> +void account_core_idletime(struct task_struct *p, u64 exec)
> >>> +{
> >>> + const struct cpumask *smt_mask;
> >>> + struct rq *rq;
> >>> + bool force_idle, refill;
> >>> + int i, cpu;
> >>> +
> >>> + rq = task_rq(p);
> >>> + if (!sched_core_enabled(rq) || !p->core_cookie)
> >>> + return;
> >>
> >> I don't see why return here for untagged task. Untagged task can also
> >> preempt tagged task and force a CPU thread enter idle state.
> >> Untagged is just another tag to me, unless we want to allow untagged
> >> task to coschedule with a tagged task.
> >
> > You are right. This needs to be fixed.
> >
>
> Here's the updated patchset, including Aaron's fix and also
> added accounting of force idle time by deadline and rt tasks.
I have two other small changes that I think are worth sending out.
The first simplify logic in pick_task() and the 2nd avoid task pick all
over again when max is preempted. I also refined the previous hack patch to
make schedule always happen only for root cfs rq. Please see below for
details, thanks.
patch1:
>From cea56db35fe9f393c357cdb1bdcb2ef9b56cfe97 Mon Sep 17 00:00:00 2001
From: Aaron Lu <ziqian.lzq@...fin.com>
Date: Mon, 5 Aug 2019 21:21:25 +0800
Subject: [PATCH 1/3] sched/core: simplify pick_task()
No need to special case !cookie case in pick_task(), we just need to
make it possible to return idle in sched_core_find() for !cookie query.
And cookie_pick will always have less priority than class_pick, so
remove the redundant check of prio_less(cookie_pick, class_pick).
Signed-off-by: Aaron Lu <ziqian.lzq@...fin.com>
---
kernel/sched/core.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90655c9ad937..84fec9933b74 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -186,6 +186,8 @@ static struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
* The idle task always matches any cookie!
*/
match = idle_sched_class.pick_task(rq);
+ if (!cookie)
+ goto out;
while (node) {
node_task = container_of(node, struct task_struct, core_node);
@@ -199,7 +201,7 @@ static struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
node = node->rb_left;
}
}
-
+out:
return match;
}
@@ -3657,18 +3659,6 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
if (!class_pick)
return NULL;
- if (!cookie) {
- /*
- * If class_pick is tagged, return it only if it has
- * higher priority than max.
- */
- if (max && class_pick->core_cookie &&
- prio_less(class_pick, max))
- return idle_sched_class.pick_task(rq);
-
- return class_pick;
- }
-
/*
* If class_pick is idle or matches cookie, return early.
*/
@@ -3682,8 +3672,7 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
* the core (so far) and it must be selected, otherwise we must go with
* the cookie pick in order to satisfy the constraint.
*/
- if (prio_less(cookie_pick, class_pick) &&
- (!max || prio_less(max, class_pick)))
+ if (!max || prio_less(max, class_pick))
return class_pick;
return cookie_pick;
--
2.19.1.3.ge56e4f7
patch2:
>From 487950dc53a40d5c566602f775ce46a0bab7a412 Mon Sep 17 00:00:00 2001
From: Aaron Lu <ziqian.lzq@...fin.com>
Date: Fri, 9 Aug 2019 14:48:01 +0800
Subject: [PATCH 2/3] sched/core: no need to pick again after max is preempted
When sibling's task preempts current max, there is no need to do the
pick all over again - the preempted cpu could just pick idle and done.
Signed-off-by: Aaron Lu <ziqian.lzq@...fin.com>
---
kernel/sched/core.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84fec9933b74..e88583860abe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3756,7 +3756,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* order.
*/
for_each_class(class) {
-again:
for_each_cpu_wrap(i, smt_mask, cpu) {
struct rq *rq_i = cpu_rq(i);
struct task_struct *p;
@@ -3828,10 +3827,10 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
if (j == i)
continue;
- cpu_rq(j)->core_pick = NULL;
+ cpu_rq(j)->core_pick = idle_sched_class.pick_task(cpu_rq(j));
}
occ = 1;
- goto again;
+ goto out;
} else {
/*
* Once we select a task for a cpu, we
@@ -3846,7 +3845,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
}
next_class:;
}
-
+out:
rq->core->core_pick_seq = rq->core->core_task_seq;
next = rq->core_pick;
rq->core_sched_seq = rq->core->core_pick_seq;
--
2.19.1.3.ge56e4f7
patch3:
>From 2d396d99e0dd7157b0b4f7a037c8b84ed135ea56 Mon Sep 17 00:00:00 2001
From: Aaron Lu <ziqian.lzq@...fin.com>
Date: Thu, 25 Jul 2019 19:57:21 +0800
Subject: [PATCH 3/3] sched/fair: make tick based schedule always happen
When a hyperthread is forced idle and the other hyperthread has a single
CPU intensive task running, the running task can occupy the hyperthread
for a long time with no scheduling point and starve the other
hyperthread.
Fix this temporarily by always checking if the task has exceed its
timeslice and if so, for root cfs_rq, do a schedule.
Signed-off-by: Aaron Lu <ziqian.lzq@...fin.com>
---
kernel/sched/fair.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 26d29126d6a5..b1f0defdad91 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4011,6 +4011,9 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
return;
}
+ if (cfs_rq->nr_running <= 1)
+ return;
+
/*
* Ensure that a task that missed wakeup preemption by a
* narrow margin doesn't have to wait for a full slice.
@@ -4179,7 +4182,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
return;
#endif
- if (cfs_rq->nr_running > 1)
+ if (cfs_rq->nr_running > 1 || cfs_rq->tg == &root_task_group)
check_preempt_tick(cfs_rq, curr);
}
--
2.19.1.3.ge56e4f7
Powered by blists - more mailing lists