linux-kernel - Re: [RFC PATCH v3 00/16] Core scheduling v3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190810141556.GA73644@aaronlu>
Date:   Sat, 10 Aug 2019 22:15:56 +0800
From:   Aaron Lu <aaron.lu@...ux.alibaba.com>
To:     Tim Chen <tim.c.chen@...ux.intel.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        "Li, Aubrey" <aubrey.li@...ux.intel.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Subhra Mazumdar <subhra.mazumdar@...cle.com>,
        Vineeth Remanan Pillai <vpillai@...italocean.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3

On Thu, Aug 08, 2019 at 02:42:57PM -0700, Tim Chen wrote:
> On 8/8/19 10:27 AM, Tim Chen wrote:
> > On 8/7/19 11:47 PM, Aaron Lu wrote:
> >> On Tue, Aug 06, 2019 at 02:19:57PM -0700, Tim Chen wrote:
> >>> +void account_core_idletime(struct task_struct *p, u64 exec)
> >>> +{
> >>> +	const struct cpumask *smt_mask;
> >>> +	struct rq *rq;
> >>> +	bool force_idle, refill;
> >>> +	int i, cpu;
> >>> +
> >>> +	rq = task_rq(p);
> >>> +	if (!sched_core_enabled(rq) || !p->core_cookie)
> >>> +		return;
> >>
> >> I don't see why return here for untagged task. Untagged task can also
> >> preempt tagged task and force a CPU thread enter idle state.
> >> Untagged is just another tag to me, unless we want to allow untagged
> >> task to coschedule with a tagged task.
> > 
> > You are right.  This needs to be fixed.
> > 
> 
> Here's the updated patchset, including Aaron's fix and also
> added accounting of force idle time by deadline and rt tasks.

I have two other small changes that I think are worth sending out.

The first simplify logic in pick_task() and the 2nd avoid task pick all
over again when max is preempted. I also refined the previous hack patch to
make schedule always happen only for root cfs rq. Please see below for
details, thanks.

patch1:

>From cea56db35fe9f393c357cdb1bdcb2ef9b56cfe97 Mon Sep 17 00:00:00 2001
From: Aaron Lu <ziqian.lzq@...fin.com>
Date: Mon, 5 Aug 2019 21:21:25 +0800
Subject: [PATCH 1/3] sched/core: simplify pick_task()

No need to special case !cookie case in pick_task(), we just need to
make it possible to return idle in sched_core_find() for !cookie query.
And cookie_pick will always have less priority than class_pick, so
remove the redundant check of prio_less(cookie_pick, class_pick).

Signed-off-by: Aaron Lu <ziqian.lzq@...fin.com>
---
 kernel/sched/core.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90655c9ad937..84fec9933b74 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -186,6 +186,8 @@ static struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
 	 * The idle task always matches any cookie!
 	 */
 	match = idle_sched_class.pick_task(rq);
+	if (!cookie)
+		goto out;
 
 	while (node) {
 		node_task = container_of(node, struct task_struct, core_node);
@@ -199,7 +201,7 @@ static struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
 			node = node->rb_left;
 		}
 	}
-
+out:
 	return match;
 }
 
@@ -3657,18 +3659,6 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
 	if (!class_pick)
 		return NULL;
 
-	if (!cookie) {
-		/*
-		 * If class_pick is tagged, return it only if it has
-		 * higher priority than max.
-		 */
-		if (max && class_pick->core_cookie &&
-		    prio_less(class_pick, max))
-			return idle_sched_class.pick_task(rq);
-
-		return class_pick;
-	}
-
 	/*
 	 * If class_pick is idle or matches cookie, return early.
 	 */
@@ -3682,8 +3672,7 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
 	 * the core (so far) and it must be selected, otherwise we must go with
 	 * the cookie pick in order to satisfy the constraint.
 	 */
-	if (prio_less(cookie_pick, class_pick) &&
-	    (!max || prio_less(max, class_pick)))
+	if (!max || prio_less(max, class_pick))
 		return class_pick;
 
 	return cookie_pick;
-- 
2.19.1.3.ge56e4f7

patch2:

>From 487950dc53a40d5c566602f775ce46a0bab7a412 Mon Sep 17 00:00:00 2001
From: Aaron Lu <ziqian.lzq@...fin.com>
Date: Fri, 9 Aug 2019 14:48:01 +0800
Subject: [PATCH 2/3] sched/core: no need to pick again after max is preempted

When sibling's task preempts current max, there is no need to do the
pick all over again - the preempted cpu could just pick idle and done.

Signed-off-by: Aaron Lu <ziqian.lzq@...fin.com>
---
 kernel/sched/core.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84fec9933b74..e88583860abe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3756,7 +3756,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 * order.
 	 */
 	for_each_class(class) {
-again:
 		for_each_cpu_wrap(i, smt_mask, cpu) {
 			struct rq *rq_i = cpu_rq(i);
 			struct task_struct *p;
@@ -3828,10 +3827,10 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 						if (j == i)
 							continue;
 
-						cpu_rq(j)->core_pick = NULL;
+						cpu_rq(j)->core_pick = idle_sched_class.pick_task(cpu_rq(j));
 					}
 					occ = 1;
-					goto again;
+					goto out;
 				} else {
 					/*
 					 * Once we select a task for a cpu, we
@@ -3846,7 +3845,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		}
 next_class:;
 	}
-
+out:
 	rq->core->core_pick_seq = rq->core->core_task_seq;
 	next = rq->core_pick;
 	rq->core_sched_seq = rq->core->core_pick_seq;
-- 
2.19.1.3.ge56e4f7

patch3:

>From 2d396d99e0dd7157b0b4f7a037c8b84ed135ea56 Mon Sep 17 00:00:00 2001
From: Aaron Lu <ziqian.lzq@...fin.com>
Date: Thu, 25 Jul 2019 19:57:21 +0800
Subject: [PATCH 3/3] sched/fair: make tick based schedule always happen

When a hyperthread is forced idle and the other hyperthread has a single
CPU intensive task running, the running task can occupy the hyperthread
for a long time with no scheduling point and starve the other
hyperthread.

Fix this temporarily by always checking if the task has exceed its
timeslice and if so, for root cfs_rq, do a schedule.

Signed-off-by: Aaron Lu <ziqian.lzq@...fin.com>
---
 kernel/sched/fair.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 26d29126d6a5..b1f0defdad91 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4011,6 +4011,9 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 		return;
 	}
 
+	if (cfs_rq->nr_running <= 1)
+		return;
+
 	/*
 	 * Ensure that a task that missed wakeup preemption by a
 	 * narrow margin doesn't have to wait for a full slice.
@@ -4179,7 +4182,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 		return;
 #endif
 
-	if (cfs_rq->nr_running > 1)
+	if (cfs_rq->nr_running > 1 || cfs_rq->tg == &root_task_group)
 		check_preempt_tick(cfs_rq, curr);
 }
 
-- 
2.19.1.3.ge56e4f7