Date:   Wed, 4 Sep 2019 21:44:23 -0400
From:   Julien Desfossez <jdesfossez@...italocean.com>
To:     Tim Chen <tim.c.chen@...ux.intel.com>
Cc:     Dario Faggioli <dfaggioli@...e.com>,
        "Li, Aubrey" <aubrey.li@...ux.intel.com>,
        Aaron Lu <aaron.lu@...ux.alibaba.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Subhra Mazumdar <subhra.mazumdar@...cle.com>,
        Vineeth Remanan Pillai <vpillai@...italocean.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3

> 1) Unfairness between the sibling threads
> -----------------------------------------
> One sibling thread can suppress and force-idle the other
> sibling thread disproportionately, so the force-idled CPU
> does not get to run and its tasks stall.
> 
> Status:
> i) Aaron has proposed a patchset here, based on using one
> rq's vruntime as the base reference for task priority
> comparison between siblings (see the sketch below).
> 
> https://lore.kernel.org/lkml/20190725143248.GC992@aaronlu/
> It works well on fairness but has some initialization issues.
> 
> ii) Tim has proposed a patchset here to account for forced
> idle time in the rq's min_vruntime:
> https://lore.kernel.org/lkml/f96350c1-25a9-0564-ff46-6658e96d726c@linux.intel.com/
> It improves over v3 with simpler logic than Aaron's patch,
> but does not perform as well on fairness.
> 
> iii) Tim has proposed yet another patch to maintain fairness
> of forced-idle time between CPU threads, per Peter's suggestion.
> https://lore.kernel.org/lkml/21933a50-f796-3d28-664c-030cb7c98431@linux.intel.com/
> Its performance has yet to be tested.
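
For illustration, the core-wide vruntime comparison from item i) of
issue 1 can be sketched as below. This is a sketch only, assuming the
v3 data structures; core_min_vruntime() stands in for a hypothetical
per-core reference value, not the exact helper from Aaron's patch.

static inline u64 task_core_vruntime(struct task_struct *p)
{
	struct cfs_rq *cfs_rq = task_cfs_rq(p);

	/*
	 * Rebase the task's vruntime from its local queue's floor onto
	 * a core-wide floor, so siblings compare on a common scale.
	 */
	return p->se.vruntime - cfs_rq->min_vruntime +
	       core_min_vruntime(task_rq(p)); /* hypothetical helper */
}

/* True if a has lower priority (larger rebased vruntime) than b. */
static inline bool core_prio_less(struct task_struct *a, struct task_struct *b)
{
	return task_core_vruntime(a) > task_core_vruntime(b);
}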
> 
> 2) Not rescheduling the force-idled CPU
> ---------------------------------------
> The force-idled CPU does not get a chance to reschedule
> itself, and can stall for a long time even though it has
> eligible tasks to run.
> 
> Status:
> i) Aaron proposed a patch that fixes this by checking for
> runnable tasks when the scheduling tick comes in (see the
> sketch below).
> https://lore.kernel.org/lkml/20190725143344.GD992@aaronlu/
> 
> ii) Vineeth has patches addressing this issue and also issue 1,
> based on scheduling a new "forced idle task" when a CPU is
> forced to idle, but has yet to post them.
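
For illustration, the tick-time check from item i) of issue 2 amounts
to something like the following sketch (an assumption-laden sketch
using the v3 fields, not Aaron's exact patch): if a force-idled CPU
has runnable CFS tasks by the time the tick arrives, kick it so the
core-wide pick is redone.

static void resched_forceidle_tick(struct rq *rq)
{
	/*
	 * Force-idled, but something is runnable: request a resched so
	 * pick_next_task() can rerun the core-wide selection.
	 */
	if (rq->core_forceidle && rq->cfs.nr_running)
		resched_curr(rq);
}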

We finished writing and debugging the PoC for the coresched_idle task
and here are the results and the code.

These patches apply on top of Aaron's patches:
- sched: Fix incorrect rq tagged as forced idle
- wrapper for cfs_rq->min_vruntime
  https://lore.kernel.org/lkml/20190725143127.GB992@aaronlu/
- core vruntime comparison
  https://lore.kernel.org/lkml/20190725143248.GC992@aaronlu/

For the testing, we used the same strategy as described in
https://lore.kernel.org/lkml/20190802153715.GA18075@sinkpad/

No tag
------
Test                            Average     Stdev
Alone                           1306.90     0.94
nosmt                           649.95      1.44
Aaron's full patchset           828.15      32.45
Aaron's first 2 patches         832.12      36.53
Tim's first patchset            852.50      4.11
Tim's second patchset           855.11      9.89
coresched_idle                  985.67      0.83

Sysbench mem untagged, sysbench cpu tagged
------------------------------------------
Test                            Average     Stdev
Alone                           1306.90     0.94
nosmt                           649.95      1.44
Aaron's full patchset           586.06      1.77
Tim's first patchset            852.50      4.11
Tim's second patchset           663.88      44.43
coresched_idle                  653.58      0.49

Sysbench mem tagged, sysbench cpu untagged
------------------------------------------
Test                            Average     Stdev
Alone                           1306.90     0.94
nosmt                           649.95      1.44
Aaron's full patchset           583.77      3.52
Tim's first patchset            564.04      58.05
Tim's second patchset           524.72      55.24
coresched_idle                  653.30      0.81

Both sysbench tagged
--------------------
Test                            Average     Stdev
Alone                           1306.90     0.94
nosmt                           649.95      1.44
Aaron's full patchset           582.15      3.75
Tim's first patchset            679.43      70.07
Tim's second patchset           563.10      34.58
coresched_idle                  653.12      1.68

As we can see from this stress test, with the coresched_idle thread
being a real process, fairness is more consistent (low stdev). The
performance is also the same across all three tagged configurations,
and even consistently slightly better than nosmt.

Thanks,

Julien

From: vpillai <vpillai@...italocean.com>
Date: Wed, 4 Sep 2019 17:41:38 +0000
Subject: [RFC PATCH 1/2] coresched_idle thread

---
 kernel/sched/core.c  | 46 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  1 +
 2 files changed, 47 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f7839bf96e8b..fe560739c247 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3639,6 +3639,51 @@ static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
 	return a->core_cookie == b->core_cookie;
 }
 
+static int coresched_idle_worker(void *data)
+{
+	struct rq *rq = (struct rq *)data;
+
+	/*
+	 * Transition to parked state and dequeue from runqueue.
+	 * pick_task() will select us if needed without enqueueing.
+	 */
+	set_special_state(TASK_PARKED);
+	schedule();
+
+	while (true) {
+		if (kthread_should_stop())
+			break;
+
+		play_idle(1);
+	}
+
+	return 0;
+}
+
+static void coresched_idle_worker_init(struct rq *rq)
+{
+
+	// XXX core_idle_task needs lock protection?
+	if (!rq->core_idle_task) {
+		rq->core_idle_task = kthread_create_on_cpu(coresched_idle_worker,
+				(void *)rq, cpu_of(rq), "coresched_idle");
+		if (rq->core_idle_task) {
+			wake_up_process(rq->core_idle_task);
+		}
+
+	}
+
+	return;
+}
+
+static void coresched_idle_worker_fini(struct rq *rq)
+{
+	if (rq->core_idle_task) {
+		kthread_stop(rq->core_idle_task);
+		rq->core_idle_task = NULL;
+	}
+}
+
 // XXX fairness/fwd progress conditions
 /*
  * Returns
@@ -6774,6 +6819,7 @@ void __init sched_init(void)
 		atomic_set(&rq->nr_iowait, 0);
 
 #ifdef CONFIG_SCHED_CORE
+		rq->core_idle_task = NULL;
 		rq->core = NULL;
 		rq->core_pick = NULL;
 		rq->core_enabled = 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e91c188a452c..c3ae0af55b05 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -965,6 +965,7 @@ struct rq {
 	unsigned int		core_sched_seq;
 	struct rb_root		core_tree;
 	bool			core_forceidle;
+	struct task_struct	*core_idle_task;
 
 	/* shared state */
 	unsigned int		core_task_seq;
-- 
2.17.1

From: vpillai <vpillai@...italocean.com>
Date: Wed, 4 Sep 2019 18:22:55 +0000
Subject: [RFC PATCH 2/2] Use coresched_idle to force idle a sibling

Currently we use the idle thread to force-idle a sibling. Let's
use the new coresched_idle thread instead, so that the scheduler
sees a valid task during forced idle.
---
 kernel/sched/core.c | 66 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe560739c247..e35d69a81adb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -244,23 +244,33 @@ static int __sched_core_stopper(void *data)
 static DEFINE_MUTEX(sched_core_mutex);
 static int sched_core_count;
 
+static void coresched_idle_worker_init(struct rq *rq);
+static void coresched_idle_worker_fini(struct rq *rq);
 static void __sched_core_enable(void)
 {
+	int cpu;
+
 	// XXX verify there are no cookie tasks (yet)
 
 	static_branch_enable(&__sched_core_enabled);
 	stop_machine(__sched_core_stopper, (void *)true, NULL);
 
+	for_each_online_cpu(cpu)
+		coresched_idle_worker_init(cpu_rq(cpu));
 	printk("core sched enabled\n");
 }
 
 static void __sched_core_disable(void)
 {
+	int cpu;
+
 	// XXX verify there are no cookie tasks (left)
 
 	stop_machine(__sched_core_stopper, (void *)false, NULL);
 	static_branch_disable(&__sched_core_enabled);
 
+	for_each_online_cpu(cpu)
+		coresched_idle_worker_fini(cpu_rq(cpu));
 	printk("core sched disabled\n");
 }
 
@@ -3626,14 +3636,25 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 #ifdef CONFIG_SCHED_CORE
 
+static inline bool is_force_idle_task(struct task_struct *p)
+{
+	BUG_ON(task_rq(p)->core_idle_task == NULL);
+	return task_rq(p)->core_idle_task == p;
+}
+
+static inline bool is_core_idle_task(struct task_struct *p)
+{
+	return is_idle_task(p) || is_force_idle_task(p);
+}
+
 static inline bool cookie_equals(struct task_struct *a, unsigned long cookie)
 {
-	return is_idle_task(a) || (a->core_cookie == cookie);
+	return is_core_idle_task(a) || (a->core_cookie == cookie);
 }
 
 static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
 {
-	if (is_idle_task(a) || is_idle_task(b))
+	if (is_core_idle_task(a) || is_core_idle_task(b))
 		return true;
 
 	return a->core_cookie == b->core_cookie;
@@ -3641,8 +3662,6 @@ static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
 
 static int coresched_idle_worker(void *data)
 {
-	struct rq *rq = (struct rq *)data;
-
 	/*
 	 * Transition to parked state and dequeue from runqueue.
 	 * pick_task() will select us if needed without enqueueing.
@@ -3666,7 +3685,7 @@ static void coresched_idle_worker_init(struct rq *rq)
 	// XXX core_idle_task needs lock protection?
 	if (!rq->core_idle_task) {
 		rq->core_idle_task = kthread_create_on_cpu(coresched_idle_worker,
-				(void *)rq, cpu_of(rq), "coresched_idle");
+				NULL, cpu_of(rq), "coresched_idle");
 		if (rq->core_idle_task) {
 			wake_up_process(rq->core_idle_task);
 		}
@@ -3684,6 +3703,14 @@ static void coresched_idle_worker_fini(struct rq *rq)
 	}
 }
 
+static inline struct task_struct *core_idle_task(struct rq *rq)
+{
+	BUG_ON(rq->core_idle_task == NULL);
+
+	return rq->core_idle_task;
+
+}
+
 // XXX fairness/fwd progress conditions
 /*
  * Returns
@@ -3709,7 +3736,7 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
 		 */
 		if (max && class_pick->core_cookie &&
 		    prio_less(class_pick, max))
-			return idle_sched_class.pick_task(rq);
+			return core_idle_task(rq);
 
 		return class_pick;
 	}
@@ -3853,7 +3880,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 				goto done;
 			}
 
-			if (!is_idle_task(p))
+			if (!is_force_idle_task(p))
 				occ++;
 
 			rq_i->core_pick = p;
@@ -3906,7 +3933,6 @@ next_class:;
 	rq->core->core_pick_seq = rq->core->core_task_seq;
 	next = rq->core_pick;
 	rq->core_sched_seq = rq->core->core_pick_seq;
-	trace_printk("picked: %s/%d %lx\n", next->comm, next->pid, next->core_cookie);
 
 	/*
 	 * Reschedule siblings
@@ -3924,13 +3950,24 @@ next_class:;
 
 		WARN_ON_ONCE(!rq_i->core_pick);
 
-		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
+		if (is_core_idle_task(rq_i->core_pick) && rq_i->nr_running) {
+			/*
+			 * Matching logic can sometimes select idle_task when
+			 * iterating the sched_classes. If that selection is
+			 * actually a forced idle case, we need to update the
+			 * core_pick to coresched_idle.
+			 */
+			if (is_idle_task(rq_i->core_pick))
+				rq_i->core_pick = core_idle_task(rq_i);
 			rq_i->core_forceidle = true;
+		}
 
 		rq_i->core_pick->core_occupation = occ;
 
-		if (i == cpu)
+		if (i == cpu) {
+			next = rq_i->core_pick;
 			continue;
+		}
 
 		if (rq_i->curr != rq_i->core_pick) {
 			trace_printk("IPI(%d)\n", i);
@@ -3947,6 +3984,7 @@ next_class:;
 			WARN_ON_ONCE(1);
 		}
 	}
+	trace_printk("picked: %s/%d %lx\n", next->comm, next->pid, next->core_cookie);
 
 done:
 	set_next_task(rq, next);
@@ -4200,6 +4238,12 @@ static void __sched notrace __schedule(bool preempt)
 		 *   is a RELEASE barrier),
 		 */
 		++*switch_count;
+#ifdef CONFIG_SCHED_CORE
+		if (next == rq->core_idle_task)
+			next->state = TASK_RUNNING;
+		else if (prev == rq->core_idle_task)
+			prev->state = TASK_PARKED;
+#endif
 
 		trace_sched_switch(preempt, prev, next);
 
@@ -6479,6 +6523,7 @@ int sched_cpu_activate(unsigned int cpu)
 #ifdef CONFIG_SCHED_CORE
 		if (static_branch_unlikely(&__sched_core_enabled)) {
 			rq->core_enabled = true;
+			coresched_idle_worker_init(rq);
 		}
 #endif
 	}
@@ -6535,6 +6580,7 @@ int sched_cpu_deactivate(unsigned int cpu)
 		struct rq *rq = cpu_rq(cpu);
 		if (static_branch_unlikely(&__sched_core_enabled)) {
 			rq->core_enabled = false;
+			coresched_idle_worker_fini(rq);
 		}
 #endif
 		static_branch_dec_cpuslocked(&sched_smt_present);
-- 
2.17.1
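
For reference, the manual state flip in the __schedule() hunk above can
be read as the hypothetical helper below (an illustration under the
patch's data structures, not code from the patch): since coresched_idle
parks itself and is never enqueued, its state has to be adjusted by
hand at every context switch that involves it.

static inline void coresched_idle_fixup_state(struct rq *rq,
					      struct task_struct *prev,
					      struct task_struct *next)
{
	if (next == rq->core_idle_task)
		next->state = TASK_RUNNING;	/* picked directly; about to run */
	else if (prev == rq->core_idle_task)
		prev->state = TASK_PARKED;	/* switched away; stay dequeued */
}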
