linux-kernel - [PATCH] sched: Optimize pick_next_task for idle_sched

Message-ID: <20170119101703.2abeaeb6@gandalf.local.home>
Date:   Thu, 19 Jan 2017 10:17:03 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: [PATCH] sched: Optimize pick_next_task for idle_sched_class too

When running my likely/unlikely profiler, I noticed that the unlikely
case of (!dl_rq->dl_nr_running) in SCHED_DEADLINE's pick_next_task_dl()
was always being hit. There are two cases where this can happen.

First, there's an optimization in pick_next_task() for the likely case
that the only tasks running on the run queue are SCHED_OTHER tasks. In
a normal system, this is the case most of the time. When this is true,
only the pick_next_task() of the fair_sched_class is called. If an RT or
DEADLINE task is queued, then the pick_next_task()s of the sched
classes are called in sequence.
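
To make the cost of the slow path concrete, here is a minimal user-space
sketch of the in-sequence walk. It is not the kernel code: struct toy_rq,
enum toy_class and toy_pick_slow() are hypothetical names, and the model
covers only the four classes discussed here (deadline, RT, fair, idle).

```c
#include <assert.h>

/* Toy run queue: per-class runnable counts only (hypothetical model). */
struct toy_rq {
	unsigned int nr_dl;	/* queued SCHED_DEADLINE tasks */
	unsigned int nr_rt;	/* queued RT tasks */
	unsigned int nr_fair;	/* queued SCHED_OTHER tasks */
};

/* Classes in the order the slow path consults them. */
enum toy_class { TOY_DL, TOY_RT, TOY_FAIR, TOY_IDLE, TOY_NR_CLASSES };

/* Returns 1 if the class has a task to run; the idle class always does. */
static int toy_class_has_task(const struct toy_rq *rq, enum toy_class c)
{
	switch (c) {
	case TOY_DL:	return rq->nr_dl > 0;
	case TOY_RT:	return rq->nr_rt > 0;
	case TOY_FAIR:	return rq->nr_fair > 0;
	default:	return 1;
	}
}

/*
 * Slow path: ask each class in turn until one has a task.
 * The number of classes consulted is stored in *calls.
 */
static enum toy_class toy_pick_slow(const struct toy_rq *rq,
				    unsigned int *calls)
{
	int c;

	*calls = 0;
	for (c = TOY_DL; c < TOY_NR_CLASSES; c++) {
		(*calls)++;
		if (toy_class_has_task(rq, (enum toy_class)c))
			return (enum toy_class)c;
	}
	assert(0 && "unreachable: the idle class always has a task");
	return TOY_IDLE;
}
```

With only SCHED_OTHER tasks queued, toy_pick_slow() consults three
classes (deadline, RT, fair) before it finds a task, where the fast path
would make a single call into the fair class.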

The SCHED_DEADLINE pick_next_task() is called first, and that
unlikely() case is hit if there are no deadline tasks available. This
happens when an RT task is queued (the first case). But tracing revealed
that it also happens in another very common case: when the system goes
from idle to running any task, including SCHED_OTHER. This is because
the idle task has a different sched class than fair_sched_class.

The optimization has:

	if (prev->sched_class == &fair_sched_class &&
	    rq->nr_running == rq->cfs.h_nr_running) {

When going from SCHED_OTHER to idle, this optimization is hit, because
the SCHED_OTHER task is of the fair_sched_class, and rq->nr_running and
rq->cfs.h_nr_running are both zero. But when we go from idle to
SCHED_OTHER, the first test fails: prev->sched_class is
idle_sched_class, not fair_sched_class. This causes the pick_next_task()
of both the deadline and RT sched classes to be called unnecessarily.
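
The two transitions can be checked against the condition with a small
user-space model. This is a sketch only: enum toy_prev_class,
fast_path_before() and fast_path_after() are hypothetical names, and the
two counters stand in for rq->nr_running and rq->cfs.h_nr_running.

```c
#include <stdbool.h>

/* Sched class of the task being switched out (hypothetical model). */
enum toy_prev_class { PREV_FAIR, PREV_IDLE, PREV_RT };

/* Condition before the patch: prev must belong to the fair class. */
static bool fast_path_before(enum toy_prev_class prev,
			     unsigned int nr_running,
			     unsigned int cfs_h_nr_running)
{
	return prev == PREV_FAIR && nr_running == cfs_h_nr_running;
}

/* Condition after the patch: prev may also be the idle task. */
static bool fast_path_after(enum toy_prev_class prev,
			    unsigned int nr_running,
			    unsigned int cfs_h_nr_running)
{
	return (prev == PREV_FAIR || prev == PREV_IDLE) &&
	       nr_running == cfs_h_nr_running;
}
```

In this model, fast_path_before(PREV_IDLE, 1, 1) is false while
fast_path_after(PREV_IDLE, 1, 1) is true, which is the idle to
SCHED_OTHER transition described above. If an RT task wakes an idle CPU,
the counters differ (for example 1 and 0), so fast_path_after() still
returns false and the other classes are consulted as before.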

Signed-off-by: Steven Rostedt (VMware) <rostedt@...dmis.org>
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 154fd68..e2c6d3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3259,13 +3259,15 @@ static inline struct task_struct *
 pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
 {
 	const struct sched_class *class = &fair_sched_class;
+	const struct sched_class *idle_class = &idle_sched_class;
 	struct task_struct *p;

 	/*
 	 * Optimization: we know that if all tasks are in
 	 * the fair class we can call that function directly:
 	 */
-	if (likely(prev->sched_class == class &&
+	if (likely((prev->sched_class == class ||
+		    prev->sched_class == idle_class) &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 		p = fair_sched_class.pick_next_task(rq, prev, cookie);
 		if (unlikely(p == RETRY_TASK))