linux-kernel - Re: [patch] Re: scheduler oddity [bug?]

Message-Id: <1236596664.8389.331.camel@laptop>
Date:	Mon, 09 Mar 2009 12:04:24 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Mike Galbraith <efault@....de>,
	Balazs Scheidler <bazsi@...abit.hu>,
	linux-kernel@...r.kernel.org, Willy Tarreau <w@....eu>
Subject: Re: [patch] Re: scheduler oddity [bug?]

On Mon, 2009-03-09 at 09:07 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@....de> wrote:

> > I see it as a problem, but it's your call.  Dunno if I'd apply it or
> > hold back, given these conflicting reports.
> 
> I think we still want it - as the purpose of the overlap metric 
> is to measure reality. If preemption causes overlap in execution 
> we should not ignore that.
> 
> The fact that your hw triggers it currently is enough of a 
> justification. Gautham's change to load-balancing might have 
> shifted the preemption and migration characteristics on his box 
> just enough to not trigger this - but it does not 'fix' the 
> problem per se.
> 
> Peter, what do you think?

Mostly confusion... trying to reverse engineer wth the patch does, and
why, as the changelog is somewhat silent on the issue, nor are there
comments added to clarify things.

Having something of a cold doesn't really help either..

OK, so staring at this:

---
diff --git a/kernel/sched.c b/kernel/sched.c
index 8e2558c..c670050 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1712,12 +1712,17 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+       u64 runtime;
+
        if (sleep && p->se.last_wakeup) {
-               update_avg(&p->se.avg_overlap,
-                          p->se.sum_exec_runtime - p->se.last_wakeup);
+               runtime = p->se.sum_exec_runtime - p->se.last_wakeup;
                p->se.last_wakeup = 0;
+       } else {
+               runtime = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime;
        }
 
+       update_avg(&p->se.avg_overlap, runtime);
+
        sched_info_dequeued(p);
        p->sched_class->dequeue_task(rq, p, sleep);
        p->se.on_rq = 0;
---

The idea of avg_overlap is to measure the time between waking someone
and going to sleep yourself. If this overlap time is short for both
tasks, we infer a mutual relation and try to keep these tasks on the same
cpu.
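
For readers following along, here is a user-space sketch of how such a running average behaves. The body of update_avg() below is an assumption about how the helper in kernel/sched.c looked at the time (each sample moves the average 1/8 of the way towards itself); it is not quoted from the patch, and uint64_t/int64_t stand in for the kernel's u64/s64.

```c
#include <stdint.h>

/*
 * Assumed shape of update_avg(): move the average one eighth of the
 * way towards each new sample. The right shift of a negative diff
 * relies on arithmetic shifting, as gcc and clang provide.
 */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += (uint64_t)(diff >> 3);
}
```

With this shape a single short overlap sample only nudges the average, so one outlier does not flip the "these two tasks are related" inference; it takes a run of consistent samples to move it.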

The above patch changes this definition by adding the full run-time on
!sleep dequeues.

We reset prev_sum_exec_runtime in set_next_entity(), iow every time we
start running a task.

Now !sleep dequeues happen mostly with preemption, but also with things
like migration, nice, etc..

Take migration, that would simply add the last full runtime again, even
though it hasn't run -- that seems most odd.
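
The double counting can be seen in a stripped-down model. Everything here is hypothetical scaffolding (struct se_model and the model_* helpers are not kernel code); only the subtraction is taken from the else branch of the quoted patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two sched_entity fields involved. */
struct se_model {
	uint64_t sum_exec_runtime;      /* total ns executed so far */
	uint64_t prev_sum_exec_runtime; /* snapshot from when it last started running */
};

/* Models set_next_entity(): snapshot the runtime as the task starts running. */
static void model_start_running(struct se_model *se)
{
	se->prev_sum_exec_runtime = se->sum_exec_runtime;
}

/* Models the task executing for delta ns. */
static void model_run(struct se_model *se, uint64_t delta)
{
	se->sum_exec_runtime += delta;
}

/* The sample the quoted patch would feed to update_avg() on a !sleep dequeue. */
static uint64_t model_nosleep_sample(const struct se_model *se)
{
	return se->sum_exec_runtime - se->prev_sum_exec_runtime;
}
```

If the task runs once for some slice and is then dequeued twice without running in between (say, a preemption followed by a migration), model_nosleep_sample() returns the same slice length both times, because neither field changes while the task is not running.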

OK, talked a bit with Ingo, the reason you're doing this is that
avg_overlap can easily grow stale.. I can see that happen indeed.

So the 'perfect' thing would be a task-runtime decay, barring that the
preemption thing seems a sane enough heartbeat of a task.
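
A rough user-space sketch of that heartbeat idea: every preemption feeds a fixed sample into avg_overlap, so a task that keeps getting preempted sees its average climb instead of staying stale. The update_avg() body and the 500000 ns value used for the migration cost are assumptions for illustration (the real value is a sysctl tunable), and model_preempt_n() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed value in ns, standing in for sysctl_sched_migration_cost. */
#define MODEL_MIGRATION_COST 500000ULL

/* Assumed shape of update_avg(): step 1/8 of the way towards the sample. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += (uint64_t)(diff >> 3);
}

/* Apply n preemption heartbeats to avg_overlap and return the result. */
static uint64_t model_preempt_n(uint64_t avg_overlap, unsigned int n)
{
	while (n--)
		update_avg(&avg_overlap, MODEL_MIGRATION_COST);

	return avg_overlap;
}
```

Starting from zero, one preemption moves the average to 62500 ns, and repeated preemptions approach the cost value from below without overshooting it. A later sleep dequeue with a short overlap pulls the average back down through the same update_avg() path.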

How does the below look to you?

---
 kernel/sched.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 4414926..ec7ffdc 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4692,6 +4692,19 @@ static inline void schedule_debug(struct task_struct *prev)
 #endif
 }
 
+static void put_prev_task(struct rq *rq, struct task_struct *prev)
+{
+	if (prev->state == TASK_RUNNING) {
+		/*
+		 * In order to avoid avg_overlap growing stale when we are
+		 * indeed overlapping and hence not getting put to sleep, grow
+		 * the avg_overlap on preemption.
+		 */
+		update_avg(&prev->se.avg_overlap, sysctl_sched_migration_cost);
+	}
+	prev->sched_class->put_prev_task(rq, prev);
+}
+
 /*
  * Pick up the highest-prio task:
  */
@@ -4768,7 +4781,7 @@ need_resched_nonpreemptible:
 	if (unlikely(!rq->nr_running))
 		idle_balance(cpu, rq);
 
-	prev->sched_class->put_prev_task(rq, prev);
+	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
 
 	if (likely(prev != next)) {

