Date:	Mon, 09 Mar 2009 12:04:24 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Mike Galbraith <efault@....de>,
	Balazs Scheidler <bazsi@...abit.hu>,
	linux-kernel@...r.kernel.org, Willy Tarreau <w@....eu>
Subject: Re: [patch] Re: scheduler oddity [bug?]

On Mon, 2009-03-09 at 09:07 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@....de> wrote:

> > I see it as a problem, but it's your call.  Dunno if I'd apply it or
> > hold back, given these conflicting reports.
> 
> I think we still want it - as the purpose of the overlap metric 
> is to measure reality. If preemption causes overlap in execution 
> we should not ignore that.
> 
> The fact that your hw triggers it currently is enough of a 
> justification. Gautham's change to load-balancing might have 
> shifted the preemption and migration characteristics on his box 
> just enough to not trigger this - but it does not 'fix' the 
> problem per se.
> 
> Peter, what do you think?

Mostly confusion... trying to reverse engineer wth the patch does, and
why, as the changelog is somewhat silent on the issue and no comments
were added to clarify things.

Having something of a cold doesn't really help either..

OK, so staring at this:

---
diff --git a/kernel/sched.c b/kernel/sched.c
index 8e2558c..c670050 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1712,12 +1712,17 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+       u64 runtime;
+
        if (sleep && p->se.last_wakeup) {
-               update_avg(&p->se.avg_overlap,
-                          p->se.sum_exec_runtime - p->se.last_wakeup);
+               runtime = p->se.sum_exec_runtime - p->se.last_wakeup;
                p->se.last_wakeup = 0;
+       } else {
+               runtime = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime;
        }
 
+       update_avg(&p->se.avg_overlap, runtime);
+
        sched_info_dequeued(p);
        p->sched_class->dequeue_task(rq, p, sleep);
        p->se.on_rq = 0;
---

The idea of avg_overlap is to measure the time between waking someone
and going to sleep yourself. If this overlap time is short for both
tasks, we infer a mutual relation and try to keep these tasks on the same
cpu.
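
To make that definition concrete, here is a rough user-space sketch. It
assumes update_avg() is a running average with weight 1/8 and that
last_wakeup records sum_exec_runtime at the moment the task wakes someone
else. struct overlap_state, note_wakeup() and note_sleep() are names made
up for this illustration, not kernel symbols.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-task fields discussed above. */
struct overlap_state {
	uint64_t sum_exec_runtime; /* total time the task has run, ns */
	uint64_t last_wakeup;      /* sum_exec_runtime at last wakeup of another task, 0 if none */
	uint64_t avg_overlap;      /* running average of the overlap, ns */
};

/* Running average with weight 1/8 (assumed behaviour of update_avg()). */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += (uint64_t)(diff / 8);
}

/* The task wakes someone else: remember how much it had run by then. */
static void note_wakeup(struct overlap_state *s)
{
	s->last_wakeup = s->sum_exec_runtime;
}

/* The task goes to sleep: the overlap is what it ran since that wakeup. */
static void note_sleep(struct overlap_state *s)
{
	if (s->last_wakeup) {
		update_avg(&s->avg_overlap,
			   s->sum_exec_runtime - s->last_wakeup);
		s->last_wakeup = 0;
	}
}
```

A task that wakes a partner and sleeps shortly afterwards keeps feeding
small samples into avg_overlap, which is what marks the pair as closely
coupled.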

The above patch changes this definition by adding the full run-time on
!sleep dequeues.

We reset prev_sum_exec_runtime in set_next_entity(), iow every time we
start running a task.

Now !sleep dequeues happen mostly with preemption, but also with things
like migration, nice, etc..

Take migration: that would simply add the last full runtime again, even
though it hasn't run -- that seems most odd.
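
A small user-space sketch of that double counting, mirroring the sample
selection in the patched dequeue_task() above. struct se_sketch and
dequeue_sample() are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical subset of the sched entity fields used by the patch. */
struct se_sketch {
	uint64_t sum_exec_runtime;      /* total time run, ns */
	uint64_t prev_sum_exec_runtime; /* snapshot taken when the task last started running */
	uint64_t last_wakeup;           /* sum_exec_runtime at last wakeup of another task */
};

/* Returns the sample the patched dequeue_task() would pass to update_avg(). */
static uint64_t dequeue_sample(struct se_sketch *se, int sleep)
{
	uint64_t runtime;

	if (sleep && se->last_wakeup) {
		runtime = se->sum_exec_runtime - se->last_wakeup;
		se->last_wakeup = 0;
	} else {
		/* !sleep dequeue: everything run since the task was last picked */
		runtime = se->sum_exec_runtime - se->prev_sum_exec_runtime;
	}
	return runtime;
}
```

With sum_exec_runtime at 5000 and prev_sum_exec_runtime at 2000, a
preemption dequeue yields a sample of 3000. A migration dequeue right
after it, with the task not having run in between, yields the same 3000
a second time, since neither counter moved.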

OK, talked a bit with Ingo, the reason you're doing this is that
avg_overlap can easily grow stale.. I can see that happen indeed.

So the 'perfect' thing would be a task-runtime decay; barring that, the
preemption thing seems a sane enough heartbeat for a task.
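
As a rough illustration of preemption as a heartbeat, the sketch below
feeds a fixed sample into the running average on every preemption. It
assumes the sample is sysctl_sched_migration_cost with a value of
500000 ns, and that update_avg() uses weight 1/8; both are assumptions
for this example. overlap_after_preemptions() is a made-up helper.

```c
#include <stdint.h>

/* Assumed value of sysctl_sched_migration_cost, in nanoseconds. */
#define MIGRATION_COST_NS 500000ULL

/* Running average with weight 1/8 (assumed behaviour of update_avg()). */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += (uint64_t)(diff / 8);
}

/* Hypothetical helper: apply n preemption heartbeats to avg_overlap. */
static uint64_t overlap_after_preemptions(uint64_t avg_overlap, unsigned int n)
{
	while (n--)
		update_avg(&avg_overlap, MIGRATION_COST_NS);
	return avg_overlap;
}
```

Starting from a stale avg_overlap of 0, each preemption closes one eighth
of the remaining gap, so a task that keeps getting preempted drifts
towards the migration cost and never overshoots it.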

How does the below look to you?

---
 kernel/sched.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 4414926..ec7ffdc 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4692,6 +4692,19 @@ static inline void schedule_debug(struct task_struct *prev)
 #endif
 }
 
+static void put_prev_task(struct rq *rq, struct task_struct *prev)
+{
+	if (prev->state == TASK_RUNNING) {
+		/*
+		 * In order to avoid avg_overlap growing stale when we are
+		 * indeed overlapping and hence not getting put to sleep, grow
+		 * the avg_overlap on preemption.
+		 */
+		update_avg(&prev->se.avg_overlap, sysctl_sched_migration_cost);
+	}
+	prev->sched_class->put_prev_task(rq, prev);
+}
+
 /*
  * Pick up the highest-prio task:
  */
@@ -4768,7 +4781,7 @@ need_resched_nonpreemptible:
 	if (unlikely(!rq->nr_running))
 		idle_balance(cpu, rq);
 
-	prev->sched_class->put_prev_task(rq, prev);
+	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
 
 	if (likely(prev != next)) {

