[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090115125451.GB21839@elte.hu>
Date: Thu, 15 Jan 2009 13:54:51 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: Mike Galbraith <efault@....de>, Brian Rogers <brian@...w.org>,
linux-kernel@...r.kernel.org
Subject: Re: [BUG] How to get real-time priority using idle priority
* Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> > > Aha. Yeah, I'll re-test with that instead.
> >
> > Works a treat.
>
> *cheer* lets get this merged asap, and CC -stable as well.
I've created the delta patch below for sched/urgent - i've been testing
the previous version already. Mike, do you agree with this splitup?
Ingo
--------------->
>From 7bad8c0618dc32ecf7632b7d5abfb21645ee6b60 Mon Sep 17 00:00:00 2001
From: Mike Galbraith <efault@....de>
Date: Thu, 15 Jan 2009 10:28:43 +0100
Subject: [PATCH] sched: fix SCHED_IDLE latency/starvation, v2
The below seems to cure all of the problems I've encountered.
The rate of advance thing in set_task_cpu() seems to have been a case of
fixing the symptom instead of the problem. Perhaps this needs more
thought, but my box says "Red-Herring" ;-)
The real problem (excluding the SCHED_IDLE specific problems), is that
update_min_vruntime() doesn't work quite as intended, and will slam
min_vruntime far right if load balancing etc places a task which is far
right of the currently running task on the runqueue. If the currently
running, and up to this point the min_vruntime pace setter, is a hog,
any task waking to this runqueue after min_vruntime leaps forward has to
wait for the hog to consume the gap. In the case of SCHED_IDLE tasks,
that gap can be huge, but even with a nice 19 tasks it can be quite
large and painful.
Removing the if (vruntime == cfs_rq->min_vruntime) test, which will be
true if the currently running task is the pace setter, cured it for me.
Signed-off-by: Mike Galbraith <efault@....de>
Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: <stable@...nel.org>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
kernel/sched.c | 23 ++++-------------------
kernel/sched_fair.c | 13 +++++++------
2 files changed, 11 insertions(+), 25 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index e551bb6..b087c7f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1323,8 +1323,8 @@ static inline void update_load_sub(struct load_weight *lw, unsigned long dec)
* slice expiry etc.
*/
-#define WEIGHT_IDLEPRIO 2
-#define WMULT_IDLEPRIO (1 << 31)
+#define WEIGHT_IDLEPRIO 3
+#define WMULT_IDLEPRIO 1431655765
/*
* Nice levels are multiplicative, with a gentle 10% change for every
@@ -1891,23 +1891,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
schedstat_inc(p, se.nr_forced2_migrations);
}
#endif
- if (old_cpu != new_cpu) {
- s64 delta = p->se.vruntime - old_cfsrq->min_vruntime;
-
- /*
- * min_vruntimes may be advancing at wildly different
- * rates, so we must scale the delta accordingly.
- */
- if (new_cfsrq->load.weight != old_cfsrq->load.weight) {
- int negative = delta < 0;
-
- delta = negative ? -delta : delta;
- delta = calc_delta_mine(delta,
- new_cfsrq->load.weight, &old_cfsrq->load);
- delta = negative ? -delta : delta;
- }
- p->se.vruntime = new_cfsrq->min_vruntime + delta;
- }
+ p->se.vruntime -= old_cfsrq->min_vruntime -
+ new_cfsrq->min_vruntime;
__set_task_cpu(p, new_cpu);
}
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 500ed14..761071d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -283,10 +283,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
struct sched_entity,
run_node);
- if (vruntime == cfs_rq->min_vruntime)
- vruntime = se->vruntime;
- else
- vruntime = min_vruntime(vruntime, se->vruntime);
+ vruntime = min_vruntime(vruntime, se->vruntime);
}
cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
@@ -677,9 +674,13 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
unsigned long thresh = sysctl_sched_latency;
/*
- * convert the sleeper threshold into virtual time
+ * Convert the sleeper threshold into virtual time.
+ * SCHED_IDLE is a special sub-class. We care about
+ * fairness only relative to other SCHED_IDLE tasks,
+ * all of which have the same weight.
*/
- if (sched_feat(NORMALIZED_SLEEPER))
+ if (sched_feat(NORMALIZED_SLEEPER) &&
+ task_of(se)->policy != SCHED_IDLE)
thresh = calc_delta_fair(thresh, se);
vruntime -= thresh;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists