Message-Id: <1175071091.29829.58.camel@Homer.simpson.net>
Date: Wed, 28 Mar 2007 10:38:11 +0200
From: Mike Galbraith <efault@....de>
To: Satoru Takeuchi <takeuchi_satoru@...fujitsu.com>
Cc: Linux Kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>
Subject: Re: [BUG] scheduler: strange behavior with massive interactive
 processes

Greetings!
On Tue, 2007-03-27 at 07:04 +0200, Mike Galbraith wrote:
> On Tue, 2007-03-27 at 10:34 +0900, Satoru Takeuchi wrote:
> > When I was executing massive interactive processes, I found that some of them
> > occupy CPU time and the others hardly run.
> >
> > It seems that some of the processes which occupy CPU time always have the
> > max effective prio (default+5) and the others have max - 1.  What happens
> > here is...
> >
> > 1. If there is a moderate number of max-interactive processes, they can be
> > re-inserted into the active queue again and again without their priority
> > ever dropping.
> > 2. In this case, the others seldom run, and can't get the max effective
> > priority at their next timeslice exhaustion because the scheduler considers
> > them to have slept too long.
> > 3. Goto 1, OOPS!
>
> Ah, that's a new twist. Cool testcase. A mechanism which was put in
> specifically to prevent tasks from jumping straight to max interactive,
> and thus hurting truly interactive tasks, is starving these not truly
> interactive (but nothing says they couldn't be) tasks.
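
(For anyone following along without sched.c open: the "max effective
prio (default+5)" Satoru mentions comes from the sleep_avg bonus.  The
toy program below is only my from-memory model of the 2.6.21 arithmetic,
not the real code, so check the actual macros before quoting it.)

#include <stdio.h>

/* from-memory approximations of 2.6.21 sched.c -- verify before trusting */
#define MAX_BONUS	10		/* 25% of the 40 user priority levels */
#define NICE_TO_PRIO(n)	(120 + (n))	/* nice 0 -> static_prio 120 */

/*
 * The bonus scales linearly with sleep_avg (0..MAX_BONUS), shifting the
 * effective priority up to 5 steps either side of static_prio.
 */
static int toy_effective_prio(int static_prio, double sleep_avg_frac)
{
	int bonus = (int)(sleep_avg_frac * MAX_BONUS) - MAX_BONUS / 2;

	return static_prio - bonus;
}

int main(void)
{
	int sp = NICE_TO_PRIO(0);

	/* a task that sleeps "enough": bonus maxed, prio 115 (the +5 boost) */
	printf("max sleep_avg : %d\n", toy_effective_prio(sp, 1.0));
	/* the starved tasks in the testcase hover one notch lower, at 116 */
	printf("a bit less    : %d\n", toy_effective_prio(sp, 0.9));
	return 0;
}
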
I puttered around with your testcase a bit, and didn't see interactive
tasks starving other interactive tasks so much as plain old interactive
tasks starving expired tasks, which they're supposed to be able to do,
but only to a limited degree.  Merely forcing array switches fixes that
here; however, array switches can be _very_ rough on interactivity, so I
tried a hybrid of the forced array switch and...
> One way to prevent this may be to do something very simple like (I did
> this here a while back) upon timeslice completion (or possibly better,
> round-robin interval [whenever]), promote a task from a lower priority
> queue. Tasks are then always crawling up the ladder, and _will_ be
> inserted into the stream no matter what is going on in the system.
...the above, and it seems to work. Below is a rough (second) cut,
which tries to preserve interactivity, but also to ensure that
non-interactive tasks _will_ receive cpu in a reasonable amount of time.
It only engages when we're not expiring non-interactive tasks, i.e. it
should have no effect until we're exclusively running interactive tasks
(rough numbers on the thresholds below), but then it should also prevent
the interactive/interactive starvation scenario.  It doesn't attempt to
enforce fairness, only prevent
terminal starvation. With this patch^W(mess?;) I got the numbers below,
whereas without it, the distribution was pretty ugly. I didn't try it
with many, many attackers.  (A flood of falsely interactive tasks is
doomed unless you detect it and switch behavior.)
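
Rough numbers on when it engages, in case anyone wants to sanity check
me (the constants below are from memory of 2.6.21, so take them with a
grain of salt and check sched.c):

#include <stdio.h>

/* from-memory 2.6.21 values -- double check sched.c before trusting them */
#define HZ			1000	/* only for the arithmetic; the ms
					 * results don't depend on HZ */
#define DEF_TIMESLICE		(100 * HZ / 1000)	/* 100ms in jiffies */
#define MAX_BONUS		10
#define MAX_SLEEP_AVG		(DEF_TIMESLICE * MAX_BONUS)
#define STARVATION_LIMIT	(MAX_SLEEP_AVG)

int main(void)
{
	int nr_running = 20;	/* the testcase below runs 20 tasks */

	/* promotion kicks in once we go this long without expiring a
	 * non-interactive task */
	printf("promotion after : %d ms\n", STARVATION_LIMIT * 1000 / HZ);
	/* the forced array switch remains the (much later) backstop */
	printf("forced switch   : %d ms\n",
	       STARVATION_LIMIT * nr_running * 1000 / HZ);
	return 0;
}
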
Interactivity still seems to be fine with reasonable non-interactive
loads despite ~reserving more bandwidth for non-interactive tasks. Only
lightly tested, YMMV, and of course the standard guarantee applies ;)
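
And since "array switch" is doing a lot of work in the text above, a
heavily simplified sketch of what I mean (not the kernel code -- the
real arrays carry 140 priority lists plus a bitmap each):

struct toy_array {
	int nr_active;			/* tasks queued in this array */
	/* per-priority lists and the priority bitmap elided */
};

struct toy_rq {
	struct toy_array arrays[2];
	struct toy_array *active;	/* tasks eligible to run now */
	struct toy_array *expired;	/* tasks parked until the next switch */
};

/*
 * A task that burns its timeslice normally lands on rq->expired unless
 * TASK_INTERACTIVE() lets it back into rq->active.  Once the active
 * array drains (or expired_starving() starts parking even interactive
 * tasks so that it drains), schedule() simply swaps the two pointers:
 */
void toy_array_switch(struct toy_rq *rq)
{
	struct toy_array *tmp = rq->active;

	rq->active = rq->expired;	/* the parked tasks get to run */
	rq->expired = tmp;		/* this becomes the new parking lot */
}
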
-Mike
root@...er: ./massive_intr 20 180
009417 00003450
009419 00002965
009423 00003068
009406 00002565
009403 00002964
009421 00002949
009422 00002314
009420 00002879
009418 00002730
009415 00003126
009405 00002737
009408 00003664
009424 00003355
009414 00002947
009416 00004340
009407 00003424
009412 00003491
009404 00003101
009413 00002553
009425 00003396
2.6.21-rc5
--- kernel/sched.c.org 2007-03-27 15:47:49.000000000 +0200
+++ kernel/sched.c 2007-03-28 08:53:17.000000000 +0200
@@ -234,7 +234,8 @@ struct rq {
*/
unsigned long nr_uninterruptible;
- unsigned long expired_timestamp;
+ unsigned long switch_timestamp;
+ unsigned long last_expired;
/* Cached timestamp set by update_cpu_clock() */
unsigned long long most_recent_timestamp;
struct task_struct *curr, *idle;
@@ -3051,9 +3052,9 @@ static inline int expired_starving(struc
{
if (rq->curr->static_prio > rq->best_expired_prio)
return 1;
- if (!STARVATION_LIMIT || !rq->expired_timestamp)
+ if (!STARVATION_LIMIT)
return 0;
- if (jiffies - rq->expired_timestamp > STARVATION_LIMIT * rq->nr_running)
+ if (jiffies - rq->switch_timestamp > STARVATION_LIMIT * rq->nr_running)
return 1;
return 0;
}
@@ -3131,8 +3132,75 @@ void account_steal_time(struct task_stru
cpustat->steal = cputime64_add(cpustat->steal, tmp);
}
+/*
+ * Promote and requeue the next lower priority task. If no task
+ * is available in the active array, switch to the expired array.
+ * @rq: runqueue to search.
+ * @prio: priority at which to begin search.
+ */
+static inline int promote_next_lower(struct rq *rq, int prio)
+{
+ struct prio_array *array = rq->active;
+ unsigned long *bitmap;
+ struct task_struct *p = NULL;
+ unsigned long long now = rq->most_recent_timestamp;
+ int idx = prio + 1, found_noninteractive = 0;
+
+repeat:
+ bitmap = array->bitmap;
+ idx = find_next_bit(bitmap, MAX_PRIO, idx);
+ if (idx < MAX_PRIO) {
+ struct list_head *queue = array->queue + idx;
+
+ p = list_entry(queue->next, struct task_struct, run_list);
+ if (!TASK_INTERACTIVE(p))
+ found_noninteractive = 1;
+
+ /* Skip non-starved queues. */
+ if (now < p->last_ran + JIFFIES_TO_NS(HZ)) {
+ idx++;
+ p = NULL;
+ goto repeat;
+ }
+ } else if (!found_noninteractive && array == rq->active) {
+ /* Nobody home. Check the expired array. */
+ array = rq->expired;
+ idx = 0;
+ p = NULL;
+ goto repeat;
+ }
+
+ /* Found one, requeue it. */
+ if (p) {
+ dequeue_task(p, p->array);
+ if (array == rq->active)
+ p->prio--;
+ /*
+ * If we pulled a task from the expired array, correct
+ * expired task info.
+ */
+ else {
+ idx = sched_find_first_bit(array->bitmap);
+ if (idx < MAX_PRIO) {
+ /* FIXME this isn't perfect */
+ if (rq->best_expired_prio > idx)
+ rq->best_expired_prio = idx;
+ } else {
+ rq->best_expired_prio = MAX_PRIO;
+#if 0
+ rq->switch_timestamp = jiffies;
+#endif
+ }
+ }
+ enqueue_task(p, rq->active);
+ }
+ return p != NULL;
+}
+
static void task_running_tick(struct rq *rq, struct task_struct *p)
{
+ static unsigned long warn_me = INITIAL_JIFFIES;
+ static int promoted;
if (p->array != rq->active) {
/* Task has expired but was not scheduled yet */
set_tsk_need_resched(p);
@@ -3162,20 +3230,31 @@ static void task_running_tick(struct rq
goto out_unlock;
}
if (!--p->time_slice) {
+ int was_interactive = TASK_INTERACTIVE(p);
+
dequeue_task(p, rq->active);
set_tsk_need_resched(p);
p->prio = effective_prio(p);
p->time_slice = task_timeslice(p);
p->first_time_slice = 0;
- if (!rq->expired_timestamp)
- rq->expired_timestamp = jiffies;
if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
enqueue_task(p, rq->expired);
if (p->static_prio < rq->best_expired_prio)
rq->best_expired_prio = p->static_prio;
- } else
+ if (!was_interactive)
+ rq->last_expired = jiffies;
+ } else {
+ int limit = STARVATION_LIMIT;
enqueue_task(p, rq->active);
+
+ /*
+ * Start promoting lower priority tasks if we haven't
+ * expired a task within STARVATION_LIMIT ticks.
+ */
+ if (time_after(jiffies, rq->last_expired + limit))
+ promoted += promote_next_lower(rq, p->prio);
+ }
} else {
/*
* Prevent a too long timeslice allowing a task to monopolize
@@ -3204,6 +3283,12 @@ static void task_running_tick(struct rq
}
out_unlock:
spin_unlock(&rq->lock);
+ if (time_after(jiffies, warn_me)) {
+ if (promoted)
+ printk(KERN_ERR "%d tasks promoted\n", promoted);
+ promoted = 0;
+ warn_me = jiffies + HZ;
+ }
}
/*
@@ -3356,7 +3441,7 @@ need_resched_nonpreemptible:
idle_balance(cpu, rq);
if (!rq->nr_running) {
next = rq->idle;
- rq->expired_timestamp = 0;
+ rq->switch_timestamp = jiffies;
goto switch_tasks;
}
}
@@ -3370,7 +3455,7 @@ need_resched_nonpreemptible:
rq->active = rq->expired;
rq->expired = array;
array = rq->active;
- rq->expired_timestamp = 0;
+ rq->switch_timestamp = jiffies;
rq->best_expired_prio = MAX_PRIO;
}