Message-Id: <200708290719.24422.a1426z@gawab.com>
Date: Wed, 29 Aug 2007 07:19:24 +0300
From: Al Boldi <a1426z@...ab.com>
To: Ingo Molnar <mingo@...e.hu>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mike Galbraith <efault@....de>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: CFS review
Ingo Molnar wrote:
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > On Tue, 28 Aug 2007, Al Boldi wrote:
> > > I like your analysis, but how do you explain that these stalls
> > > vanish when __update_curr is disabled?
> >
> > It's entirely possible that what happens is that the X scheduling is
> > just a slightly unstable system - which effectively would turn a small
> > scheduling difference into a *huge* visible difference.
>
> i think it's because disabling __update_curr() in essence removes the
> ability of scheduler to preempt tasks - that hack in essence results in
> a non-scheduler. Hence the gears + X pair of tasks becomes a synchronous
> pair of tasks in essence - and thus gears cannot "overload" X.
I have narrowed it down a bit, to add_wait_runtime.
Patch 2.6.22.5-v20.4 like this, commenting out the add_wait_runtime call:
	 * the two values are equal)
	 * [Note: delta_mine - delta_exec is negative]:
	 */
//	add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec);
}

static void update_curr(struct cfs_rq *cfs_rq)
With add_wait_runtime disabled, the stalls are gone.  The scheduler is
still usable with this change, but it is a hack, not a fix.
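To make the effect of that one line easier to reason about, here is a
minimal user-space sketch of the accounting it performs.  This is not the
kernel code: struct toy_entity, toy_delta_mine() and toy_update_curr() are
made-up names, the weights are arbitrary, and delta_mine is assumed to be
the running task's weight-proportional share of delta_exec.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a scheduling entity. */
struct toy_entity {
	long long weight;        /* load weight; larger means a bigger CPU share */
	long long wait_runtime;  /* CPU time owed to the task, in ns (negative = overdrawn) */
};

/*
 * Assumed model: the portion of delta_exec the task was entitled to,
 * given the total weight of all runnable tasks.
 */
static long long toy_delta_mine(long long delta_exec, long long weight,
				long long total_weight)
{
	assert(total_weight > 0);
	return delta_exec * weight / total_weight;
}

/*
 * Account delta_exec ns of CPU time against the running task.
 * charge_enabled == 0 models the hack of commenting out the
 * add_wait_runtime call: the task is never charged for running.
 */
static void toy_update_curr(struct toy_entity *curr, long long total_weight,
			    long long delta_exec, int charge_enabled)
{
	long long delta_mine = toy_delta_mine(delta_exec, curr->weight,
					      total_weight);

	if (charge_enabled)
		curr->wait_runtime += delta_mine - delta_exec;
}
```

In this model, a task of weight 1024 that runs for 1000000 ns on a queue of
total weight 2048 ends up with wait_runtime = -500000 when charging is
enabled, and stays at 0 when it is disabled.  With uneven weights, say 3072
against 1024 on a queue of total weight 4096, the same 1000000 ns costs the
heavier task 250000 and the lighter task 750000.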
Now, even with this hack, uneven nice-levels between X and gears cause the
stalls to return, so make sure both X and gears run at the same nice-level
when testing.
Again, the whole point of this workload is to expose scheduler glitches
regardless of whether X is broken or not, and my hunch is that this problem
looks suspiciously like an ia-boosting bug.  What's important to note is
that adjusting the scheduler corrects the behaviour, which suggests the
problem can be fixed in the scheduler.
It's probably a good idea to look further into add_wait_runtime.
Thanks!
--
Al