linux-kernel - Re: CFS review

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200708290719.24422.a1426z@gawab.com>
Date:	Wed, 29 Aug 2007 07:19:24 +0300
From:	Al Boldi <a1426z@...ab.com>
To:	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Mike Galbraith <efault@....de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: CFS review

Ingo Molnar wrote:
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > On Tue, 28 Aug 2007, Al Boldi wrote:
> > > I like your analysis, but how do you explain that these stalls
> > > vanish when __update_curr is disabled?
> >
> > It's entirely possible that what happens is that the X scheduling is
> > just a slightly unstable system - which effectively would turn a small
> > scheduling difference into a *huge* visible difference.
>
> i think it's because disabling __update_curr() in essence removes the
> ability of scheduler to preempt tasks - that hack in essence results in
> a non-scheduler. Hence the gears + X pair of tasks becomes a synchronous
> pair of tasks in essence - and thus gears cannot "overload" X.

I have narrowed it down a bit to add_wait_runtime.

Patch 2.6.22.5-v20.4 like this:

346-     * the two values are equal)
347-     * [Note: delta_mine - delta_exec is negative]:
348-     */
349://  add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec);
350-}
351-
352-static void update_curr(struct cfs_rq *cfs_rq)

When disabling add_wait_runtime the stalls are gone.  With this change the 
scheduler is still usable, but it does not constitute a fix.

Now, even with this hack, uneven nice-levels between X and gears causes a 
return of the stalls, so make sure both X and gears run on the same 
nice-level when testing.

Again, the whole point of this workload is to expose scheduler glitches 
regardless of whether X is broken or not, and my hunch is that this problem 
looks suspiciously like an ia-boosting bug.  What's important to note is 
that by adjusting the scheduler we can effect a correction in behaviour, and 
as such should yield this problem as fixable.

It's probably a good idea to look further into add_wait_runtime.

Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/