Message-ID: <alpine.LFD.0.999.0708011922500.3582@woody.linux-foundation.org>
Date: Wed, 1 Aug 2007 19:31:26 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Nick Piggin <npiggin@...e.de>
cc: Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: lmbench ctxsw regression with CFS

On Thu, 2 Aug 2007, Nick Piggin wrote:
>
> lmbench 3 lat_ctx context switching time with 2 processes bound to a
> single core increases by between 25%-35% on my Core2 system (didn't do
> enough runs to get more significance, but it is around 30%). The problem
> bisected to the main CFS commit.

One thing to check out is whether the lmbench numbers are "correct".
Especially on SMP systems, the lmbench numbers are actually *best* when
the two processes run on the same CPU, even though that's not really at
all the best scheduling - it's just that it artificially improves lmbench
numbers because of the close cache affinity for the pipe data structures.

So when running the lmbench scheduling benchmarks on SMP, it actually
makes sense to run them *pinned* to one CPU, because then you see the true
scheduler performance. Otherwise you easily get noise due to balancing
issues, and a clearly better scheduler can in fact generate worse
numbers for lmbench.

Did you do that? It's at least worth testing. I'm not saying it's the case
here, but it's one reason why lmbench3 has the option to either keep
processes on the same CPU or force them to spread out (and both cases are
very interesting for scheduler testing, and tell different things: the
"pin them to the same CPU" case shows the latency on one runqueue, while
the "pin them to different CPUs" case shows the latency of a remote wakeup).

IOW, while we used the lmbench scheduling benchmark pretty extensively in
early scheduler tuning, if you select the defaults ("let the system just
schedule processes on any CPU") the end result really isn't necessarily a
very meaningful value: getting the best lmbench numbers actually requires
you to do things that tend to be actively *bad* in real life.

Of course, a perfect scheduler would notice when two tasks are *so*
closely related and only do synchronous wakeups that it would keep them on
the same core, and get the best possible scores for lmbench, while not
doing that for other real-life situations. So with a *really* smart
scheduler, lmbench numbers would always be optimal, but I'm not sure
aiming for that kind of perfection is even worth it!

Linus