Date:	Wed, 1 Aug 2007 19:31:26 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Nick Piggin <npiggin@...e.de>
cc:	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: lmbench ctxsw regression with CFS



On Thu, 2 Aug 2007, Nick Piggin wrote:
> 
> lmbench 3 lat_ctx context switching time with 2 processes bound to a
> single core increases by between 25%-35% on my Core2 system (didn't do
> enough runs to get more significance, but it is around 30%). The problem
> bisected to the main CFS commit.

One thing to check out is whether the lmbench numbers are "correct". 
Especially on SMP systems, the lmbench numbers are actually *best* when 
the two processes run on the same CPU, even though that's not really at 
all the best scheduling - it's just that it artificially improves lmbench 
numbers because of the close cache affinity for the pipe data structures.
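For reference, the 2-process lat_ctx case basically boils down to a pipe 
ping-pong between two tasks. The sketch below is just an illustration of 
that pattern - it is not lmbench's actual code, and it leaves out all the 
timing:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

#define ROUNDS 100000

int main(void)
{
	int p1[2], p2[2];
	char c = 'x';
	int i;

	if (pipe(p1) || pipe(p2)) {
		perror("pipe");
		return 1;
	}

	if (fork() == 0) {
		/* child: wait for the token, bounce it back */
		for (i = 0; i < ROUNDS; i++) {
			read(p1[0], &c, 1);
			write(p2[1], &c, 1);
		}
		return 0;
	}

	/*
	 * parent: each round trip forces two context switches.  When
	 * both processes stay on one core, the pipe buffers and the
	 * related data structures stay hot in that core's cache,
	 * which is what inflates the "same CPU" numbers.
	 */
	for (i = 0; i < ROUNDS; i++) {
		write(p1[1], &c, 1);
		read(p2[0], &c, 1);
	}
	wait(NULL);
	return 0;
}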

So when running the lmbench scheduling benchmarks on SMP, it actually 
makes sense to run them *pinned* to one CPU, because then you see the true 
scheduler performance. Otherwise you easily get noise due to balancing 
issues, and a clearly better scheduler can in fact generate worse 
numbers for lmbench.
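One way to do that pinning from inside the benchmark itself is 
sched_setaffinity(); running the whole thing under something like taskset 
from the outside has the same effect. A minimal sketch - the pin_to_cpu() 
helper here is just for illustration, not something lmbench provides:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static int pin_to_cpu(int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	/*
	 * pid 0 means "the calling process", and the mask is inherited
	 * across fork(), so pinning before fork() keeps both halves of
	 * the ping-pong on the same CPU.
	 */
	return sched_setaffinity(0, sizeof(mask), &mask);
}

int main(void)
{
	if (pin_to_cpu(0)) {
		perror("sched_setaffinity");
		return 1;
	}
	/* ... fork() and run the ping-pong from the earlier sketch ... */
	return 0;
}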

Did you do that? It's at least worth testing. I'm not saying it's the case 
here, but it's one reason why lmbench3 has the option to either keep 
processes on the same CPU or force them to spread out (and both cases are 
very interesting for scheduler testing, and tell different things: the 
"pin them to the same CPU" shows the latency on one runqueue, while the 
"pin them to different CPU's" shows the latency of a remote wakeup).

IOW, while we used the lmbench scheduling benchmark pretty extensively in 
early scheduler tuning, if you select the defaults ("let the system just 
schedule processes on any CPU") the end result really isn't necessarily a 
very meaningful value: getting the best lmbench numbers actually requires 
you to do things that tend to be actively *bad* in real life.

Of course, a perfect scheduler would notice when two tasks are *so* 
closely related and do only synchronous wakeups, and it would keep them on 
the same core and get the best possible scores for lmbench, while not 
doing that for other real-life situations. So with a *really* smart 
scheduler, lmbench numbers would always be optimal, but I'm not sure 
aiming for that kind of perfection is even worth it!

		Linus
