Date:	Wed, 06 Jun 2007 09:26:30 -0400
From:	Bill Davidsen <davidsen@....com>
To:	Willy Tarreau <w@....eu>
CC:	"Li, Tong N" <tong.n.li@...el.com>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>, Con Kolivas <kernel@...ivas.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	"Siddha, Suresh B" <suresh.b.siddha@...el.com>,
	"Barnes, Jesse" <jesse.barnes@...el.com>,
	William Lee Irwin III <wli@...omorphy.com>,
	"Bill Huey (hui)" <billh@...ppy.monkey.org>, vatsa@...ibm.com,
	balbir@...ibm.com, Nick Piggin <npiggin@...e.de>,
	John Kingman <kingman@...ragegear.com>,
	Peter Williams <pwil3058@...pond.net.au>, seuler.shi@...il.com
Subject: Re: [RFC] Extend Linux to support proportional-share scheduling

Willy Tarreau wrote:
> On Tue, Jun 05, 2007 at 09:31:33PM -0700, Li, Tong N wrote:
>   
>> Willy,
>>
>> These are all good comments. Regarding the cache penalty, I've done some
>> measurements using benchmarks like SPEC OMP on an 8-processor SMP and
>> the performance with this patch was nearly identical to that with the
>> mainline. I'm sure some apps may suffer from the potentially more
>> migrations with this design. In the end, I think what we want is to
>> balance fairness and performance. This design currently emphasizes on
>> fairness, but it could be changed to relax fairness when performance
>> does become an issue (which could even be a user-tunable knob depending
>> on which aspect the user cares more).
>>     
>
> Maybe storing in each task a small list of the 2 or 4 last CPUs used would
> help the scheduler in trying to place them. I mean, let's say you have 10
> tasks and 8 CPUs. You first assign tasks 1..8 CPUs 1..8 for 1 timeslice.
> Then you will give 9..10 a run on CPUs 1..2, and CPUs 3..8 will be usable
> for other tasks. It will be optimal to run tasks 3..8 on them. Then you will
> stop some of those because they are "in advance", and run 9..10 and 1..2
> again. You'll have to switch 1..2 to another group of CPUs to maintain hot
> cache on CPUs 1..2 for tasks 9..10. But another possibility would be to
> consider that 9..10 and 1..2 have performed the same amount of work, so
> let 9..10 take some advance and benefit from the hot cache, then try to
> place 1..2 there again. But it will mean that 3..8 will now have run 2
> timeslices more than others. At this moment, it should be wise to make
> them sleep and keep their CPU history for future use.
>
> Maybe on end-user systems, the CPU history is not that important because
> of the often small caches, but on high-end systems with large L2/L3 caches,
> I think that we can often keep several tasks in the cache, justifying the
> ability to select one of the last CPUs used.
>
>   
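A minimal sketch of what such a per-task history might look like, just 
to make the idea concrete (all names here are hypothetical, this is not 
from any existing patch):

	#define CPU_HIST_LEN 4

	/* per-task record of the last few CPUs the task ran on */
	struct task_cpu_hist {
		int cpus[CPU_HIST_LEN];		/* most recent first */
		int count;
	};

	/* note that the task just ran on 'cpu', most-recent-first order */
	static void cpu_hist_record(struct task_cpu_hist *h, int cpu)
	{
		int i, j;

		/* find an existing entry so we never store duplicates */
		for (j = 0; j < h->count; j++)
			if (h->cpus[j] == cpu)
				break;

		if (j == h->count && h->count < CPU_HIST_LEN)
			h->count++;
		if (j >= CPU_HIST_LEN)
			j = CPU_HIST_LEN - 1;	/* full: drop the oldest */

		for (i = j; i > 0; i--)		/* make room at the front */
			h->cpus[i] = h->cpus[i - 1];
		h->cpus[0] = cpu;
	}

	/* crude rank for a candidate CPU: lower = warmer cache expected */
	static int cpu_hist_rank(const struct task_cpu_hist *h, int cpu)
	{
		int i;

		for (i = 0; i < h->count; i++)
			if (h->cpus[i] == cpu)
				return i;
		return CPU_HIST_LEN;	/* no recent history on this CPU */
	}

The balancer could then prefer, among otherwise equal idle CPUs, the 
one with the lowest rank for the task being placed.
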
CPU affinity to preserve cache is a very delicate balance. It makes 
sense to try to run a process on the same CPU, but even a few ms of 
running some other process is long enough to refill the cache with new 
contents (depending on what it does, obviously), so a long delay just 
to get a process back onto the "right" CPU is not always a saving; the 
benefit of using the previous CPU fades rapidly.
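
For rough scale (illustrative numbers, not measurements): a 4 MB L2 
refilled by a competing task streaming memory at ~4 GB/s is completely 
replaced in about 4 MB / 4 GB/s = 1 ms, well within a single timeslice, 
so the window in which the old contents are still worth chasing is 
short.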

Some omniscient scheduler would keep a count of pages evicted from 
cache as process A runs, deduct that from the affinity of process B 
previously on the same CPU, and then make a perfect decision about when 
it's better to migrate the task and how far. Since the schedulers now 
being advanced are "fair" rather than "perfect," everyone is making 
educated guesses on optimal process migration policy: migrating all of 
a task's threads together to improve cache hits vs. spreading them out 
to run better in parallel, and so on.
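
Purely as a sketch of the bookkeeping that would imply (the names and 
the one-page-touched-evicts-one-page assumption are invented for 
illustration; real eviction counts aren't cheaply visible to the 
scheduler):

	/* per-task estimate of cached state surviving on its last CPU */
	struct cache_affinity {
		int  last_cpu;
		long warm_pages;	/* estimated pages still in cache */
	};

	/*
	 * Charge the pages touched by the task now running against the
	 * previous task's warmth on this CPU (assume one evicts one).
	 */
	static void account_cache_decay(struct cache_affinity *prev_aff,
					long pages_touched_by_next)
	{
		prev_aff->warm_pages -= pages_touched_by_next;
		if (prev_aff->warm_pages < 0)
			prev_aff->warm_pages = 0;
	}

	/*
	 * Migrate only once the expected wait for the old CPU exceeds
	 * the cost of refilling what is still warm there.
	 */
	static int worth_migrating(const struct cache_affinity *aff,
				   long refill_cost_per_page,
				   long expected_wait_on_last_cpu)
	{
		return expected_wait_on_last_cpu >
		       aff->warm_pages * refill_cost_per_page;
	}
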

For a desktop I want a scheduler which "doesn't suck" at the things I do 
regularly. For a server I'm more concerned with overall tps than the 
latency of one transaction. Most users would trade a slowdown in kernel 
compiles for being able to watch youtube while the compile runs, and 
conversely people with heavily loaded servers would usually trade a 
slower transaction for more of them per second. Obviously within 
reason... what people will tolerate is a bounded value.
> Not an easy thing to do, but probably very complementary to your work IMHO.
>   
Agree, not easy at all.

-- 
bill davidsen <davidsen@....com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

