linux-kernel - Re: [RFC] scheduler: improve SMP fairness in CFS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <46A66393.5000705@redhat.com>
Date:	Tue, 24 Jul 2007 16:39:47 -0400
From:	Chris Snook <csnook@...hat.com>
To:	Chris Friesen <cfriesen@...tel.com>
CC:	Tong Li <tong.n.li@...el.com>, mingo@...e.hu,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS

Chris Friesen wrote:
> Chris Snook wrote:
> 
>> Concerns aside, I agree that fairness is important, and I'd really 
>> like to see a test case that demonstrates the problem.
> 
> One place that might be useful is the case of fairness between resource 
> groups, where the load balancer needs to consider each group separately.

You mean like the CFS group scheduler patches?  I don't see how this patch is 
related to that, besides working on top of it.

> Now it may be the case that trying to keep the load of each class within 
> X% of the other cpus is sufficient, but it's not trivial.

I agree.  My suggestion is that we try being fair from the bottom-up, rather 
than top-down.  If most of the rebalancing is local, we can minimize expensive 
locking and cross-node migrations, and scale very nicely on large NUMA boxes.

> Consider the case where you have a resource group that is allocated 50% 
> of each cpu in a dual cpu system, and only have a single task in that 
> group.  This means that in order to make use of the full group 
> allocation, that task needs to be load-balanced to the other cpu as soon 
> as it gets scheduled out.  Most load-balancers can't handle that kind of 
> granularity, but I have guys in our engineering team that would really 
> like this level of performance.

Divining the intentions of the administrator is an AI-complete problem and we're 
not going to try to solve that in the kernel.  An intelligent administrator 
could also allocate 50% of each CPU to a resource group containing all the 
*other* processes.  Then, when the other processes are scheduled out, your 
single task will run on whichever CPU is idle.  This will very quickly 
equilibrate to the scheduling ping-pong you seem to want.  The scheduler 
deliberately avoids this kind of migration by default because it hurts cache and 
TLB performance, so if you want to override this very sane default behavior, 
you're going to have to explicitly configure it yourself.

> We currently use CKRM on an SMP machine, but the only way we can get 
> away with it is because our main app is affined to one cpu and just 
> about everything else is affined to the other.

If you're not explicitly allocating resources, you're just low-latency, not 
truly realtime.  Realtime requires guaranteed resources, so messing with 
affinities is a necessary evil.

> We have another SMP box that would benefit from group scheduling, but we 
> can't use it because the load balancer is not nearly good enough.

Which scheduler?  Have you tried the CFS group scheduler patches?

	-- Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/