[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46A66393.5000705@redhat.com>
Date: Tue, 24 Jul 2007 16:39:47 -0400
From: Chris Snook <csnook@...hat.com>
To: Chris Friesen <cfriesen@...tel.com>
CC: Tong Li <tong.n.li@...el.com>, mingo@...e.hu,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS
Chris Friesen wrote:
> Chris Snook wrote:
>
>> Concerns aside, I agree that fairness is important, and I'd really
>> like to see a test case that demonstrates the problem.
>
> One place that might be useful is the case of fairness between resource
> groups, where the load balancer needs to consider each group separately.
You mean like the CFS group scheduler patches? I don't see how this patch is
related to that, besides working on top of it.
> Now it may be the case that trying to keep the load of each class within
> X% of the other cpus is sufficient, but it's not trivial.
I agree. My suggestion is that we try being fair from the bottom-up, rather
than top-down. If most of the rebalancing is local, we can minimize expensive
locking and cross-node migrations, and scale very nicely on large NUMA boxes.
> Consider the case where you have a resource group that is allocated 50%
> of each cpu in a dual cpu system, and only have a single task in that
> group. This means that in order to make use of the full group
> allocation, that task needs to be load-balanced to the other cpu as soon
> as it gets scheduled out. Most load-balancers can't handle that kind of
> granularity, but I have guys in our engineering team that would really
> like this level of performance.
Divining the intentions of the administrator is an AI-complete problem and we're
not going to try to solve that in the kernel. An intelligent administrator
could also allocate 50% of each CPU to a resource group containing all the
*other* processes. Then, when the other processes are scheduled out, your
single task will run on whichever CPU is idle. This will very quickly
equilibrate to the scheduling ping-pong you seem to want. The scheduler
deliberately avoids this kind of migration by default because it hurts cache and
TLB performance, so if you want to override this very sane default behavior,
you're going to have to explicitly configure it yourself.
> We currently use CKRM on an SMP machine, but the only way we can get
> away with it is because our main app is affined to one cpu and just
> about everything else is affined to the other.
If you're not explicitly allocating resources, you're just low-latency, not
truly realtime. Realtime requires guaranteed resources, so messing with
affinities is a necessary evil.
> We have another SMP box that would benefit from group scheduling, but we
> can't use it because the load balancer is not nearly good enough.
Which scheduler? Have you tried the CFS group scheduler patches?
-- Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists