Message-ID: <46ABFE2D.1060505@redhat.com>
Date: Sat, 28 Jul 2007 22:40:45 -0400
From: Chris Snook <csnook@...hat.com>
To: Tong Li <tong.n.li@...el.com>
CC: "Bill Huey (hui)" <billh@...ppy.monkey.org>,
Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS

Tong Li wrote:
> On Fri, 27 Jul 2007, Chris Snook wrote:
>
>> Bill Huey (hui) wrote:
>>> You have to consider the target for this kind of code. There are
>>> applications where you need something that falls within a constant
>>> error bound. According to the numbers, the current CFS rebalancing
>>> logic doesn't achieve that to any degree of rigor. So CFS is ok for
>>> SCHED_OTHER, but not for anything more strict than that.
>>
>> I've said from the beginning that I think that anyone who desperately
>> needs perfect fairness should be explicitly enforcing it with the aid
>> of realtime priorities. The problem is that configuring and tuning a
>> realtime application is a pain, and people want to be able to
>> approximate this behavior without doing a whole lot of dirty work
>> themselves. I believe that CFS can and should be enhanced to ensure
>> SMP-fairness over potentially short, user-configurable intervals, even
>> for SCHED_OTHER. I do not, however, believe that we should take it to
>> the extreme of wasting CPU cycles on migrations that will not improve
>> performance for *any* task, just to avoid letting some tasks get ahead
>> of others. We should be as fair as possible but no fairer. If we've
>> already made it as fair as possible, we should account for the margin
>> of error and correct for it the next time we rebalance. We should not
>> burn the surplus just to get rid of it.
>
> Proportional-share scheduling actually has one of its roots in
> real-time scheduling, and a p-fair scheduler is essential for
> real-time apps (soft real-time).
Sounds like another scheduler class might be in order. I find CFS to be
fair enough for most purposes. If the code that gives us near-perfect
fairness at the expense of efficiency only runs when a privileged user
has boosted a task's priority, and only on the CPUs where such tasks
are queued, the run-time overhead and code complexity become much
smaller concerns.
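
Just to make the shape of that concrete, here's a toy sketch of the
gate I have in mind. The struct and every name in it are invented for
illustration; this isn't a patch against the actual scheduler:

	#include <stdbool.h>

	/* Toy model of a per-CPU runqueue; fields are hypothetical. */
	struct cpu_rq {
		int nr_strict_fair;	/* tasks a privileged user flagged */
	};

	static bool needs_strict_balance(const struct cpu_rq *rq)
	{
		return rq->nr_strict_fair > 0;
	}

	static void fairness_pass(struct cpu_rq *rqs, int nr_cpus)
	{
		for (int cpu = 0; cpu < nr_cpus; cpu++) {
			if (!needs_strict_balance(&rqs[cpu]))
				continue;	/* cheap path: plain CFS */
			/* expensive near-perfect fairness work here */
		}
	}

CPUs with no flagged tasks never pay for the strict pass at all.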
>>
>> On a non-NUMA box with single-socket, non-SMT processors, a constant
>> error bound is fine. Once we add SMT, go multi-core, go NUMA, and add
>> inter-chassis interconnects on top of that, we need to multiply this
>> error bound at each stage in the hierarchy, or else we'll end up
>> wasting CPU cycles on migrations that actually hurt the processes
>> they're supposed to be helping, and hurt everyone else even more. I
>> believe we should enforce an error bound that is proportional to
>> migration cost.
>>
>
> I think we are actually in agreement. When I say constant bound, it can
> certainly be a constant that's determined based on inputs from the
> memory hierarchy. The point is that it needs to be a constant
> independent of things like # of tasks.
Agreed.
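And that suggests the bound should simply be derived from the measured
migration cost at each level of the topology. A trivial standalone
illustration of the shape (the costs below are invented numbers for the
sake of the example, not measurements):

	#include <stdio.h>

	/* Hypothetical per-level migration costs, in microseconds. */
	static const unsigned int migration_cost_us[] = {
		5,	/* SMT siblings: shared cache, nearly free */
		30,	/* cores on one package */
		300,	/* across sockets */
		3000,	/* across NUMA nodes */
	};

	int main(void)
	{
		/*
		 * The tolerated imbalance is proportional to the
		 * migration cost at each level, and independent of
		 * the number of runnable tasks.
		 */
		for (int lvl = 0; lvl < 4; lvl++)
			printf("level %d: tolerate %u us of imbalance\n",
			       lvl, 2 * migration_cost_us[lvl]);
		return 0;
	}

A migration is only worth doing when the unfairness it fixes exceeds
what the migration itself costs, which is exactly why the bound has to
scale as you go up the hierarchy.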
>> But this patch is only relevant to SCHED_OTHER. The realtime
>> scheduler doesn't have a concept of fairness, just priorities. That's
>> why each realtime priority level has its own separate runqueue.
>> Realtime schedulers are supposed to be dumb as a post, so they cannot
>> heuristically decide to do anything other than precisely what you
>> configured them to do, and so they don't get in the way when you're
>> context switching a million times a second.
>
> Are you referring to hard real-time? As I said, an infrastructure that
> enables p-fair scheduling, EDF, or the like is the foundation for
> real-time. I designed DWRR, however, with a target of non-RT apps,
> although I was hoping the research results might be applicable to RT.
I'm referring to the static priority SCHED_FIFO and SCHED_RR schedulers,
which are (intentionally) dumb as a post, allowing userspace to manage
CPU time explicitly. Proportionally fair scheduling is a cool
capability, but not a design goal of those schedulers.
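
For anyone following along who hasn't used those classes: getting into
them from userspace is nearly a one-liner, which is rather the point.
A minimal example (needs root or CAP_SYS_NICE; priority 50 is an
arbitrary choice):

	#include <errno.h>
	#include <sched.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		struct sched_param sp = { .sched_priority = 50 };

		/* pid 0 means the calling process */
		if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
			fprintf(stderr, "sched_setscheduler: %s\n",
				strerror(errno));
			return 1;
		}
		/*
		 * From here on we preempt every SCHED_OTHER task on
		 * this CPU and run until we block, yield, or a
		 * higher-priority RT task becomes runnable. No
		 * fairness, no heuristics.
		 */
		return 0;
	}

The scheduler does exactly what you told it to and nothing else, which
is the whole contract.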
-- Chris