Message-ID: <46ABFE2D.1060505@redhat.com>
Date: Sat, 28 Jul 2007 22:40:45 -0400
From: Chris Snook <csnook@...hat.com>
To: Tong Li <tong.n.li@...el.com>
CC: "Bill Huey (hui)" <billh@...ppy.monkey.org>,
Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS

Tong Li wrote:
> On Fri, 27 Jul 2007, Chris Snook wrote:
>
>> Bill Huey (hui) wrote:
>>> You have to consider the target for this kind of code. There are
>>> applications where you need something that falls within a constant
>>> error bound. According to the numbers, the current CFS rebalancing
>>> logic doesn't achieve that to any degree of rigor. So CFS is ok for
>>> SCHED_OTHER, but not for anything more strict than that.
>>
>> I've said from the beginning that I think that anyone who desperately
>> needs perfect fairness should be explicitly enforcing it with the aid
>> of realtime priorities. The problem is that configuring and tuning a
>> realtime application is a pain, and people want to be able to
>> approximate this behavior without doing a whole lot of dirty work
>> themselves. I believe that CFS can and should be enhanced to ensure
>> SMP-fairness over potentially short, user-configurable intervals, even
>> for SCHED_OTHER. I do not, however, believe that we should take it to
>> the extreme of wasting CPU cycles on migrations that will not improve
>> performance for *any* task, just to avoid letting some tasks get ahead
>> of others. We should be as fair as possible but no fairer. If we've
>> already made it as fair as possible, we should account for the margin
>> of error and correct for it the next time we rebalance. We should not
>> burn the surplus just to get rid of it.
>
> Proportional-share scheduling actually has one of its roots in
> real-time scheduling, and a p-fair scheduler is essential for
> real-time apps (soft real-time).
Sounds like another scheduler class might be in order. I find CFS to be
fair enough for most purposes. If the code that gives us near-perfect
fairness at the expense of efficiency only runs when a privileged user
has boosted a task's priority, and only on the CPUs where such tasks
are queued, the run-time overhead and code complexity become much
smaller concerns.
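
Just to make the shape of that concrete, here's a toy sketch of the
gate I have in mind. The struct and every name in it are invented for
illustration; this isn't a patch against the actual scheduler:

	#include <stdbool.h>

	/* Toy model of a per-CPU runqueue; fields are hypothetical. */
	struct cpu_rq {
		int nr_strict_fair;	/* tasks a privileged user flagged */
	};

	static bool needs_strict_balance(const struct cpu_rq *rq)
	{
		return rq->nr_strict_fair > 0;
	}

	static void fairness_pass(struct cpu_rq *rqs, int nr_cpus)
	{
		for (int cpu = 0; cpu < nr_cpus; cpu++) {
			if (!needs_strict_balance(&rqs[cpu]))
				continue;	/* cheap path: plain CFS */
			/* expensive near-perfect fairness work here */
		}
	}

CPUs with no flagged tasks never pay for the strict pass at all.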
>>
>> On a non-NUMA box with single-socket, non-SMT processors, a constant
>> error bound is fine. Once we add SMT, go multi-core, go NUMA, and add
>> inter-chassis interconnects on top of that, we need to multiply this
>> error bound at each stage in the hierarchy, or else we'll end up
>> wasting CPU cycles on migrations that actually hurt the processes
>> they're supposed to be helping, and hurt everyone else even more. I
>> believe we should enforce an error bound that is proportional to
>> migration cost.
>>
>
> I think we are actually in agreement. When I say constant bound, it can
> certainly be a constant that's determined based on inputs from the
> memory hierarchy. The point is that it needs to be a constant
> independent of things like # of tasks.
Agreed.
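And that suggests the bound should simply be derived from the measured
migration cost at each level of the topology. A trivial standalone
illustration of the shape (the costs below are invented numbers for the
sake of the example, not measurements):

	#include <stdio.h>

	/* Hypothetical per-level migration costs, in microseconds. */
	static const unsigned int migration_cost_us[] = {
		5,	/* SMT siblings: shared cache, nearly free */
		30,	/* cores on one package */
		300,	/* across sockets */
		3000,	/* across NUMA nodes */
	};

	int main(void)
	{
		/*
		 * The tolerated imbalance is proportional to the
		 * migration cost at each level, and independent of
		 * the number of runnable tasks.
		 */
		for (int lvl = 0; lvl < 4; lvl++)
			printf("level %d: tolerate %u us of imbalance\n",
			       lvl, 2 * migration_cost_us[lvl]);
		return 0;
	}

A migration is only worth doing when the unfairness it fixes exceeds
what the migration itself costs, which is exactly why the bound has to
scale as you go up the hierarchy.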
>> But this patch is only relevant to SCHED_OTHER. The realtime
>> scheduler doesn't have a concept of fairness, just priorities. That's
>> why each realtime priority level has its own separate runqueue.
>> Realtime schedulers are supposed to be dumb as a post, so they cannot
>> heuristically decide to do anything other than precisely what you
>> configured them to do, and so they don't get in the way when you're
>> context switching a million times a second.
>
> Are you referring to hard real-time? As I said, an infrastructure that
> enables p-fair scheduling, EDF, or the like is the foundation for
> real-time. I designed DWRR, however, with a target of non-RT apps,
> although I was hoping the research results might be applicable to RT.
I'm referring to the static priority SCHED_FIFO and SCHED_RR schedulers,
which are (intentionally) dumb as a post, allowing userspace to manage
CPU time explicitly. Proportionally fair scheduling is a cool
capability, but not a design goal of those schedulers.
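
For anyone following along who hasn't used those classes: getting into
them from userspace is nearly a one-liner, which is rather the point.
A minimal example (needs root or CAP_SYS_NICE; priority 50 is an
arbitrary choice):

	#include <errno.h>
	#include <sched.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		struct sched_param sp = { .sched_priority = 50 };

		/* pid 0 means the calling process */
		if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
			fprintf(stderr, "sched_setscheduler: %s\n",
				strerror(errno));
			return 1;
		}
		/*
		 * From here on we preempt every SCHED_OTHER task on
		 * this CPU and run until we block, yield, or a
		 * higher-priority RT task becomes runnable. No
		 * fairness, no heuristics.
		 */
		return 0;
	}

The scheduler does exactly what you told it to and nothing else, which
is the whole contract.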
-- Chris