Message-ID: <20070413225216.GA11384@elte.hu>
Date:	Sat, 14 Apr 2007 00:52:16 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	William Lee Irwin III <wli@...omorphy.com>
Cc:	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Con Kolivas <kernel@...ivas.org>,
	Nick Piggin <npiggin@...e.de>, Mike Galbraith <efault@....de>,
	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]


* William Lee Irwin III <wli@...omorphy.com> wrote:

> On Fri, Apr 13, 2007 at 10:21:00PM +0200, Ingo Molnar wrote:
> > [announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
> > i'm pleased to announce the first release of the "Modular Scheduler Core
> > and Completely Fair Scheduler [CFS]" patchset:
> >    http://redhat.com/~mingo/cfs-scheduler/sched-modular+cfs.patch
> > This project is a complete rewrite of the Linux task scheduler. My goal
> > is to address various feature requests and to fix deficiencies in the
> > vanilla scheduler that were suggested/found in the past few years, both
> > for desktop scheduling and for server scheduling workloads.
> > [ QuickStart: apply the patch to v2.6.21-rc6, recompile, reboot. The
> >   new scheduler will be active by default and all tasks will default
> >   to the new SCHED_FAIR interactive scheduling class. ]
> 
> A pleasant surprise, though I did see it coming.

hey ;)

> On Fri, Apr 13, 2007 at 10:21:00PM +0200, Ingo Molnar wrote:
> > Highlights are:
> >  - the introduction of Scheduling Classes: an extensible hierarchy of
> >    scheduler modules. These modules encapsulate scheduling policy
> >    details and are handled by the scheduler core without the core
> >    code assuming about them too much.
> 
> It probably needs further clarification that they're things on the 
> order of SCHED_FIFO, SCHED_RR, SCHED_NORMAL, etc.; some prioritization 
> amongst the classes is furthermore assumed, and so on. [...]

yep - they are linked via sched_ops->next pointer, with NULL delimiting 
the last one.
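
[ An illustrative sketch of such a chain, not taken from the patch. Only 
  the sched_ops name and its ->next pointer with a NULL terminator come 
  from the text above; the pick_next_task hook, the struct task type and 
  the two example classes are hypothetical stand-ins. It assumes the 
  core walks the classes in priority order and takes the first task any 
  class offers. ]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task type used only for this sketch. */
struct task { int id; };

/* Simplified stand-in for a scheduling class descriptor. */
struct sched_ops {
    const char *name;
    struct task *(*pick_next_task)(void); /* NULL result: nothing runnable */
    const struct sched_ops *next;         /* lower-priority class, NULL at end */
};

static struct task rt_task = { 1 };
static struct task fair_task = { 2 };
static int rt_runnable; /* toggles whether the RT class has work */

static struct task *pick_rt(void)   { return rt_runnable ? &rt_task : NULL; }
static struct task *pick_fair(void) { return &fair_task; }

/* The chain: rt -> fair -> NULL. */
static const struct sched_ops fair_ops = { "fair", pick_fair, NULL };
static const struct sched_ops rt_ops   = { "rt", pick_rt, &fair_ops };

/* Walk the chain from the highest-priority class until one yields a task. */
static struct task *pick_next(const struct sched_ops *highest)
{
    for (const struct sched_ops *ops = highest; ops != NULL; ops = ops->next) {
        assert(ops->pick_next_task != NULL);
        struct task *t = ops->pick_next_task();
        if (t != NULL)
            return t;
    }
    return NULL; /* no class had a runnable task */
}
```

[ With rt_runnable clear, pick_next(&rt_ops) falls through to the fair 
  class; once it is set, the RT task wins because its class comes first 
  in the chain. ]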

> [...] They're not quite capable of being full-blown alternative 
> policies, though quite a bit can be crammed into them.

yeah, they are not full-blown: i extended them on-demand, for the 
specific purposes of sched_fair.c and sched_rt.c. More can be done too.

> There are issues with the per- scheduling class data not being very 
> well-abstracted. [...]

yes. It's on my TODO list: i'll work more on extending the cleanups to 
those fields too.

> A binomial heap would likely serve your purposes better than rbtrees. 
> It's faster to have the next item to dequeue at the root of the tree 
> structure rather than a leaf, for one. There are, of course, other 
> priority queue structures (e.g. van Emde Boas) able to exploit the 
> limited precision of the priority key for faster asymptotics, though 
> actual performance is an open question.

i'm caching the leftmost leaf, which serves as an alternate, task-pick 
centric root in essence.
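
[ A minimal sketch of the leftmost-caching idea, under loud assumptions: 
  it uses a plain unbalanced binary search tree instead of an rbtree to 
  stay short, and struct timeline, timeline_insert() and 
  timeline_pop_leftmost() are hypothetical names, not the patch's code. 
  The point it shows is that the next item to dequeue is reachable in 
  O(1) through the cached pointer, without descending from the root. ]

```c
#include <assert.h>
#include <stddef.h>

struct node {
    long key;
    struct node *left, *right, *parent;
};

struct timeline {
    struct node *root;
    struct node *leftmost; /* cached smallest-key node, NULL when empty */
};

/* Insert n; the cache is updated only if the descent never went right. */
static void timeline_insert(struct timeline *tl, struct node *n)
{
    struct node **link = &tl->root, *parent = NULL;
    int is_leftmost = 1;

    while (*link != NULL) {
        parent = *link;
        if (n->key < parent->key) {
            link = &parent->left;
        } else {
            link = &parent->right;
            is_leftmost = 0;
        }
    }
    n->left = n->right = NULL;
    n->parent = parent;
    *link = n;
    if (is_leftmost)
        tl->leftmost = n;
}

/* Remove and return the smallest-key node, refreshing the cache. */
static struct node *timeline_pop_leftmost(struct timeline *tl)
{
    struct node *n = tl->leftmost;
    struct node *next;

    if (n == NULL)
        return NULL;
    assert(n->left == NULL); /* the leftmost node has no left child */

    /* Successor: minimum of the right subtree, otherwise the parent. */
    next = n->right;
    if (next != NULL) {
        while (next->left != NULL)
            next = next->left;
    } else {
        next = n->parent;
    }

    /* Splice n out; it is always the root or a left child. */
    if (n->parent != NULL)
        n->parent->left = n->right;
    else
        tl->root = n->right;
    if (n->right != NULL)
        n->right->parent = n->parent;

    tl->leftmost = next;
    return n;
}
```

[ A real rbtree would additionally rebalance on insert and erase; the 
  cache maintenance shown here is independent of that. ]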

> Another advantage of heaps is that they support decreasing priorities 
> directly, so that instead of removal and reinsertion, a less invasive 
> movement within the tree is possible. This nets additional constant 
> factor improvements beyond those for the next item to dequeue for the 
> case where a task remains runnable, but is preempted and its priority 
> decreased while it remains runnable.

yeah. (Note that in CFS i'm not decreasing priorities anywhere though - 
all the priority levels in CFS stay constant, fairness is not achieved 
via rotating priorities or similar, it is achieved via the accounting 
code.)

> On Fri, Apr 13, 2007 at 10:21:00PM +0200, Ingo Molnar wrote:
> >    due to its design, the CFS scheduler is not prone to any of the
> >    'attacks' that exist today against the heuristics of the stock
> >    scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all
> >    work fine and do not impact interactivity and produce the expected
> >    behavior.
> 
> I'm always suspicious of these claims.  [...]

hey, sure - but please give it a go nevertheless, i _did_ test all these 
;)

> A moderately formal regression test suite needs to be assembled [...]

by all means feel free! ;)

> A more general question here is what you mean by "completely fair;"

by that i mean the most common-sense definition: with N tasks running, 
each gets 1/N of the CPU time if observed for a reasonable amount of 
time. Now extend this to arbitrary scheduling patterns: the end result 
should still be completely fair, according to the fundamental 1/N(time) 
rule applied individually to each of the small intervals that those 
scheduling patterns break down into. (this assumes that the scheduling 
patterns are reasonably independent of each other - if they are not, 
then there's no reasonable definition of fairness that makes sense, and 
we might as well use the 1/N rule for those cases too.)
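
[ A toy illustration of the 1/N rule, not CFS's actual accounting code: 
  it assumes equal-length timeslices and always-runnable tasks, and 
  simulate_fair() is a hypothetical name. Each slice goes to the task 
  that has accumulated the least runtime so far, so fairness comes out 
  of the bookkeeping alone, with no priorities being rotated. ]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hand out `ticks` equal timeslices among n always-runnable tasks.
 * runtime[i] holds the slices task i has received so far; each slice
 * goes to the task with the smallest total (lowest index wins ties).
 */
static void simulate_fair(unsigned long *runtime, size_t n,
                          unsigned long ticks)
{
    assert(n > 0);
    for (unsigned long t = 0; t < ticks; t++) {
        size_t pick = 0;

        for (size_t i = 1; i < n; i++)
            if (runtime[i] < runtime[pick])
                pick = i;
        runtime[pick]++;
    }
}
```

[ Over any observation window the per-task totals differ by at most one 
  slice, which is the 1/N behaviour described above in its simplest 
  form. ]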

> there doesn't appear to be inter-tgrp, inter-pgrp, inter-session, or 
> inter-user fairness going on, though one might argue those are 
> relatively obscure notions of fairness. [...]

sure, i mainly concentrated on what we have in Linux today. The things 
you mention are add-ons that i can see handling via new scheduling 
classes: all the CKRM and containers type of CPU time management 
facilities.

> What these things mean when there are multiple CPU's to schedule 
> across may also be of concern.

that is handled by the existing smp-nice load balancer, that logic is 
preserved under CFS.

> These testcases are oblivious to SMP. This will demand that a 
> scheduling policy integrate with load balancing to the extent that 
> load balancing occurs for the sake of distributing CPU bandwidth 
> according to nice level. Some explicit decision should be made 
> regarding that.

this should already work reasonably fine with CFS: try massive_intr.c on 
an SMP box.

	Ingo