Message-ID: <Pine.LNX.4.64.0709041244080.1817@scrub.home>
Date: Tue, 4 Sep 2007 13:21:40 +0200 (CEST)
From: Roman Zippel <zippel@...ux-m68k.org>
To: Ingo Molnar <mingo@...e.hu>
cc: linux-kernel@...r.kernel.org, peterz@...radead.org,
Mike Galbraith <efault@....de>
Subject: Re: [ANNOUNCE/RFC] Really Simple Really Fair Scheduler
Hi,
On Tue, 4 Sep 2007, Ingo Molnar wrote:
> and what about the mirror image problem?
Sorry, I'm not familiar with that in a scheduler context.
> Your math assumes that tasks
> use up their full timeslices, somewhere starting at (12):
>
> | (12) time_norm_app = sum_{t}^{T}(time_norm_{t} * weight_{t}) /
> | sum_{t}^{T}(weight_{t})
> |
> | This produces only an approximate normalized time, but if all
> | time_norm_{t} are equal (i.e. all tasks got their share), the result
> | is the same, thus the error is only temporary.
>
> From here on the reasoning and the math is flawed i believe: it assumes
> that tasks use up their timeslices - which is rarely the case.
I'm not sure what you're referring to. The remaining math basically only
deals with the average calculation, and if we're still talking about your
simplified model, it's largely irrelevant, because the approximation error
only comes into effect with different weights. This means that for the
common case you can assume my code produces a perfect average.
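
To make that concrete, here is a toy userspace version of (12). It is
purely illustrative, the names are made up and it is not code from my
patch: with equal time_norm values the weights cancel out and the average
is exact, a difference only shows up once the per-task values diverge.

    #include <stdio.h>

    struct toy_task {
            unsigned long long time_norm;   /* normalized runtime of the task */
            unsigned long weight;           /* load weight of the task */
    };

    /* weighted average as in (12):
     * sum(time_norm_t * weight_t) / sum(weight_t)
     */
    static unsigned long long toy_time_norm_app(const struct toy_task *t, int n)
    {
            unsigned long long sum_tw = 0, sum_w = 0;
            int i;

            for (i = 0; i < n; i++) {
                    sum_tw += t[i].time_norm * t[i].weight;
                    sum_w += t[i].weight;
            }
            return sum_w ? sum_tw / sum_w : 0;
    }

    int main(void)
    {
            /* equal time_norm, different weights: the result is exactly 1000 */
            struct toy_task equal[] = { { 1000, 1024 }, { 1000, 2048 } };
            /* diverging time_norm: the average is only an approximation */
            struct toy_task skewed[] = { { 900, 1024 }, { 1100, 2048 } };

            printf("equal:  %llu\n", toy_time_norm_app(equal, 2));
            printf("skewed: %llu\n", toy_time_norm_app(skewed, 2));
            return 0;
    }

The first case prints exactly 1000 regardless of the weights.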
> In practical terms: your code causes unfairness to tasks that sleep when
> they have used up only a fraction of their slice, when they are in a
> fairness-deficit. For example, consider a task that just waited a lot to
> get on the CPU, and when it finally got there (gathering a fairness
> deficit of say 9 milliseconds), a hundred microseconds later it sleeps.
>
> The problem is that when it wakes up and gets back on the runqueue your
> code "forgets" that the task is in "need of fairness" by 9 milliseconds:
> the task continues as if its previous starvation didn't happen at all.
Please put this into more context, e.g. demonstrate how this problem
doesn't exist in CFS. My code should largely mirror CFS behaviour, with the
exception that the bonus may not be as big and this:
	if (sysctl_sched_features & SCHED_FEAT_SLEEPER_AVG)
		delta_fair = div64_likely32((u64)delta_fair * load,
					    load + se->load.weight);
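
For reference, a toy rendering of what that snippet computes, as I read it
(just the arithmetic, with made-up values, not kernel code): the bonus
delta_fair is scaled by load / (load + se->load.weight), so it shrinks
most when the runqueue load is small compared to the waking task's weight.

    #include <stdio.h>

    /* the factor load / (load + se_weight) is < 1, so the bonus is always
     * reduced; the reduction is strongest when the runqueue load is small
     * compared to the weight of the waking task
     */
    static unsigned long long scale_sleeper_bonus(unsigned long long delta_fair,
                                                  unsigned long load,
                                                  unsigned long se_weight)
    {
            return delta_fair * load / (load + se_weight);
    }

    int main(void)
    {
            /* one other nice-0 task on the runqueue vs. ten of them */
            printf("light rq: %llu\n", scale_sleeper_bonus(9000000, 1024, 1024));
            printf("busy rq:  %llu\n", scale_sleeper_bonus(9000000, 10240, 1024));
            return 0;
    }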
> This effect can cause _far more noise_ and even systematic starvation
> than any numeric rounding effects could cause. (This could perhaps
> explain the unfairness Mike reported, and this could perhaps explain the
> noisy hackbench results i'm seeing with your patch - although i'm not
> certain about this, i haven't looked into those use cases in detail.)
The problem Mike reported looks more like a bug; I'm puzzled why the
verify code didn't trigger, as the whole thing was pretty much out of sync,
so in the next version I'll add a few more checks. I'm not finished with
the hackbench results yet, though.
> CFS's current math addresses this problem via the use of the fair-clock
> metric and via ->wait_runtime: that gives us a good idea what happened
> to the system while the task slept. With your model, that information is
> not maintained at all, and is not regainable. At the moment i don't see
> how we could achieve good sleeper behavior with your model, you've
> over-simplified the scheduler metrics and we've lost essential
> information.
Please describe this sleeper behaviour: what kind of sleeper do you have
in mind exactly? If we take interactive tasks as an example, these should
be constantly behind other running tasks and thus have an advantage at
getting the CPU, i.e. if a task doesn't use up its time slice, its runtime
value won't advance as much as that of other tasks which do use their
slice.
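
A toy illustration of what I mean (made-up names, not code from either
scheduler): the runtime key only advances while a task actually runs, so a
task that sleeps through most of its slice keeps a smaller key than a CPU
hog and wins the next pick.

    #include <stdio.h>

    struct toy_se {
            const char *name;
            unsigned long long runtime;     /* normalized runtime key */
            unsigned long weight;
    };

    /* advance the key inversely proportional to the weight */
    static void toy_account(struct toy_se *se, unsigned long long delta_exec)
    {
            se->runtime += delta_exec * 1024 / se->weight;
    }

    /* the task with the smaller key runs next */
    static struct toy_se *toy_pick(struct toy_se *a, struct toy_se *b)
    {
            return a->runtime <= b->runtime ? a : b;
    }

    int main(void)
    {
            struct toy_se hog = { "hog", 0, 1024 }, waker = { "waker", 0, 1024 };

            toy_account(&hog, 10000000);    /* used its full 10ms slice */
            toy_account(&waker, 100000);    /* ran 100us, then slept */

            printf("next: %s\n", toy_pick(&hog, &waker)->name);
            return 0;
    }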
bye, Roman