linux-kernel - Re: [ANNOUNCE/RFC] Really Fair Scheduler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1189047837.6521.37.camel@excalibur>
Date:	Thu, 06 Sep 2007 05:03:57 +0200
From:	Syren Baran <sbaran@....de>
To:	Roman Zippel <zippel@...ux-m68k.org>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler

Am Montag, den 03.09.2007, 04:58 +0200 schrieb Roman Zippel:
> Hi,
> 
> On Sun, 2 Sep 2007, Ingo Molnar wrote:
> 
> > > Did you even try to understand what I wrote? I didn't say that it's a 
> > > "common problem", it's a conceptual problem. The rounding has been 
> > > improved lately, so it's not as easy to trigger with some simple busy 
> > > loops.
> > 
> > As i mentioned it in my first reply to you i really welcome your changes 
> > and your interest in the Linux scheduler, but i'm still somewhat 
> > surprised why you focus on rounding so much (and why you attack CFS's 
> > math implementation so vehemently and IMO so unfairly) - and we had this 
> > discussion before in the "CFS review" thread that you started.
> 
> The rounding is insofar a problem as it makes it very difficult to produce 
> a correct mathematical model of CFS, which can be used to verify the 
> working of code.
> With the recent rounding changes they have indeed little effect in the 
> short term, especially as the error is distributed equally to the task, so 
> every task gets relatively the same time, the error still exists 
> nonetheless and adds up over time (although with the rounding changes it 
> doesn't grow into one direction anymore).
So, its like i roll a dice repeatedly and subtract 3.5 every time? Even
in the long run the the result should be close to zero.
>  The problem is now the limiting, 
> which you can't remove as long as the error exists. Part of this problem
> is that CFS doesn't really maintain a balanced sum of the available 
> runtime, i.e. the sum of all updated wait_runtime values of all active 
> tasks should be zero. This means under complex loads it's possible to hit 
> the wait_runtime limits and the overflow/underflow time is lost in the 
> calculation and this value can be easily in the millisecond range.
> To be honest I have no idea how real this problem in the praxis is right 
> now, quite possibly people will only see it as a small glitch and in most 
> cases they won't even notice.
So, given the above example, if the result of the dice rolls ever
exceeds +/- 1,000,000 i´ll get some ugly timing glitches? As the number
of dice rolls grows towards infinity the chance of remaining within this
boundary goes steadily towards 0%.
What does this equate to in the real world? Weeks, month, years,
milennia?

>  Previously it had been rather easy to 
> trigger these problems due to the rounding problems. The main problem for 
> me here is now that with any substantial change in CFS I'm running risk of 
> making this worse and noticable again somehow. For example CFS relies on 
> the 64bit math to have enough resolution so that the errors aren't 
> noticable.
> That's why I needed a mathematical model I can work with and with it for 
> example I can easily downscale the math to 32bit and I know exactly the 
> limits within it will work correctly.

If i understand the problem correctly these errors occur due to the fact
that delta-values are used instead of recalculating the "fair" process
time every "dice" roll?
Somehow the dead simple solution would seem to "reset" or "reinitialise"
the "dice" roll every million rolls or so.
Or, to put this into context again, figure out how large the accumulated
error would have to be to be noticable. Then figure out how long it
would take for such an error occur in the real world and just
reinitialise the damn thing. Should´nt be too complicated stochastics.

Greetings,
Syren

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/