linux-kernel - Re: fair clock use in CFS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4648D381.6040808@cs.umass.edu>
Date:	Mon, 14 May 2007 17:24:17 -0400
From:	Ting Yang <tingy@...umass.edu>
To:	Ingo Molnar <mingo@...e.hu>
CC:	William Lee Irwin III <wli@...omorphy.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>, efault@....de,
	linux-kernel@...r.kernel.org
Subject: Re: fair clock use in CFS

Ingo Molnar wrote:
> * William Lee Irwin III <wli@...omorphy.com> wrote:
>
>   
>> On Mon, May 14, 2007 at 12:31:20PM +0200, Ingo Molnar wrote:
>>     
>>> please clarify - exactly what is a mistake? Thanks,
>>>       
>> The variability in ->fair_clock advancement rate was the mistake, at 
>> least according to my way of thinking. [...]
>>     
>
> you are quite wrong. Lets consider the following example:
>
> we have 10 tasks running (all at nice 0). The current task spends 20 
> msecs on the CPU and a new task is picked. How much CPU time did that 
> waiting task get entitled to during its 20 msecs wait? If fair_clock was 
> constant as you suggest then we'd give it 20 msecs - but its true 'fair 
> expectation' of CPU time was only 20/10 == 2 msecs!
>   
Exactly, once the queue virtual time is used, all other measures should 
be scaled onto virtual time for consistency, since at different time a 
unit of virtual time maps different amount wall clock time.

In CFS, the p->wait_runtime is in fact the lag against the ideal system, 
that is the difference between the amount of "really" done so far and 
the amount of work "should" be done so far. CFS really keeps a record 
for each task indicates how far away it is from the "should" case. If a 
task has p->wait_runtime = 0, it must have just received the exact share 
it entitled till now. Similarly negative means  faster than it "should" 
and positive  means slower than it "should".

I guess CFS may provides better "fairness" if it controls the negative 
wait_runtime can be accumulated by  a task. Higher priority allows more 
negative to be accumulated and low priority allows less. CFS has already 
done so by scaling granularity of preemption based on weight, the only 
issue is that the amount of negative wait_runtime can be accumulated is 
proportional to weight, which potentially can be O(n).

It is possible to do something like this in check_preemption ?

       delta = curr->fair_key - first->fair_key;

       if (delta > ??? [scale it as you wish] ||
            (curr->key > first->key) && (curr->wait_runtime > ??? 
[simple funtion of curr->weight]) )
              preempt

A limit control on wait_runtime may prevent a high weight task from 
running for too long, so that others get executed a little earlier. Just 
a thought :-)

Ting.

Ting

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/