linux-kernel - Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched

Message-ID: <20170328125022.75wlcrsnhobysxbj@hirez.programming.kicks-ass.net>
Date:   Tue, 28 Mar 2017 14:50:22 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Yuyang Du <yuyang.du@...el.com>
Cc:     mingo@...nel.org, linux-kernel@...r.kernel.org, pjt@...gle.com,
        bsegall@...gle.com, morten.rasmussen@....com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        matt@...eblueprint.co.uk, umgwanakikbuti@...il.com
Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()

This Changelog being so impenetrable is what makes me skip over it;
I'll put it on the 'look-at-later' pile, and that just never happens :/

On Mon, Feb 13, 2017 at 05:44:23AM +0800, Yuyang Du wrote:
> __update_load_avg() has the following steps:
> 
>   1. add the remainder of the last incomplete period
>   2. decay old sum
>   3. accumulate new sum in full periods since last_update_time
>   4. accumulate the current incomplete period
>   5. update averages
> 
> However, there is no need to separately compute steps 1, 3, and 4.
> 
> Illustration:
> 
>              c1          c3           c4
>              ^           ^            ^
>              |           |            |
>            |<->|<----------------->|<--->|
>    ... |---x---|------| ... |------|-----x (now)
> 
> c1, c3, and c4 are the accumulated (meanwhile decayed) contributions
> in timing in steps 1, 3, and 4 respectively.
> 
> With them, the accumulated contribution to load_sum, for example, is:
> 
> contrib = c1 * weight * freq_scaled;
> contrib += c3 * weight * freq_scaled;
> contrib += c4 * weight * freq_scaled;
> 
> Obviously, we can optimize the above and they equate to:
> 
> contrib = c1 + c3 + c4;
> contrib *= weight * freq_scaled;

But that's not at all what's happening;

The equation is something like:

 load = (load' + c1) * (1/2)^((p+1)/32) + 1024 * \Sum_{n=1}^{p+1} (1/2)^(n/32) + c4

                                          <---------------------------------->
                                                           c3


And then it's 'obvious' you cannot do anything like c1+c3+c4.


The decay factor of each part (c1, c3, c4) is different, so unless you
explain how that works, instead of hand-waving a bit, this isn't making
any sense.