Message-ID: <20170329000442.GC2459@ydu19desktop>
Date: Wed, 29 Mar 2017 08:04:42 +0800
From: Yuyang Du <yuyang.du@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...nel.org, linux-kernel@...r.kernel.org, pjt@...gle.com,
bsegall@...gle.com, morten.rasmussen@....com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
matt@...eblueprint.co.uk, umgwanakikbuti@...il.com
Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()
Hi Peter,
On Tue, Mar 28, 2017 at 04:46:25PM +0200, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 05:44:23AM +0800, Yuyang Du wrote:
> > __update_load_avg() has the following steps:
> >
> > 1. add the remainder of the last incomplete period
> > 2. decay old sum
> > 3. accumulate new sum in full periods since last_update_time
> > 4. accumulate the current incomplete period
> > 5. update averages
> >
> > However, there is no need to separately compute steps 1, 3, and 4.
> >
> > Illustration:
> >
> >           c1          c3           c4
> >           ^           ^            ^
> >           |           |            |
> >         |<->|<----------------->|<--->|
> > ... |---x---|------| ... |------|-----x (now)
> >
> > c1, c3, and c4 are the accumulated (and meanwhile decayed) contributions
> > of the time in steps 1, 3, and 4, respectively.
> >
> > With them, the accumulated contribution to load_sum, for example, is:
> >
> > contrib = c1 * weight * freq_scaled;
> > contrib += c3 * weight * freq_scaled;
> > contrib += c4 * weight * freq_scaled;
> >
> > Obviously, we can optimize the above, as it equates to:
> >
> > contrib = c1 + c3 + c4;
> > contrib *= weight * freq_scaled;
> >
>
> So I figured out what it is you're doing, how's this? I still need to
> rewrite the Changelog to make this clearer,
Yes, you need to, and let me also give it a try and learn from how you
rewrite it.
Subject: [PATCH] sched/fair: Optimize __update_sched_avg()
In __update_load_avg(), the flow of periods and update times is
illustrated as:
          t1          t3           t4
          ^           ^            ^
          |           |            |
        |<->|<----------------->|<--->|
... |---x---|------| ... |------|-----x (now)
in which t1 is the remainder of the last incomplete period, t3 is
the full periods since last_update_time, and t4 is the current
incomplete period.
Together, these times contribute to the load_sum in the following
5 steps (sketched below):
step 1: t1 is decayed into c1
step 2: the last sum is decayed
step 3: t3 accumulates into c3
step 4: t4 contributes c4
step 5: the sum is averaged
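
To make the t1/t3/t4 -> c1/c3/c4 mapping concrete, here is a minimal
floating-point toy, not the kernel code: it only assumes the PELT facts
that a period is 1024us and the per-period decay factor y satisfies
y^32 = 0.5; the sample values for sum, t1, t4 and the period count are
made up.

#include <math.h>
#include <stdio.h>

#define PERIOD		1024		/* us per PELT period */
#define LOAD_AVG_MAX	47742		/* ~1024 / (1 - y), kernel constant */
static const double y = 0.97857206;	/* y^32 ~= 0.5 */

int main(void)
{
	double sum = 20000.0;		/* sum as of last_update_time */
	unsigned int t1 = 300;		/* remainder of the last incomplete period */
	unsigned int nr_full = 4;	/* full periods since then (t3) */
	unsigned int t4 = 200;		/* time elapsed in the current period */
	double c1, c3 = 0.0, c4;
	unsigned int n;

	c1 = t1 * pow(y, nr_full + 1);		/* step 1: t1 is nr_full + 1 periods old */
	sum = sum * pow(y, nr_full + 1);	/* step 2: decay the old sum by the same amount */
	for (n = 1; n <= nr_full; n++)		/* step 3: each full period contributes PERIOD, */
		c3 += PERIOD * pow(y, n);	/*         decayed by its age */
	c4 = t4;				/* step 4: the current period is not decayed */

	sum += c1 + c3 + c4;			/* (scaling by weight/freq omitted here) */
	printf("avg = %f\n", sum / LOAD_AVG_MAX);	/* step 5 */
	return 0;
}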
These contributions are further scaled with weight and frequency:
contrib = c1 * weight * freq_scaled;
contrib += c3 * weight * freq_scaled;
contrib += c4 * weight * freq_scaled;
Obviously, they equate to:
contrib = c1 + c3 + c4;
contrib *= weight * freq_scaled;
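
Since everything here is integer arithmetic, the two forms are exactly
equal by distributivity; a trivial standalone check with made-up sample
numbers (none of these values come from the kernel):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint64_t c1 = 293, c3 = 4662, c4 = 200;		/* arbitrary sample contributions */
	uint64_t weight = 1024, freq_scaled = 800;
	uint64_t before, after;

	before  = c1 * weight * freq_scaled;		/* six multiplications */
	before += c3 * weight * freq_scaled;
	before += c4 * weight * freq_scaled;

	after  = c1 + c3 + c4;				/* two multiplications */
	after *= weight * freq_scaled;

	assert(before == after);
	return 0;
}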
Factoring this way saves instructions and cleans up the code. The cost
is that c1 must now be decayed on its own rather than folded into the
decay in step 2, but the two approaches cost about the same if you
compare one decay_load() call with one round of contrib scaling.
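
A rough sketch of that trade-off (again not the actual kernel code:
decay_load() below is a floating-point stand-in for the kernel's
table-based helper, and the accumulate_segments() name and bookkeeping
are made up): the factored form spends one extra decay on c1, and the
full periods collapse into a closed form built from the geometric-series
limit.

#include <math.h>
#include <stdio.h>

#define Y		0.97857206		/* per-period decay factor, Y^32 ~= 0.5 */
#define LOAD_AVG_MAX	(1024.0 / (1.0 - Y))	/* 1024 * (1 + Y + Y^2 + ...) */

/* toy floating-point stand-in for the kernel's decay_load() */
static double decay_load(double val, unsigned int n)
{
	return val * pow(Y, n);
}

/*
 * d1      = remainder of the last incomplete period (t1)
 * periods = period boundaries crossed, i.e. full periods + 1
 * d3      = time in the current period (t4)
 */
static double accumulate_segments(double d1, unsigned int periods, double d3)
{
	double c1, c3;

	/* the extra decay of c1, instead of folding it into step 2 */
	c1 = decay_load(d1, periods);

	/* 1024 * (Y^1 + ... + Y^(periods - 1)) in closed form */
	c3 = LOAD_AVG_MAX - decay_load(LOAD_AVG_MAX, periods) - 1024;

	/* the caller scales the total once by weight * freq_scaled */
	return c1 + c3 + d3;
}

int main(void)
{
	/* e.g. 300us left in the old period, 4 full periods, 200us now */
	printf("contrib = %.1f\n", accumulate_segments(300, 5, 200));
	return 0;
}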
Code size comparison (with allyesconfig):
Before: kernel/sched/built-in.o 1213304
After : kernel/sched/built-in.o 1212536 (-768B)
> but I think the code now has
> understandable comments.
Yes, it does. It's much better now.
Thanks,
Yuyang