Message-ID: <20170329000442.GC2459@ydu19desktop>
Date:   Wed, 29 Mar 2017 08:04:42 +0800
From:   Yuyang Du <yuyang.du@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...nel.org, linux-kernel@...r.kernel.org, pjt@...gle.com,
        bsegall@...gle.com, morten.rasmussen@....com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        matt@...eblueprint.co.uk, umgwanakikbuti@...il.com
Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()

Hi Peter,

On Tue, Mar 28, 2017 at 04:46:25PM +0200, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 05:44:23AM +0800, Yuyang Du wrote:
> > __update_load_avg() has the following steps:
> > 
> >   1. add the remainder of the last incomplete period
> >   2. decay old sum
> >   3. accumulate new sum in full periods since last_update_time
> >   4. accumulate the current incomplete period
> >   5. update averages
> > 
> > However, there is no need to separately compute steps 1, 3, and 4.
> > 
> > Illustration:
> > 
> >              c1          c3           c4
> >              ^           ^            ^
> >              |           |            |
> >            |<->|<----------------->|<--->|
> >    ... |---x---|------| ... |------|-----x (now)
> > 
> > c1, c3, and c4 are the accumulated (and meanwhile decayed) time
> > contributions from steps 1, 3, and 4 respectively.
> > 
> > With them, the accumulated contribution to load_sum, for example, is:
> > 
> > contrib = c1 * weight * freq_scaled;
> > contrib += c3 * weight * freq_scaled;
> > contrib += c4 * weight * freq_scaled;
> > 
> > Obviously, we can optimize the above, which equates to:
> > 
> > contrib = c1 + c3 + c4;
> > contrib *= weight * freq_scaled;
> > 
> 
> So I figured out what it is you're doing, how's this? I still need to
> rewrite the Changelog to make this clearer,

Yes, you do need to. Let me take a stab at it too, and learn from how
you rewrite it.

Subject: [PATCH] sched/fair: Optimize __update_sched_avg()

In __update_load_avg(), the flow of periods and update times is
illustrated as:

             t1          t3           t4
             ^           ^            ^
             |           |            |
           |<->|<----------------->|<--->|
   ... |---x---|------| ... |------|-----x (now)

in which t1 is the remainder of the last incomplete period, t3
covers the full periods since last_update_time, and t4 is the
current incomplete period.

Together, these times contribute to load_sum in the following 5
steps:

  step 1: decay t1 into its contribution c1
  step 2: decay the old sum
  step 3: accumulate the full periods in t3 as c3
  step 4: take the current period t4 as c4
  step 5: update the averages from the sum

These contributions are then scaled by weight and frequency:

contrib = c1 * weight * freq_scaled;
contrib += c3 * weight * freq_scaled;
contrib += c4 * weight * freq_scaled;

Obviously, these equate to:

contrib = c1 + c3 + c4;
contrib *= weight * freq_scaled;
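
Put together in code, this is roughly (a sketch with a made-up
function name, not the exact patch; it assumes a decay_load(v, n)
helper returning v * y^n, where y^32 == 0.5, and
LOAD_AVG_MAX == 1024 * (1 + y + y^2 + ...)):

static u32 accumulate_segments(u64 periods, u32 d1, u32 d3)
{
        u32 c1, c3, c4;

        /* step 1: old partial period d1 (= t1), decayed over all periods */
        c1 = decay_load((u64)d1, periods);

        /*
         * step 3: geometric series over the intermediate full periods,
         *   c3 = 1024 * (y + y^2 + ... + y^(periods - 1))
         * expressed via the infinite-series maximum:
         */
        c3 = LOAD_AVG_MAX - decay_load(LOAD_AVG_MAX, periods) - 1024;

        /* step 4: current partial period d3 (= t4), not yet decayed */
        c4 = d3;

        /* sum first, then scale once by weight * freq_scaled */
        return c1 + c3 + c4;
}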

By doing so, we save instructions and clean up the code. The
trade-off is that c1 must now be decayed on its own rather than as
part of step 2, but the two approaches cost about the same if you
compare one decay_load() with a round of contrib scaling.
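
For reference, a decay_load()-style helper costs roughly one shift
plus one multiply. A simplified sketch of the kernel's approach,
using its runnable_avg_yN_inv[] table of y^n << 32 for n < 32:

static u64 decay_load(u64 val, u64 n)
{
        if (n > 32 * 63)                /* sum has fully decayed */
                return 0;

        if (n >= 32) {                  /* y^32 == 0.5 ... */
                val >>= n / 32;         /* ... so shift once per half-life */
                n %= 32;
        }

        /* fractional part: val * y^n, via the precomputed inverse table */
        return mul_u64_u32_shr(val, runnable_avg_yN_inv[n], 32);
}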

Code size comparison (with allyesconfig):

Before: kernel/sched/built-in.o 1213304
After : kernel/sched/built-in.o 1212536 (-768B)

> but I think the code now has
> understandable comments.

Yes, you did. It's much better now.

Thanks,
Yuyang
