Date:	Tue, 19 Apr 2016 01:59:16 +0800
From:	Yuyang Du <yuyang.du@...el.com>
To:	Dietmar Eggemann <dietmar.eggemann@....com>
Cc:	Vincent Guittot <vincent.guittot@...aro.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Benjamin Segall <bsegall@...gle.com>,
	Paul Turner <pjt@...gle.com>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Juri Lelli <juri.lelli@....com>, yuyang.du@...el.com
Subject: Re: [PATCH 2/4] sched/fair: Drop out incomplete current period when
 sched averages accrue

Hi Dietmar,

On Fri, Apr 15, 2016 at 04:05:23AM +0800, Yuyang Du wrote:
> > It shows periods of 0 load/util (~1.55s) and then massive spikes (~700 for
> > ~300ms). The short runtime and the task period synced to 1024*1024ns
> > allow us to hit consecutive enqueues or dequeues for a long time, even
> > though the task might drift relative to the PELT window.
> 
> But whenever we pass 1ms, we will update. And I am curious: how does the
> current 1us scheme work in this case? Anyway, I will reproduce it myself.

I did some experiments to compare.

First, the starting 0 that Dietmar observed is because the rq's util_avg may
have already reached full utilization, in which case a new task's util_avg is
initialized to 0. But I never observed 0 for any long time (definitely not as
long as 1s). The following experiments did not include the fourth patch (the
flat util hierarchy implementation).
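
To illustrate the starting 0, here is a simplified, standalone sketch of the
initialization I am describing (not the actual kernel function; the helper
name and the exact /2 bound are only for the example):

        #include <stdio.h>

        #define SCHED_CAPACITY_SCALE    1024L   /* full capacity */

        /*
         * A new task's initial util_avg is bounded by the utilization
         * still left on the rq; when the rq's util_avg has already
         * reached full capacity, the bound is 0, so the task starts
         * at 0 until it accrues its own history.
         */
        static long init_task_util(long rq_util_avg)
        {
                long cap = (SCHED_CAPACITY_SCALE - rq_util_avg) / 2;

                return cap > 0 ? cap : 0;
        }

        int main(void)
        {
                printf("rq util  300 -> new task util %ld\n", init_task_util(300));
                printf("rq util 1024 -> new task util %ld\n", init_task_util(1024));
                return 0;
        }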

Second, the 1ms-granular update loses much precision and leads to big
variance. See the attached figures, for a task that runs 100us out of every
200us. I actually tried many other combinations, but the results are about
the same (disappointing). The experiments used a fixed workload (rt-app), a
fixed CPU, and a fixed frequency.
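
Roughly, where the variance comes from can be seen with a toy model like the
one below. This is only a standalone sketch, not the patch and not rt-app: it
compares exact per-microsecond accounting against attributing each whole
1024us period to whichever state the task happens to be in at the period
boundary, for a 100us-run/100us-sleep pattern. The exact signal settles near
half of what an always-running task would accumulate, while the whole-period
one keeps swinging with the phase of the boundaries; the real interaction
with enqueue/dequeue timing is of course messier.

        #include <stdio.h>
        #include <math.h>

        #define PERIOD_US       1024    /* PELT period */
        #define HALFLIFE        32      /* periods: y^32 = 0.5 */

        int main(void)
        {
                double y = pow(0.5, 1.0 / HALFLIFE);    /* per-period decay */
                double exact = 0.0, coarse = 0.0;
                long us, ran = 0;

                for (us = 1; us <= 200 * 1000; us++) {
                        int running = (us % 200) < 100; /* 100us on, 100us off */

                        ran += running;
                        if (us % PERIOD_US)
                                continue;

                        /* exact: credit the microseconds actually run */
                        exact = exact * y + ran;
                        ran = 0;
                        /* coarse: credit the whole period by the boundary state */
                        coarse = coarse * y + (running ? PERIOD_US : 0);

                        if (us % (16 * PERIOD_US) == 0)
                                printf("t=%6ldus  exact=%7.0f  coarse=%7.0f\n",
                                       us, exact, coarse);
                }
                return 0;
        }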

In addition, regarding the 2-level period scheme, the idea is to have a
period finer than 1ms, such as 128us. To avoid generating more constants, we
can use a first-level period of 2ms and a second-level period of 128us, with
the same 32ms half-life. It is definitely doable, but the implementation
complicates __accumulated_sum(), so I simply dropped it.
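
For reference, a back-of-the-envelope check of why those particular numbers
do not require new constants (only arithmetic, not an implementation of the
scheme; the real complication is in __accumulated_sum() itself):

        #include <stdio.h>
        #include <math.h>

        int main(void)
        {
                /* 32ms half-life over 1024us periods: y^32 = 0.5 */
                double y = pow(0.5, 1.0 / 32);

                printf("y   = %.6f (existing per-1024us decay)\n", y);
                printf("y^2 = %.6f (decay per 2ms first-level period,\n"
                       "              already in the y^n table)\n", y * y);
                printf("128us = 1024us / 8, i.e. 16 second-level slots per 2ms period\n");
                return 0;
        }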

So, let me get back to the 1us precision, and then move on to the flat
hierarchical util.

Thanks,
Yuyang

[Attachment: "Opt_100us_200us.jpg" (image/jpeg, 64052 bytes)]

[Attachment: "Master_100us_200us.jpg" (image/jpeg, 46927 bytes)]
