Message-ID: <20160418175915.GR8697@intel.com>
Date: Tue, 19 Apr 2016 01:59:16 +0800
From: Yuyang Du <yuyang.du@...el.com>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Benjamin Segall <bsegall@...gle.com>,
Paul Turner <pjt@...gle.com>,
Morten Rasmussen <morten.rasmussen@....com>,
Juri Lelli <juri.lelli@....com>, yuyang.du@...el.com
Subject: Re: [PATCH 2/4] sched/fair: Drop out incomplete current period when
sched averages accrue

Hi Dietmar,

On Fri, Apr 15, 2016 at 04:05:23AM +0800, Yuyang Du wrote:
> > It shows periods of 0 load/util (~1.55s) and then massive spikes (~700 for
> > ~300ms). The short runtime and the task period synced to 1024*1024ns
> > mean that we hit consecutive enqueues or dequeues for a long time, even
> > though the task might drift relative to the PELT window.
>
> But whenever we pass 1ms, we will update. And I am curious, how does the
> current 1us scheme work in this case? Anyway, I will reproduce it myself.

I did some experiments to compare.

First, the initial 0 that Dietmar observed is because the rq's util_avg may have
already reached full utilization, so the new task's util_avg is initialized to 0.
But I never observed 0 for any long stretch (definitely not as long as 1s). The
following experiments did not include the fourth patch (the flat util hierarchy
implementation).
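
For reference, here is a minimal toy of the initialization behavior I am
describing (the names and the exact split are illustrative only, not the
patch's code): when the rq's util_avg has already reached full capacity, the
leftover used to seed a new task is zero, hence the initial flat-zero segment.

#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024UL

/*
 * Toy seeding rule: give a new task half of whatever capacity the rq's
 * util_avg has not claimed yet.  A fully utilized rq leaves nothing, so
 * the task starts from util_avg == 0.
 */
static unsigned long init_task_util_avg(unsigned long rq_util_avg)
{
	long spare = ((long)SCHED_CAPACITY_SCALE - (long)rq_util_avg) / 2;

	return spare > 0 ? (unsigned long)spare : 0;
}

int main(void)
{
	printf("idle rq -> new task util_avg = %lu\n", init_task_util_avg(0));
	printf("full rq -> new task util_avg = %lu\n", init_task_util_avg(1024));
	return 0;
}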

Second, the 1ms granular update loses a lot of precision and leads to big
variance. See the attached figures, in which the task runs 100us out of every
200us. I tried many other run/period combinations as well, but the results are
about the same (disappointing). The experiments used a fixed workload (rt-app),
a fixed CPU, and a fixed frequency.
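
To make the precision difference concrete, here is a toy model (my own
simplification, not the patch's __update_load_avg()) of how much of a run
interval gets credited at an update under the two schemes. At 1us precision
the whole interval counts; at period granularity only the part lying in
already completed 1024us periods counts, so a run that starts and ends inside
the same period is not reflected at that update and the averages only catch
up once a later update crosses the boundary. That is what makes a
100us-out-of-200us task so noisy.

#include <stdio.h>
#include <stdint.h>

#define PELT_PERIOD_US	1024ULL

/* 1us precision: every microsecond of the run [start, end) is credited */
static uint64_t credit_1us(uint64_t start, uint64_t end)
{
	return end - start;
}

/*
 * Period granularity: only the part of [start, end) that lies before the
 * last completed period boundary is credited; the tail inside the still
 * incomplete period is not reflected yet.
 */
static uint64_t credit_period(uint64_t start, uint64_t end)
{
	uint64_t boundary = (end / PELT_PERIOD_US) * PELT_PERIOD_US;

	return boundary > start ? boundary - start : 0;
}

int main(void)
{
	/* a 100us run entirely inside one period: credited vs. invisible */
	printf("run [300, 400):    1us=%llu  period=%llu\n",
	       (unsigned long long)credit_1us(300, 400),
	       (unsigned long long)credit_period(300, 400));

	/* a 100us run straddling a boundary: only the first 24us show up */
	printf("run [1000, 1100):  1us=%llu  period=%llu\n",
	       (unsigned long long)credit_1us(1000, 1100),
	       (unsigned long long)credit_period(1000, 1100));
	return 0;
}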

In addition, regarding the 2-level period scheme: the idea is to have a period
finer than 1ms, such as 128us. To avoid generating more constants, we can use a
first-level period of 2ms and a second-level period of 128us, keeping the same
32ms half-life. It is definitely doable, but the implementation complicates
__accumulated_sum(), so I simply dropped it.
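
To illustrate what I mean (a rough sketch of one possible shape, not code I am
proposing): decay would run per 2048us first-level period, i.e. a per-period
factor of y^2 with the usual y^32 = 1/2, so the existing constants suffice,
while runtime inside the current first-level period would be accounted in
128us second-level ticks with no intra-period decay.

#include <stdio.h>
#include <stdint.h>

#define L1_PERIOD_US	2048ULL		/* first-level period (2ms) */
#define L2_TICK_US	128ULL		/* second-level resolution */

/*
 * Decay by y^(2 * l1_periods), y^32 = 1/2.  Only whole half-lives are
 * handled here; the n % 32 remainder would reuse the existing y^n inverse
 * table, which is the point of picking 2ms: no new constants are needed.
 */
static uint64_t decay_l1(uint64_t val, unsigned int l1_periods)
{
	unsigned int n = 2 * l1_periods;

	return val >> (n / 32);
}

/*
 * Runtime confined to the current (incomplete) first-level period is
 * accumulated in 128us ticks; any sub-tick remainder would carry over to
 * the next update.
 */
static uint64_t accumulate_l2(uint64_t running_us)
{
	return (running_us / L2_TICK_US) * L2_TICK_US;
}

int main(void)
{
	/* 16 first-level periods = 32ms: exactly one half-life */
	printf("decay over 32ms: 1024 -> %llu\n",
	       (unsigned long long)decay_l1(1024, 16));
	printf("300us in the current period counts as %llu us\n",
	       (unsigned long long)accumulate_l2(300));
	return 0;
}

Even in this toy form the complication is visible: __accumulated_sum() would
have to juggle two period sizes and two remainders instead of one.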

So let me get back to the 1us precision, and then move on to the flat
hierarchical util.

Thanks,
Yuyang
Download attachment "Opt_100us_200us.jpg" of type "image/jpeg" (64052 bytes)
Download attachment "Master_100us_200us.jpg" of type "image/jpeg" (46927 bytes)