Date: Wed, 29 Mar 2017 10:05:00 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	Yuyang Du <yuyang.du@...el.com>, Paul Turner <pjt@...gle.com>,
	Ben Segall <bsegall@...gle.com>,
	Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] sched/fair: update scale invariance of PELT

On 28 March 2017 at 17:35, Vincent Guittot <vincent.guittot@...aro.org> wrote:
> The current implementation of load tracking invariance scales the contribution
> with the current frequency and uarch performance (only for utilization) of the
> CPU. One main result of this formula is that the figures are capped by the
> current capacity of the CPU. Another one is that load_avg is not invariant,
> because it is not scaled with uarch.
>
> The util_avg of a periodic task that runs r time slots every p time slots
> varies in the range:
>
>     U * (1-y^r)/(1-y^p) * y^i < Utilization < U * (1-y^r)/(1-y^p)
>
> where U is the max util_avg value = SCHED_CAPACITY_SCALE.
>
> At a lower capacity, the range becomes:
>
>     U * C * (1-y^r')/(1-y^p) * y^i' < Utilization < U * C * (1-y^r')/(1-y^p)
>
> with C reflecting the compute capacity ratio between the current capacity and
> the max capacity.
>
> So C tries to compensate for changes in (1-y^r'), but it can't be accurate.
>
> Instead of scaling the contribution value of the PELT algorithm, we should
> scale the running time. The PELT signal aims to track the amount of
> computation of tasks and/or rqs, so it seems more correct to scale the
> running time to reflect the effective amount of computation done since the
> last update.
>
> In order to be fully invariant, we need to apply the same amount of running
> time and idle time whatever the current capacity. Because running at lower
> capacity implies that the task will run longer, we have to track the amount
> of "stolen" idle time and to apply it when the task becomes idle.
>
> But once we have reached the maximum utilization value (SCHED_CAPACITY_SCALE),
> it means that the task is seen as an always-running task whatever the
> capacity of the CPU (even at max compute capacity). In this case, we can
> discard the "stolen" idle time, which becomes meaningless. In order to cope
> with the rounding effects of the PELT algorithm, we take a margin and
> consider a task with utilization greater than 1000 (vs 1024 max) as an
> always-running task.
>
> Then, we can use the same algorithm for both utilization and load, and
> simplify __update_load_avg now that the load of a task doesn't have to be
> capped by the CPU uarch.
>
> The responsiveness of PELT is improved when the CPU is not running at max
> capacity with this new algorithm. I have put below some examples of the
> duration needed to reach some typical load values according to the capacity
> of the CPU, with the current implementation and with this patch.
>
>   Util (%)     max capacity   half capacity (mainline)   half capacity (w/ patch)
>   972 (95%)    138ms          not reachable              276ms
>   486 (47.5%)  30ms           138ms                      60ms
>   256 (25%)    13ms           32ms                       26ms
>
> On my hikey (octa-core ARM platform) with the schedutil governor, the time to
> reach the max OPP when starting from a null utilization decreases from 223ms
> with the current scale invariance down to 121ms with the new algorithm. For
> this test, I have enabled arch_scale_freq for arm64.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---

I just noticed that I forgot to mention that this patch is a v2; the
discussion about v1 can be found here: https://lkml.org/lkml/2015/11/24/432

Changes since v1:
* track stolen idle time to make PELT fully invariant

Vincent

>  include/linux/sched.h |  1 +
>  kernel/sched/fair.c   | 49 ++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 35 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d67eee8..ca9d00f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -313,6 +313,7 @@ struct load_weight {
>   */
>  struct sched_avg {
>  	u64 last_update_time;
> +	u64 stolen_idle_time;
>  	u64 load_sum;
>  	u32 util_sum;
>  	u32 period_contrib;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 31453d5..d1514cb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -735,6 +735,7 @@ void init_entity_runnable_average(struct sched_entity *se)
>  	struct sched_avg *sa = &se->avg;
>
>  	sa->last_update_time = 0;
> +	sa->stolen_idle_time = 0;
>  	/*
>  	 * sched_avg's period_contrib should be strictly less then 1024, so
>  	 * we give it 1023 to make sure it is almost a period (1024us), and
> @@ -2852,10 +2853,9 @@ static __always_inline int
>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
>  {
> -	u64 delta, scaled_delta, periods;
> +	u64 delta, periods;
>  	u32 contrib;
> -	unsigned int delta_w, scaled_delta_w, decayed = 0;
> -	unsigned long scale_freq, scale_cpu;
> +	unsigned int delta_w, decayed = 0;
>
>  	delta = now - sa->last_update_time;
>  	/*
> @@ -2876,8 +2876,30 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		return 0;
>  	sa->last_update_time = now;
>
> -	scale_freq = arch_scale_freq_capacity(NULL, cpu);
> -	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> +	if (running) {
> +		sa->stolen_idle_time += delta;
> +		/*
> +		 * scale the elapsed time to reflect the real amount of
> +		 * computation
> +		 */
> +		delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
> +		delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
> +
> +		/*
> +		 * Track the amount of stolen idle time due to running at
> +		 * lower capacity
> +		 */
> +		sa->stolen_idle_time -= delta;
> +	} else if (!weight) {
> +		if (sa->util_sum < (LOAD_AVG_MAX * 1000)) {
> +			/*
> +			 * Add the idle time stolen by running at lower compute
> +			 * capacity
> +			 */
> +			delta += sa->stolen_idle_time;
> +		}
> +		sa->stolen_idle_time = 0;
> +	}
>
>  	/* delta_w is the amount already accumulated against our next period */
>  	delta_w = sa->period_contrib;
> @@ -2893,16 +2915,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		 * period and accrue it.
>  		 */
>  		delta_w = 1024 - delta_w;
> -		scaled_delta_w = cap_scale(delta_w, scale_freq);
>  		if (weight) {
> -			sa->load_sum += weight * scaled_delta_w;
> +			sa->load_sum += weight * delta_w;
>  			if (cfs_rq) {
>  				cfs_rq->runnable_load_sum +=
> -						weight * scaled_delta_w;
> +						weight * delta_w;
>  			}
>  		}
>  		if (running)
> -			sa->util_sum += scaled_delta_w * scale_cpu;
> +			sa->util_sum += delta_w << SCHED_CAPACITY_SHIFT;
>
>  		delta -= delta_w;
>
> @@ -2919,25 +2940,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>
>  		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
>  		contrib = __compute_runnable_contrib(periods);
> -		contrib = cap_scale(contrib, scale_freq);
>  		if (weight) {
>  			sa->load_sum += weight * contrib;
>  			if (cfs_rq)
>  				cfs_rq->runnable_load_sum += weight * contrib;
>  		}
>  		if (running)
> -			sa->util_sum += contrib * scale_cpu;
> +			sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
>  	}
>
>  	/* Remainder of delta accrued against u_0` */
> -	scaled_delta = cap_scale(delta, scale_freq);
>  	if (weight) {
> -		sa->load_sum += weight * scaled_delta;
> +		sa->load_sum += weight * delta;
>  		if (cfs_rq)
> -			cfs_rq->runnable_load_sum += weight * scaled_delta;
> +			cfs_rq->runnable_load_sum += weight * delta;
>  	}
>  	if (running)
> -		sa->util_sum += scaled_delta * scale_cpu;
> +		sa->util_sum += delta << SCHED_CAPACITY_SHIFT;
>
>  	sa->period_contrib += delta;
>
> --
> 2.7.4
>
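To make the range formula in the changelog concrete: with the standard PELT decay y = 0.5^(1/32) per 1 ms period, a task that runs r = 16 ms out of every p = 32 ms settles into a band whose bounds the short program below evaluates. This is a worked example only, not kernel code; the 50% duty cycle is an arbitrary choice, and i = p - r is taken here as the idle tail over which the peak decays.

/*
 * Worked example of the periodic-task utilization band:
 *   U * (1-y^r)/(1-y^p) * y^i  <  Utilization  <  U * (1-y^r)/(1-y^p)
 * with U = SCHED_CAPACITY_SCALE and y^32 = 0.5.  Build with: cc ... -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	const double U = 1024.0;                /* SCHED_CAPACITY_SCALE */
	const double y = pow(0.5, 1.0 / 32.0);  /* PELT decay per 1 ms period */
	const double r = 16.0, p = 32.0;        /* run 16 ms out of every 32 ms */
	const double i = p - r;                 /* idle tail before the next run */

	double upper = U * (1.0 - pow(y, r)) / (1.0 - pow(y, p));
	double lower = upper * pow(y, i);

	/* Prints roughly "424 < Utilization < 600" for this 50% duty cycle. */
	printf("%.0f < Utilization < %.0f\n", lower, upper);
	return 0;
}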
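The ramp-up times in the table follow from the same series: an always-running task starting from a null utilization at max capacity sees util_avg(t) = SCHED_CAPACITY_SCALE * (1 - y^t) with t in ms, so the time to reach a target u is 32 * log2(1024 / (1024 - u)). The sketch below (again an illustration only, assuming the standard 32 ms half-life and that the patch simply doubles the effective ramp time at half capacity) reproduces the 138/30/13 ms and 276/60/26 ms columns.

/*
 * Time for an always-running task to reach a target util_avg from zero,
 * under PELT with the 32 ms half-life (y^32 = 0.5).  Build with: cc ... -lm
 */
#include <math.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024.0
#define PELT_HALF_LIFE_MS	32.0

static double time_to_reach(double util)
{
	return PELT_HALF_LIFE_MS *
		log2(SCHED_CAPACITY_SCALE / (SCHED_CAPACITY_SCALE - util));
}

int main(void)
{
	static const double targets[] = { 972.0, 486.0, 256.0 };

	for (unsigned int i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
		double t = time_to_reach(targets[i]);

		/* With the patch, half capacity simply doubles the ramp time. */
		printf("util %4.0f: %5.1f ms at max capacity, %5.1f ms at half capacity (w/ patch)\n",
		       targets[i], t, 2.0 * t);
	}
	return 0;
}

The mainline half-capacity column follows the same curve but saturated at U * C = 512, which is why reaching 486 there takes the same 138 ms that 972 takes at full capacity, and why 972 is not reachable at all.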
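The core mechanism, scaling the elapsed running time by capacity and remembering the shortfall as "stolen" idle time to be replayed once the task goes idle, can also be read in isolation. Below is a minimal userspace sketch of just that bookkeeping; cap_scale(), LOAD_AVG_MAX, SCHED_CAPACITY_SHIFT and the 1000/1024 saturation margin mirror identifiers used in the patch, but the accumulator around them is deliberately simplified (a single pre-folded capacity value, no per-period decay, and idle treated as !running rather than !weight).

/*
 * Userspace sketch of the patch's time-scaling idea; not kernel code.
 * While running, the elapsed time is shrunk by the capacity ratio and the
 * difference is remembered as "stolen" idle time.  When the task becomes
 * idle, the stolen time is replayed as extra idle time, unless the signal
 * is already (almost) saturated, i.e. an always-running task.
 */
#include <stdint.h>
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)
#define LOAD_AVG_MAX		47742UL	/* maximum possible PELT sum */

/* Scale a time delta by a capacity in [0 .. SCHED_CAPACITY_SCALE]. */
static inline uint64_t cap_scale(uint64_t delta, unsigned long cap)
{
	return (delta * cap) >> SCHED_CAPACITY_SHIFT;
}

struct toy_avg {
	uint64_t stolen_idle_time;
	uint64_t util_sum;	/* grows while running, decays while idle */
};

/*
 * Turn a wall-clock delta (us) into the delta the (omitted) decay/accrue
 * step would consume.  @running says whether the entity ran during the
 * delta; @cap is the current capacity with frequency and uarch scaling
 * already folded together.
 */
static uint64_t account_delta(struct toy_avg *sa, uint64_t delta,
			      int running, unsigned long cap)
{
	if (running) {
		sa->stolen_idle_time += delta;
		/* Scale elapsed time to the real amount of computation done. */
		delta = cap_scale(delta, cap);
		/* Whatever was scaled away is idle time "stolen" from the task. */
		sa->stolen_idle_time -= delta;
	} else {
		/*
		 * Replay the stolen idle time, unless utilization is already
		 * above the 1000/1024 margin used by the patch, in which case
		 * the task is treated as always running and the stolen time
		 * is discarded.
		 */
		if (sa->util_sum < LOAD_AVG_MAX * 1000)
			delta += sa->stolen_idle_time;
		sa->stolen_idle_time = 0;
	}
	return delta;
}

int main(void)
{
	struct toy_avg sa = { 0, 0 };

	/* 4 ms of running at half capacity, then 1 ms of idle. */
	uint64_t run  = account_delta(&sa, 4000, 1, SCHED_CAPACITY_SCALE / 2);
	uint64_t idle = account_delta(&sa, 1000, 0, SCHED_CAPACITY_SCALE / 2);

	/* Expect run == 2000 us and idle == 1000 + 2000 stolen us. */
	printf("effective run %llu us, effective idle %llu us\n",
	       (unsigned long long)run, (unsigned long long)idle);
	return 0;
}

In the real __update_load_avg() the adjusted delta then goes through the unchanged decay/accrue path, which is why the hunks above can drop scale_freq/scale_cpu and only shift the utilization contribution by SCHED_CAPACITY_SHIFT.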