linux-kernel - Re: [PATCH] sched/fair: update scale invariance of pelt

Message-ID: <CAKfTPtCe4L7BTgpONHwNbDRTZAoBRwNxtADE8TLxOh-WcL-MxA@mail.gmail.com>
Date:	Tue, 15 Dec 2015 11:21:08 +0100
From:	Vincent Guittot <vincent.guittot@...aro.org>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	Linaro Kernel Mailman List <linaro-kernel@...ts.linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Paul Turner <pjt@...gle.com>,
	Benjamin Segall <bsegall@...gle.com>
Subject: Re: [PATCH] sched/fair: update scale invariance of pelt

On 14 December 2015 at 01:26, Yuyang Du <yuyang.du@...el.com> wrote:
> Hi Vincent,
>
> I don't quite follow what this is doing; maybe I need more time
> to ramp up on gory details like these.
>
> Do you scale or not? You seem to have removed the scaling, but added it
> after "Remainder of delta accrued against u_0"..

I'm scaling the time before feeding it into the PELT algorithm. My reply
to Morten's comment explains in more detail what I'm trying to
achieve.

Thanks,
Vincent
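
To make "scaling the time" concrete, here is a minimal userspace sketch of
the step the patch adds at the top of __update_load_avg(). It is only an
illustration under stated assumptions: scale_running_delta() is a
hypothetical helper, and freq_cap and cpu_cap stand in for the values
returned by arch_scale_freq_capacity() and arch_scale_cpu_capacity(), both
expressed out of 1024. Only cap_scale() mirrors the arithmetic of the kernel
macro of the same name.

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10	/* capacities are expressed out of 1024 */
#define SCHED_CAPACITY_SCALE (1ULL << SCHED_CAPACITY_SHIFT)

/* Same arithmetic as the kernel's cap_scale() macro: v * s / 1024. */
static uint64_t cap_scale(uint64_t v, uint64_t s)
{
	return (v * s) >> SCHED_CAPACITY_SHIFT;
}

/*
 * Hypothetical helper (not a kernel function): scale the elapsed time
 * only while the task is running. Sleeping or waiting time is left
 * untouched, because it does not depend on the CPU's speed.
 */
static uint64_t scale_running_delta(uint64_t delta, int running,
				    uint64_t freq_cap, uint64_t cpu_cap)
{
	/* Sanity check for this sketch: a capacity never exceeds 1024. */
	assert(freq_cap <= SCHED_CAPACITY_SCALE);
	assert(cpu_cap <= SCHED_CAPACITY_SCALE);

	if (running) {
		delta = cap_scale(delta, freq_cap);
		delta = cap_scale(delta, cpu_cap);
	}
	return delta;
}
```

For example, 1000us of running time at half frequency on a full-capacity CPU
is accounted as 500us, while 1000us of non-running time stays 1000us.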

>
> Thanks,
> Yuyang
>
> On Tue, Nov 24, 2015 at 02:49:30PM +0100, Vincent Guittot wrote:
>> The current implementation of load tracking invariance scales the load
>> tracking value by the current frequency and the uarch performance (the
>> latter only for utilization) of the CPU.
>>
>> One main result of the current formula is that the figures are capped by
>> the current capacity of the CPU. This limitation is the main reason for not
>> including the uarch invariance (arch_scale_cpu_capacity) in the calculation
>> of load_avg, because capping the load can generate erroneous system load
>> statistics, as described in this example [1].
>>
>> Instead of scaling the complete value of the PELT algorithm, we should only
>> scale the running time by the current capacity of the CPU. It seems more
>> correct to only scale the running time because the non-running time of a
>> task (sleeping or waiting for a runqueue) is the same whatever the current
>> freq and the compute capacity of the CPU.
>>
>> One main advantage of this change is that the load of a task can reach
>> the max value whatever the current freq and the uarch of the CPU on which
>> it runs. It will just take more time at a lower freq than at max freq, or
>> on a "little" CPU compared to a "big" one. The load and the utilization
>> stay invariant across the system, so we can still compare them between
>> CPUs, but with a wider range of values.
>>
>> With this change, we don't have to test if a CPU is overloaded or not in
>> order to use one metric (util) or another (load) as all metrics are always
>> valid.
>>
>> Below are some examples of the time needed to reach some typical load
>> values according to the capacity of the CPU, with the current
>> implementation and with this patch.
>>
>> Util (%)     max capacity  half capacity (mainline)  half capacity (w/ patch)
>> 972 (95%)    138ms         not reachable             276ms
>> 486 (47.5%)  30ms          138ms                     60ms
>> 256 (25%)    13ms          32ms                      26ms
>> We can see that at half capacity, we need twice the duration of max
>> capacity with this patch, whereas the duration increases nonlinearly
>> with the current implementation.
>>
>> [1] https://lkml.org/lkml/2014/12/18/128
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
>> ---
>>  kernel/sched/fair.c | 28 +++++++++++++---------------
>>  1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 824aa9f..f2a18e1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2560,10 +2560,9 @@ static __always_inline int
>>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>                 unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>  {
>> -     u64 delta, scaled_delta, periods;
>> +     u64 delta, periods;
>>       u32 contrib;
>> -     unsigned int delta_w, scaled_delta_w, decayed = 0;
>> -     unsigned long scale_freq, scale_cpu;
>> +     unsigned int delta_w, decayed = 0;
>>
>>       delta = now - sa->last_update_time;
>>       /*
>> @@ -2584,8 +2583,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>               return 0;
>>       sa->last_update_time = now;
>>
>> -     scale_freq = arch_scale_freq_capacity(NULL, cpu);
>> -     scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
>> +     if (running) {
>> +             delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
>> +             delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
>> +     }
>>
>>       /* delta_w is the amount already accumulated against our next period */
>>       delta_w = sa->period_contrib;
>> @@ -2601,16 +2602,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>                * period and accrue it.
>>                */
>>               delta_w = 1024 - delta_w;
>> -             scaled_delta_w = cap_scale(delta_w, scale_freq);
>>               if (weight) {
>> -                     sa->load_sum += weight * scaled_delta_w;
>> +                     sa->load_sum += weight * delta_w;
>>                       if (cfs_rq) {
>>                               cfs_rq->runnable_load_sum +=
>> -                                             weight * scaled_delta_w;
>> +                                             weight * delta_w;
>>                       }
>>               }
>>               if (running)
>> -                     sa->util_sum += scaled_delta_w * scale_cpu;
>> +                     sa->util_sum += delta_w << SCHED_CAPACITY_SHIFT;
>>
>>               delta -= delta_w;
>>
>> @@ -2627,25 +2627,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>
>>               /* Efficiently calculate \sum (1..n_period) 1024*y^i */
>>               contrib = __compute_runnable_contrib(periods);
>> -             contrib = cap_scale(contrib, scale_freq);
>>               if (weight) {
>>                       sa->load_sum += weight * contrib;
>>                       if (cfs_rq)
>>                               cfs_rq->runnable_load_sum += weight * contrib;
>>               }
>>               if (running)
>> -                     sa->util_sum += contrib * scale_cpu;
>> +                     sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
>>       }
>>
>>       /* Remainder of delta accrued against u_0` */
>> -     scaled_delta = cap_scale(delta, scale_freq);
>>       if (weight) {
>> -             sa->load_sum += weight * scaled_delta;
>> +             sa->load_sum += weight * delta;
>>               if (cfs_rq)
>> -                     cfs_rq->runnable_load_sum += weight * scaled_delta;
>> +                     cfs_rq->runnable_load_sum += weight * delta;
>>       }
>>       if (running)
>> -             sa->util_sum += scaled_delta * scale_cpu;
>> +             sa->util_sum += delta << SCHED_CAPACITY_SHIFT;
>>
>>       sa->period_contrib += delta;
>>
>> --
>> 1.9.1
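
The durations in the table above can be approximated with a small userspace
model. This is a rough sketch, not kernel code: it assumes an always-running
task starting from zero utilization, 1ms PELT periods, the usual decay factor
y with y^32 = 0.5, and floating point instead of the kernel's integer
arithmetic. The function names are made up for illustration. Because it
counts whole periods until the target is met, it gives 14ms and 28ms for the
256 row where the table lists the rounded figures 13ms and 26ms.

```c
#include <assert.h>

#define PELT_MAX	1024.0		/* maximum utilization value */
#define PELT_Y		0.9785720620877	/* decay factor, y^32 = 0.5 */
#define MAX_PERIODS	100000		/* give up after this many periods */

/*
 * Number of 1ms periods of continuous running until util >= target, when
 * util converges towards `ceiling`. Returns -1 if the target is not
 * reached within MAX_PERIODS.
 */
static long periods_to_reach(double target, double ceiling)
{
	double util = 0.0;
	long n;

	for (n = 1; n <= MAX_PERIODS; n++) {
		util = util * PELT_Y + ceiling * (1.0 - PELT_Y);
		if (util + 1e-6 >= target)	/* tolerate rounding error */
			return n;
	}
	return -1;
}

/* Mainline model: each contribution is scaled, so util is capped at cap. */
static long ms_mainline(double target, long cap)
{
	assert(cap > 0 && cap <= 1024);
	return periods_to_reach(target, (double)cap);
}

/* Patch model: running time is scaled, so wall time stretches by 1024/cap. */
static long ms_patched(double target, long cap)
{
	long periods;

	assert(cap > 0 && cap <= 1024);
	periods = periods_to_reach(target, PELT_MAX);
	return periods < 0 ? -1 : periods * 1024 / cap;
}
```

With cap = 512, ms_mainline(972, 512) returns -1 (not reachable) and
ms_mainline(486, 512) returns 138, while ms_patched(972, 512) returns 276,
twice the 138 periods needed at full capacity.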