linux-kernel - Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtB_5qOx=6s9bbb+EDBt0mj_XEwgv5XdYR=S3rXPaerT-w@mail.gmail.com>
Date:	Tue, 8 Sep 2015 16:06:36 +0200
From:	Vincent Guittot <vincent.guittot@...aro.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steve Muckle <steve.muckle@...aro.org>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
	"yuyang.du@...el.com" <yuyang.du@...el.com>,
	"mturquette@...libre.com" <mturquette@...libre.com>,
	"rjw@...ysocki.net" <rjw@...ysocki.net>,
	Juri Lelli <Juri.Lelli@....com>,
	"sgurrappadi@...dia.com" <sgurrappadi@...dia.com>,
	"pang.xunlei@....com.cn" <pang.xunlei@....com.cn>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

On 8 September 2015 at 14:52, Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> > No, but
>> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> > will fix the unit issue.
>>
>> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>>
>> And where load_sum already gets a factor 1024 from the weight
>> multiplication, util_sum does not get such a factor, and all the scaling
>> we do on it loose bits.
>>
>> So at the moment we go compute the util_avg value, we need to inflate
>> util_sum with an extra factor 1024 in order to make it work.
>>
>> And seeing that we do the shift up on sa->util_sum without consideration
>> of overflow, would it not make sense to add that factor before the
>> scaling and into the addition?
>>
>> Now, given all that, units are a complete mess here, and I'd not mind
>> something like:
>>
>> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> #error "something usefull"
>> #endif
>>
>> somewhere near here.
>
> Something like teh below..
>
> Another thing to ponder; the downside of scaled_delta_w is that its
> fairly likely delta is small and you loose all bits, whereas the weight
> is likely to be large can could loose a fwe bits without issue.
>
> That is, in fixed point scaling like this, you want to start with the
> biggest numbers, not the smallest, otherwise you loose too much.
>
> The flip side is of course that now you can share a multiplcation.
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
>         sa->load_avg = scale_load_down(se->load.weight);
>         sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>         sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> -       sa->util_sum = LOAD_AVG_MAX;
> +       sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>         /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>  }
>
> @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
>         return contrib + runnable_avg_yN_sum[n];
>  }
>
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif

so why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT ?

> +
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>
>  /*
> @@ -2599,7 +2603,7 @@ __update_load_avg(u64 now, int cpu, stru
>                         }
>                 }
>                 if (running)
> -                       sa->util_sum += cap_scale(scaled_delta_w, scale_cpu);
> +                       sa->util_sum += scaled_delta_w * scale_cpu;
>
>                 delta -= delta_w;
>
> @@ -2623,7 +2627,7 @@ __update_load_avg(u64 now, int cpu, stru
>                                 cfs_rq->runnable_load_sum += weight * contrib;
>                 }
>                 if (running)
> -                       sa->util_sum += cap_scale(contrib, scale_cpu);
> +                       sa->util_sum += contrib * scale_cpu;
>         }
>
>         /* Remainder of delta accrued against u_0` */
> @@ -2634,7 +2638,7 @@ __update_load_avg(u64 now, int cpu, stru
>                         cfs_rq->runnable_load_sum += weight * scaled_delta;
>         }
>         if (running)
> -               sa->util_sum += cap_scale(scaled_delta, scale_cpu);
> +               sa->util_sum += scaled_delta * scale_cpu;
>
>         sa->period_contrib += delta;
>
> @@ -2644,7 +2648,7 @@ __update_load_avg(u64 now, int cpu, stru
>                         cfs_rq->runnable_load_avg =
>                                 div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>                 }
> -               sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> +               sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>         }
>
>         return decayed;
> @@ -2686,8 +2690,7 @@ static inline int update_cfs_rq_load_avg
>         if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>                 long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
>                 sa->util_avg = max_t(long, sa->util_avg - r, 0);
> -               sa->util_sum = max_t(s32, sa->util_sum -
> -                       ((r * LOAD_AVG_MAX) >> SCHED_LOAD_SHIFT), 0);
> +               sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);

looks good to me

>         }
>
>         decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/