Message-ID: <20170330121658.6mo7datma4ssw7st@hirez.programming.kicks-ass.net>
Date:   Thu, 30 Mar 2017 14:16:58 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Paul Turner <pjt@...gle.com>
Cc:     Yuyang Du <yuyang.du@...el.com>, Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Benjamin Segall <bsegall@...gle.com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        umgwanakikbuti@...il.com
Subject: Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()

On Thu, Mar 30, 2017 at 04:21:08AM -0700, Paul Turner wrote:
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2767,7 +2767,7 @@ static const u32 __accumulated_sum_N32[]
> >   * Approximate:
> >   *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
> >   */
> > -static __always_inline u64 decay_load(u64 val, u64 n)
> > +static u64 decay_load(u64 val, u64 n)
> >  {
> >         unsigned int local_n;
> >
> > @@ -2795,32 +2795,113 @@ static __always_inline u64 decay_load(u6
> >         return val;
> >  }
> >
> > -/*
> > - * For updates fully spanning n periods, the contribution to runnable
> > - * average will be: \Sum 1024*y^n
> > - *
> > - * We can compute this reasonably efficiently by combining:
> > - *   y^PERIOD = 1/2 with precomputed \Sum 1024*y^n {for  n <PERIOD}
> > - */
> > -static u32 __compute_runnable_contrib(u64 n)
> > +static u32 __accumulate_sum(u64 periods, u32 period_contrib, u32 remainder)
> 
> - The naming here is really ambiguous:
>     "__accumulate_sum" -> "__accumulate_pelt_segments"?

OK, I did struggle with that a bit too but failed to improve on it; I'll
change it.

> - Passing in "remainder" seems irrelevant to the sum accumulation.  It would be
>   more clear to handle it from the caller.

Well, this way we have all 3 delta parts in one function. I'll try it
and see what it looks like though.

> >
> >  {
> > -       u32 contrib = 0;
> > +       u32 c1, c2, c3 = remainder; /* y^0 == 1 */
> >
> > -       if (likely(n <= LOAD_AVG_PERIOD))
> > -               return runnable_avg_yN_sum[n];
> > -       else if (unlikely(n >= LOAD_AVG_MAX_N))
> > +       if (!periods)
> > +               return remainder - period_contrib;
> 
> This is super confusing.  It only works because remainder already had
> period_contrib aggregated _into_ it.  We're literally computing:
>   remainder + period_contrib - period_contrib

Correct; although I didn't find it too confusing. Could be because I'd
been staring at this code for a few hours though.

> We should just not call this in the !periods case and handle the remainder
> below.

I'll change it and see what it looks like.

> > +
> > +       if (unlikely(periods >= LOAD_AVG_MAX_N))
> >                 return LOAD_AVG_MAX;
> >
> > -       return contrib + runnable_avg_yN_sum[n];
> > +       /*
> > +        * c1 = d1 y^(p+1)
> > +        */
> > +       c1 = decay_load((u64)(1024 - period_contrib), periods);
> > +
> > +       periods -= 1;
> > +       /*
> > +        * For updates fully spanning n periods, the contribution to runnable
> > +        * average will be:
> > +        *
> > +        *   c2 = 1024 \Sum y^n
> > +        *
> > +        * We can compute this reasonably efficiently by combining:
> > +        *
> > +        *   y^PERIOD = 1/2 with precomputed 1024 \Sum y^n {for: n < PERIOD}
> > +        */
> > +       if (likely(periods <= LOAD_AVG_PERIOD)) {
> > +               c2 = runnable_avg_yN_sum[periods];
> > +       } else {
> > +               c2 = __accumulated_sum_N32[periods/LOAD_AVG_PERIOD];
> 
> This still wants the comment justifying why we can't overflow.

You mean why periods/LOAD_AVG_PERIOD < ARRAY_SIZE(__accumulated_sum_N32)?
Or something else?

> > +               periods %= LOAD_AVG_PERIOD;
> > +               c2 = decay_load(c2, periods);
> > +               c2 += runnable_avg_yN_sum[periods];
> > +       }
> > +
> > +       return c1 + c2 + c3;
> >  }


> > +       /*
> > +        * Step 2
> > +        */
> > +       delta %= 1024;
> > +       contrib = __accumulate_sum(periods, sa->period_contrib, delta);
> > +       sa->period_contrib = delta;
> > +
> > +       contrib = cap_scale(contrib, scale_freq);
> > +       if (weight) {
> > +               sa->load_sum += weight * contrib;
> 
> Is this correct in the iterated periods > LOAD_AVG_MAX_N case?
> I don't think the decay above is guaranteed to return these to zero.

Ah!

Indeed, so decay_load() needs LOAD_AVG_PERIOD * 63 periods before it
truncates to 0, because every LOAD_AVG_PERIOD we halve the value and lose
1 bit; so after 63 of those we're at 0.

But __accumulate_sum() OTOH returns LOAD_AVG_MAX after only
LOAD_AVG_MAX_N periods, which is < LOAD_AVG_PERIOD * 63.

So yes, combined we exceed LOAD_AVG_MAX, which is bad. Let me think what
to do about that.

> > +               if (cfs_rq)
> > +                       cfs_rq->runnable_load_sum += weight * contrib;
> > +       }
> > +       if (running)
> > +               sa->util_sum += contrib * scale_cpu;
> > +
> > +       return periods;
> > +}
