[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160408104432.GW3430@twins.programming.kicks-ass.net>
Date: Fri, 8 Apr 2016 12:44:32 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Yuyang Du <yuyang.du@...el.com>
Cc: mingo@...nel.org, linux-kernel@...r.kernel.org, bsegall@...gle.com,
pjt@...gle.com, morten.rasmussen@....com,
vincent.guittot@...aro.org, dietmar.eggemann@....com
Subject: Re: [PATCH] sched/fair: Optimize sum computation with a lookup table
On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute sum, whereas a
> table loopup can do it faster in a constant time.
> - /* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> - do {
> - contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> - contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> -
> - n -= LOAD_AVG_PERIOD;
> - } while (n > LOAD_AVG_PERIOD);
> -
> + /* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> + contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> + n %= LOAD_AVG_PERIOD;
> contrib = decay_load(contrib, n);
> return contrib + runnable_avg_yN_sum[n];
You replace a simple loop with a DIV instruction and a potential extra
cachemiss.
Is that really faster? What is the median 'n' for which we run that
loop? IOW how many loops do we normally do?
And remember that while recent Intel chips are really good at divisions,
not everybody is (and even then they're still slow).
Powered by blists - more mailing lists