linux-kernel - Re: [PATCH] sched/fair: Optimize sum computation with a lookup table

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Fri, 8 Apr 2016 12:44:32 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	mingo@...nel.org, linux-kernel@...r.kernel.org, bsegall@...gle.com,
	pjt@...gle.com, morten.rasmussen@....com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com
Subject: Re: [PATCH] sched/fair: Optimize sum computation with a lookup table

On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute sum, whereas a
> table loopup can do it faster in a constant time.

> -	/* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> -	do {
> -		contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> -		contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> -
> -		n -= LOAD_AVG_PERIOD;
> -	} while (n > LOAD_AVG_PERIOD);
> -
> +	/* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> +	contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> +	n %= LOAD_AVG_PERIOD;
>  	contrib = decay_load(contrib, n);
>  	return contrib + runnable_avg_yN_sum[n];

You replace a simple loop with a DIV instruction and a potential extra
cachemiss.

Is that really faster? What is the median 'n' for which we run that
loop? IOW how many loops do we normally do?

And remember that while recent Intel chips are really good at divisions,
not everybody is (and even then they're still slow).