lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 8 Apr 2016 12:44:32 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	mingo@...nel.org, linux-kernel@...r.kernel.org, bsegall@...gle.com,
	pjt@...gle.com, morten.rasmussen@....com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com
Subject: Re: [PATCH] sched/fair: Optimize sum computation with a lookup table

On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> __compute_runnable_contrib() uses a loop to compute sum, whereas a
> table loopup can do it faster in a constant time.

> -	/* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> -	do {
> -		contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> -		contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> -
> -		n -= LOAD_AVG_PERIOD;
> -	} while (n > LOAD_AVG_PERIOD);
> -
> +	/* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> +	contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> +	n %= LOAD_AVG_PERIOD;
>  	contrib = decay_load(contrib, n);
>  	return contrib + runnable_avg_yN_sum[n];

You replace a simple loop with a DIV instruction and a potential extra
cachemiss.

Is that really faster? What is the median 'n' for which we run that
loop? IOW how many loops do we normally do?

And remember that while recent Intel chips are really good at divisions,
not everybody is (and even then they're still slow).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ