lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 24 Mar 2015 20:22:48 -0500
From:	Scott Wood <scottwood@...escale.com>
To:	LEROY Christophe <christophe.leroy@....fr>
CC:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>,
	<linuxppc-dev@...ts.ozlabs.org>, <linux-kernel@...r.kernel.org>
Subject: Re: powerpc32: rearrange instructions order in ip_fast_csum()

On Tue, Feb 03, 2015 at 12:39:27PM +0100, LEROY Christophe wrote:
> On PPC_8xx, lwz has a 2 cycles latency, and branching also takes 2 cycles.
> As the size of the header is minimum 5 words, we can unroll the loop for the
> first words to reduce number of branching, and we can re-order the instructions
> to limit loading latency.

Please wrap commit messages at around 70 characters.

> Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
> ---
>  arch/powerpc/lib/checksum_32.S | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index 6d67e05..5500704 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -26,13 +26,17 @@
>  _GLOBAL(ip_fast_csum)
>  	lwz	r0,0(r3)
>  	lwzu	r5,4(r3)
> -	addic.	r4,r4,-2
> +	addic.	r4,r4,-4
>  	addc	r0,r0,r5
>  	mtctr	r4
>  	blelr-
> -1:	lwzu	r4,4(r3)
> -	adde	r0,r0,r4
> +	lwzu	r5,4(r3)
> +	lwzu	r4,4(r3)

The blelr is pointless since len is guaranteed to be >= 5 (assuming that
comment is accurate), but now it's both pointless and in the wrong place,
since you haven't yet finished the four words that you subtracted from
r4.

How about keeping the blelr, without the -, moving it after the initial
words, and changing the number of inital words to 5?  Also maybe do all
the loads up front, since many PPC chips have a three cycle load latency
rather than two.

-Scott
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ