Date:	Thu, 3 Mar 2016 16:12:16 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	'Tom Herbert' <tom@...bertland.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:	"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
	"kernel-team@...com" <kernel-team@...com>
Subject: RE: [PATCH v5 net-next] net: Implement fast csum_partial for x86_64

From: Tom Herbert
> Sent: 02 March 2016 22:19
...
> +	/* Main loop using 64byte blocks */
> +	for (; len > 64; len -= 64, buff += 64) {
> +		asm("addq 0*8(%[src]),%[res]\n\t"
> +		    "adcq 1*8(%[src]),%[res]\n\t"
> +		    "adcq 2*8(%[src]),%[res]\n\t"
> +		    "adcq 3*8(%[src]),%[res]\n\t"
> +		    "adcq 4*8(%[src]),%[res]\n\t"
> +		    "adcq 5*8(%[src]),%[res]\n\t"
> +		    "adcq 6*8(%[src]),%[res]\n\t"
> +		    "adcq 7*8(%[src]),%[res]\n\t"
> +		    "adcq $0,%[res]"
> +		    : [res] "=r" (result)
> +		    : [src] "r" (buff),
> +		    "[res]" (result));

Did you try the asm loop that used 'lea %rcx..., jcxz..., jmps...'
without any unrolling?
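
I was thinking of something along these lines (an untested sketch; the
'count' local and the operand names are made up here, and jrcxz is the
64-bit form of jcxz):

	unsigned long count = len >> 3;	/* number of 64-bit words */

	asm("clc\n"
	    "1:\tjrcxz 2f\n\t"		/* exit when count is 0; leaves flags alone */
	    "adcq (%[src]),%[res]\n\t"
	    "lea 8(%[src]),%[src]\n\t"	/* lea doesn't modify the carry flag */
	    "lea -1(%%rcx),%%rcx\n\t"
	    "jmp 1b\n"
	    "2:\tadcq $0,%[res]"	/* fold the final carry */
	    : [res] "+r" (result), [src] "+r" (buff), "+c" (count)
	    : : "memory");

The point being that lea, jrcxz and jmp all leave the carry flag
alone, so the adc chain survives across iterations without any
per-block fixups.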

...
> +	/* Sum over any remaining bytes (< 8 of them) */
> +	if (len & 0x7) {
> +		unsigned long val;
> +		/*
> +		 * Since "len" is > 8 here we backtrack in the buffer to load
> +		 * the outstanding bytes into the low order bytes of a quad and
> +		 * then shift to extract the relevant bytes. By doing this we
> +		 * avoid additional calls to load_unaligned_zeropad.

That comment is wrong; the safety argument rests on the original
length, not on "len" at this point. Maybe:
		 * Read the last 8 bytes of the buffer then shift to extract
		 * the required bytes.
		 * This is safe because the original length was > 8 and avoids
		 * any problems reading beyond the end of the valid data.
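
In code that reworded comment corresponds to something like this
(untested sketch, little-endian assumed; 'rem' is just a local name):

	if (len & 0x7) {
		unsigned int rem = len & 0x7;
		unsigned long val;

		/*
		 * Read the last 8 bytes of the buffer: the rem
		 * outstanding bytes plus (8 - rem) bytes that were
		 * already summed. Safe because the original length
		 * was > 8.
		 */
		val = *(unsigned long *)(buff + rem - 8);
		/* Shift out the already-summed low-order bytes. */
		val >>= (8 - rem) * 8;
		asm("addq %[val],%[res]\n\t"
		    "adcq $0,%[res]"
		    : [res] "+r" (result)
		    : [val] "r" (val));
	}

That re-reads a few already-summed bytes, but never touches anything
beyond the end of the valid data.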

	David
