Date:	Wed, 28 May 2014 15:32:59 -0700
From:	Tim Chen <tim.c.chen@...ux.intel.com>
To:	George Spelvin <linux@...izon.com>
Cc:	herbert@...dor.apana.org.au, JBeulich@...e.com,
	linux-kernel@...r.kernel.org, sandyw@...tter.com,
	James Guilford <james.guilford@...el.com>
Subject: Re: [RFC PATCH] crypto: crc32c-pclmul - Use pmovzxdq to shrink K_table

On Wed, 2014-05-28 at 10:40 -0400, George Spelvin wrote:
> While following a number of tangents in the code (I was figuring out
> how to edit lib/Kconfig; don't ask), I came across a table of 256 64-bit
> words, all of which had the high half set to zero.
> 
> Since the code depends on both pclmulq and crc32, SSE 4.1 is obviously
> present, so it could use pmovzxdq and save 1K of kernel data.
> 
> The following patch obviously lacks the kludges for old binutils,
> but should convey the general idea.
> 
> Jan: Is support for SLE10's pre-2.18 binutils still required?
> Your PEXTRD fix was only a year ago, so I expect, but I wanted to ask.
> 
> Two other minor additional changes:
> 
> 1. The current code unnecessarily puts the table in the read-write
>    .data section.  Moved to .text.
> 2. I'm also not sure why it's necessary to force such large alignment
>    on K_table.  Comments on reducing it?
> 
> Signed-off-by: George Spelvin <linux@...izon.com>
> 
> 
> diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
> index dbc4339b..9f885ee4 100644
> --- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
> +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
> @@ -216,15 +216,11 @@ LABEL crc_ %i
>  	## 4) Combine three results:
>  	################################################################
>  
> -	lea	(K_table-16)(%rip), bufp	# first entry is for idx 1
> +	lea	(K_table-8)(%rip), bufp		# first entry is for idx 1
>  	shlq    $3, %rax			# rax *= 8
> -	subq    %rax, tmp			# tmp -= rax*8
> -	shlq    $1, %rax
> -	subq    %rax, tmp			# tmp -= rax*16
> -						# (total tmp -= rax*24)
> -	addq    %rax, bufp
> -
> -	movdqa  (bufp), %xmm0			# 2 consts: K1:K2
> +	pmovzxdq (bufp,%rax), %xmm0		# 2 consts: K1:K2
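
In C terms, the change amounts to the following (a minimal sketch; the
function names and table types here are illustrative, not the kernel's):

	#include <stdint.h>

	/* Before: the table entries are 64-bit words whose high halves
	 * are all zero, and MOVDQA reads one aligned 16-byte K1:K2 pair. */
	static void load_movdqa(const uint64_t *k_table, unsigned idx,
				uint64_t xmm0[2])
	{
		xmm0[0] = k_table[2 * idx];		/* K1 */
		xmm0[1] = k_table[2 * idx + 1];		/* K2 */
	}

	/* After: the table keeps only the meaningful low 32 bits of each
	 * constant, and PMOVZXDQ loads 8 bytes and zero-extends each
	 * dword to a qword, halving the table size. */
	static void load_pmovzxdq(const uint32_t *k_table, unsigned idx,
				  uint64_t xmm0[2])
	{
		xmm0[0] = (uint64_t)k_table[2 * idx];		/* K1 */
		xmm0[1] = (uint64_t)k_table[2 * idx + 1];	/* K2 */
	}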

Changing from the aligned move (movdqa) to an unaligned move with zero
extension (pmovzxdq) is going to make things slower.  If the table is
only aligned on an 8-byte boundary, some entries can span two cache
lines, which can slow things down further.
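
(For reference, an access spans two lines exactly when its first and
last bytes fall in different 64-byte cache lines; a hypothetical helper,
not kernel code, to test that:)

	#include <stddef.h>
	#include <stdint.h>

	static int spans_two_cache_lines(uintptr_t addr, size_t len)
	{
		/* Compare the 64-byte cache line of the first and the
		 * last byte of the access. */
		return (addr / 64) != ((addr + len - 1) / 64);
	}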

We are trading speed for a saving of only 4096 bytes of memory, which
is likely not a good trade for most systems, except those that are
really memory-constrained.  Such a non-performance-critical system may
as well use the generic crc32c algorithm and compile this module out.
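
For reference, compiling this module out while keeping the generic
implementation amounts to a .config along these lines (a sketch; option
names as found in kernels of this era):

	CONFIG_CRYPTO_CRC32C=y
	# CONFIG_CRYPTO_CRC32C_INTEL is not set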

Thanks.

Tim


