Date:	Thu, 28 Jul 2011 15:16:19 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	frank zago <fzago@...temfabricworks.com>
Cc:	linux-kernel@...r.kernel.org,
	Bob Pearson <rpearson@...temfabricworks.com>,
	Roland Dreier <roland@...nel.org>
Subject: Re: [PATCH] add slice by 8 algorithm to crc32.c


(I don't think rdreier@...co.com exists any more)

On Wed, 20 Jul 2011 17:19:42 -0500
frank zago <fzago@...temfabricworks.com> wrote:

> Hello,
> 
> Added support for slice by 8 to existing crc32 algorithm. Also
> modified gen_crc32table.c to only produce table entries that are
> actually used. The parameters CRC_LE_BITS and CRC_BE_BITS determine
> the number of bits in the input array that are processed during each
> step. Generally the more bits the faster the algorithm is but the
> more table data required.
> 
> Using an x86_64 Opteron machine running at 2100MHz the following table
> was collected with a pre-warmed cache by computing the crc 1000 times
> on a buffer of 4096 bytes.
> 
> 	BITS	Size	LE Cycles/byte	BE Cycles/byte
> 	----------------------------------------------
> 	1	873	41.65		34.60
> 	2	1097	25.43		29.61
> 	4	1057	13.29		15.28
> 	8	2913	7.13		8.19
> 	32	9684	2.80		2.82
> 	64	18178	1.53		1.53
> 
> 	BITS is the value of CRC_LE_BITS or CRC_BE_BITS. The old
> 	default was 8 which actually selected the 32 bit algorithm. In
> 	this version the value 8 is used to select the standard
> 	8 bit algorithm and two new values: 32 and 64 are introduced
> 	to select the slice by 4 and slice by 8 algorithms respectively.
> 
> 	Where Size is the size of crc32.o's text segment which includes
> 	code and table data when both LE and BE versions are set to BITS.
> 
> The current version of crc32.c by default uses the slice by 4 algorithm
> which requires about 2.8 cycles per byte. The slice by 8 algorithm is
> roughly 2X faster and enables packet processing at over 1GB/sec on a typical
> 2-3GHz system.

Are you sure that the code doesn't add any unaligned accesses?  Those
are very bad on some non-x86 architectures.

> --- lib/crc32.c
> +++ lib/crc32.c

Please prepare patches in `patch -p1' form.  So that should have been

--- a/lib/crc32.c
+++ a/lib/crc32.c

>
> ...
>
> @@ -28,14 +31,17 @@
>  #include <linux/init.h>
>  #include <asm/atomic.h>
>  #include "crc32defs.h"
> -#if CRC_LE_BITS == 8
> -# define tole(x) __constant_cpu_to_le32(x)
> +
> +#include <asm/msr.h>

That breaks the build on all non-x86.  Fortunately the inclusion
appears to be unnecessary.

> +#if CRC_LE_BITS > 8
> +# define tole(x) (__force u32) __constant_cpu_to_le32(x)
>  #else
>  # define tole(x) (x)
>  #endif
>  
> -#if CRC_BE_BITS == 8
> -# define tobe(x) __constant_cpu_to_be32(x)
> +#if CRC_BE_BITS > 8
> +# define tobe(x) (__force u32) __constant_cpu_to_be32(x)
>  #else
>  # define tobe(x) (x)
>  #endif
> @@ -45,54 +51,228 @@ MODULE_AUTHOR("Matt Domsch <Matt_Domsch@
>  MODULE_DESCRIPTION("Ethernet CRC32 calculations");
>  MODULE_LICENSE("GPL");
>  
> -#if CRC_LE_BITS == 8 || CRC_BE_BITS == 8
> +#if CRC_LE_BITS > 8
> +static inline u32 crc32_le_body(u32 crc, u8 const *buf, size_t len)

`inline' is largely ignored by gcc, and the behaviour varies between
gcc versions.  __always_inline is more likely to succeed.  But the
compiler would inline this all by itself anyway.

> +{
> +	const u8 *p8;
> +	const u32 *p32;
> +	int init_bytes, end_bytes;
> +	size_t words;
> +	int i;
> +	u32 q;
> +	u8 i0, i1, i2, i3;
> +
> +	crc = (__force u32) __cpu_to_le32(crc);
> +
> +#if CRC_LE_BITS == 64
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 7) & ~7);

Perhaps using ALIGN() would be clearer.

> +
> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;

Please take a look at the types of init_bytes and end_bytes. 
ptrdiff_t, size_t and uint would all be more appropriate than `int'.

> +	if (init_bytes > len)
> +		init_bytes = len;
> +	words = (len - init_bytes) >> 3;
> +	end_bytes = (len - init_bytes) & 7;
> +#else
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 3) & ~3);

ALIGN()?

> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
> +	if (init_bytes > len)
> +		init_bytes = len;

min()?

> +	words = (len - init_bytes) >> 2;
> +	end_bytes = (len - init_bytes) & 3;
> +#endif
> +
> +	for (i = 0; i < init_bytes; i++) {
> +#ifdef __LITTLE_ENDIAN
> +		i0 = *p8++ ^ crc;
> +		crc = t0_le[i0] ^ (crc >> 8);
> +#else
> +		i0 = *p8++ ^ (crc >> 24);
> +		crc = t0_le[i0] ^ (crc << 8);
> +#endif
> +	}

All the ifdeffing in here bursts my brain.

Has this code been carefully tested in LE and BE?

>
> ...
>
> +#if CRC_LE_BITS == 64
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 7) & ~7);
> +
> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
> +	if (init_bytes > len)
> +		init_bytes = len;
> +	words = (len - init_bytes) >> 3;
> +	end_bytes = (len - init_bytes) & 7;

Various dittoes.

> +#else
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 3) & ~3);
> +
> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
> +	if (init_bytes > len)
> +		init_bytes = len;
> +	words = (len - init_bytes) >> 2;
> +	end_bytes = (len - init_bytes) & 3;
> +#endif
> +
> +	for (i = 0; i < init_bytes; i++) {
> +#ifdef __LITTLE_ENDIAN
> +		i0 = *p8++ ^ crc;
> +		crc = t0_be[i0] ^ (crc >> 8);
> +#else
> +		i0 = *p8++ ^ (crc >> 24);
> +		crc = t0_be[i0] ^ (crc << 8);
> +#endif
> +	}
>  
