Message-ID: <538F1535.3080500@redhat.com>
Date: Wed, 04 Jun 2014 14:46:45 +0200
From: Daniel Borkmann <dborkman@...hat.com>
To: George Spelvin <linux@...izon.com>
CC: davem@...emloft.net, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org
Subject: Re: [PATCH 1/3] lib: crc32: Greatly shrink CRC combining code

Sorry for the late reply; this somehow slipped through.

I think you might want to Cc Andrew Morton <akpm@...ux-foundation.org>
so that this can go in via akpm's tree for misc changes, perhaps?

The work from my side went via Dave's net-next tree, as it had some
follow-up SCTP work that depended on the bits below. No objections from
my side if you really want this to go through <netdev@...r.kernel.org>,
though.

Anyway, some more comments inline:
On 05/30/2014 07:35 AM, George Spelvin wrote:
> There's no need for a full 32x32 matrix, when rows before the last are
> just shifted copies of the rows after them.
>
> There's still room for improvement (especially on X86 processors with
> CRC32 and PCLMUL instructions), but this is a large step in the
> right direction.
>
> The internal primitive is now called crc32_generic_shift and takes one
> less argument; the XOR with crc2 is done in inline wrappers.
>
> Signed-off-by: George Spelvin <linux@...izon.com>
> ---
> This is the significant patch; the other two patches are minor
> cleanups that I did while in the area.
>
> Tested extensively in user-space (the "power" variable is identical
> to the last row of the current even/odd matrix in the previous code)
> and with CONFIG_CRC32_SELFTEST:
>
> [ 0.587152] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
> [ 0.587186] crc32: self tests passed, processed 225944 bytes in 115727 nsec
> [ 0.587341] crc32c: CRC_LE_BITS = 64
> [ 0.587373] crc32c: self tests passed, processed 225944 bytes in 59505 nsec
> [ 0.596827] crc32_combine: 8373 self tests passed
> [ 0.606259] crc32c_combine: 8373 self tests passed
>
> Code shrunk; comments increased to match.
> Pulling 2x32x32 bits = 256 bytes off the stack is a nice bonus.
Looks good to me! Do you have any performance numbers to share?
Some very minor nit-picking:
> include/linux/crc32.h | 12 ++++-
> lib/crc32.c | 142 +++++++++++++++++++++++---------------------------
> 2 files changed, 76 insertions(+), 78 deletions(-)
>
> diff --git a/include/linux/crc32.h b/include/linux/crc32.h
> index 7d275c4f..bcb8e9d3 100644
> --- a/include/linux/crc32.h
> +++ b/include/linux/crc32.h
> @@ -29,7 +29,11 @@ extern u32 crc32_be(u32 crc, unsigned char const *p, size_t len);
> * with the same initializer as crc1, and crc2 seed was 0. See
> * also crc32_combine_test().
> */
> -extern u32 crc32_le_combine(u32 crc1, u32 crc2, size_t len2);
> +u32 crc32_le_shift(u32 crc, size_t len) __attribute_const__;
Perhaps a newline here.
> +static inline u32 crc32_le_combine(u32 crc1, u32 crc2, size_t len2)
> +{
> + return crc32_le_shift(crc1, len2) ^ crc2;
> +}
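
Side note for anyone checking the identity behind this wrapper: it also
holds for zlib's (inverted) CRC-32 values, so it can be sanity-checked
from user space. A quick Python sketch of my own (not part of the patch;
crc32_shift_naive is just the bit-serial O(len2) shift that the patch
computes in logarithmic time):

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial

def crc32_shift_naive(crc, nbytes):
    # Equivalent of appending nbytes zero bytes: one LFSR step per bit.
    for _ in range(8 * nbytes):
        crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc

def crc32_combine(crc1, crc2, len2):
    # Mirrors the inline wrapper: shift crc1 past block 2, then XOR.
    return crc32_shift_naive(crc1, len2) ^ crc2

a, b = b"hello ", b"world"
assert crc32_combine(zlib.crc32(a), zlib.crc32(b), len(b)) == zlib.crc32(a + b)
```

(The pre/post inversions in zlib's convention cancel in the XOR, which is
why the raw shift works on the inverted values as well.)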
>
> extern u32 __crc32c_le(u32 crc, unsigned char const *p, size_t len);
>
> @@ -51,7 +55,11 @@ extern u32 __crc32c_le(u32 crc, unsigned char const *p, size_t len);
> * seeded with the same initializer as crc1, and crc2 seed
> * was 0. See also crc32c_combine_test().
> */
> -extern u32 __crc32c_le_combine(u32 crc1, u32 crc2, size_t len2);
> +u32 __crc32c_le_shift(u32 crc, size_t len) __attribute_const__;
Ditto. Or, put both *_shift() helper signatures before the crc32_le_combine
kdoc comment.
> +static inline u32 __crc32c_le_combine(u32 crc1, u32 crc2, size_t len2)
> +{
> + return __crc32c_le_shift(crc1, len2) ^ crc2;
> +}
>
> #define crc32(seed, data, length) crc32_le(seed, (unsigned char const *)(data), length)
>
> diff --git a/lib/crc32.c b/lib/crc32.c
> index 70f00ca5..a4a576f3 100644
...
>
> -u32 __pure __crc32c_le_combine(u32 crc1, u32 crc2, size_t len2)
> +/**
> + * crc32_generic_shift - Append len 0 bytes to crc, in logarithmic time
> + * @crc: The original little-endian CRC (i.e. lsbit is x^31 coefficient)
> + * @len: The number of bytes. @crc is multiplied by x^(8*@len)
> + # @polynomial: The modulus used to reduce the result to 32 bits.
^^ seems this should have been a '*'
> + *
> + * It's possible to parallelize CRC computations by computing a CRC
> + * over separate ranges of a buffer, then summing them.
> + * This shifts the given CRC by 8*len bits (i.e. produces the same effect
> + * as appending len bytes of zero to the data), in time proportional
> + * to log(len).
> + */
> +static u32 __attribute_const__ crc32_generic_shift(u32 crc, size_t len,
> + u32 polynomial)
The 'u32 polynomial' continuation line is not aligned with the opening '(' on the previous line.
> {
> - return crc32_generic_combine(crc1, crc2, len2, CRC32C_POLY_LE);
> + u32 power = polynomial; /* CRC of x^32 */
> + int i;
> +
> + /* Shift up to 32 bits in the simple linear way */
...
The other two patches are fine, imho. Thanks George! :)