[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <011501cc56a9$0b1ca470$2155ed50$@systemfabricworks.com>
Date: Tue, 9 Aug 2011 10:28:50 -0500
From: "Bob Pearson" <rpearson@...temfabricworks.com>
To: "'George Spelvin'" <linux@...izon.com>,
<akpm@...ux-foundation.org>, <fzago@...temfabricworks.com>,
<joakim.tjernlund@...nsmode.se>, <linux-kernel@...r.kernel.org>
Subject: RE: [patch v3 6/7] crc32: add-slicing-by-8.diff
> -----Original Message-----
> From: George Spelvin [mailto:linux@...izon.com]
> Sent: Tuesday, August 09, 2011 6:21 AM
> To: akpm@...ux-foundation.org; fzago@...temfabricworks.com;
> joakim.tjernlund@...nsmode.se; linux-kernel@...r.kernel.org;
> linux@...izon.com; rpearson@...temfabricworks.com
> Subject: Re: [patch v3 6/7] crc32: add-slicing-by-8.diff
>
> While writing up some documentation of this algorithm, I came up with
> a potential speedup. Or, at least, realized why slicing by more than
> 4 is so much faster than slicing by 4 or less.
>
> Note that the inner loop of the algorithm is as follows:
>
> +# define DO_CRC8a (tab[7][(q) & 255] ^ \
> + tab[6][(q >> 8) & 255] ^ \
> + tab[5][(q >> 16) & 255] ^ \
> + tab[4][(q >> 24) & 255])
> +# define DO_CRC8b (tab[3][(q) & 255] ^ \
> + tab[2][(q >> 8) & 255] ^ \
> + tab[1][(q >> 16) & 255] ^ \
> + tab[0][(q >> 24) & 255])
>
> + for (--b; middle_len; --middle_len) {
> + u32 q;
> + q = crc ^ *++b;
> + crc = DO_CRC8a;
> + q = *++b;
> + crc ^= DO_CRC8b;
> }
>
> Note the data dependencies: DO_CRC8a depends on the
> previous crc, which depends on the previous DO_CRC8b.
> But DO_CRC8b does not depend on anything except the
> input data at *++b.
I think you've got it. That is exactly why it works. The hardware is pretty
good at finding the parallelism since this code runs at almost 2X the speed
of the CRC4 loop.
>
> It would increase parallelism to schedule DO_CRC8b before DO_CRC8a,
> to start those loads before the previous crc value is available.
>
> Maybe the compiler and/pr processor can find this parallelism already,
> but if not, it might be useful to try reordering it:
>
> # define DO_CRC8a(x) (tab[7][(x) & 255] ^ \
> tab[6][((x) >> 8) & 255] ^ \
> tab[5][((x) >> 16) & 255] ^ \
> tab[4][((x) >> 24) & 255])
> # define DO_CRC8b(x) (tab[3][(x) & 255] ^ \
> tab[2][((x) >> 8) & 255] ^ \
> tab[1][((x) >> 16) & 255] ^ \
> tab[0][((x) >> 24) & 255])
>
> for ( ; middle_len; --middle_len, b += 2) {
> u32 q = DO_CRC8b(b[1]);
> crc ^= b[0];
> crc = q ^ DO_CRC8a(crc);
> }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists