Message-ID: <20250123231752.67d40550@pumpkin>
Date: Thu, 23 Jan 2025 23:17:52 +0000
From: David Laight <david.laight.linux@...il.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: Theodore Ts'o <tytso@....edu>, Linus Torvalds
<torvalds@...ux-foundation.org>, linux-crypto@...r.kernel.org,
linux-kernel@...r.kernel.org, Ard Biesheuvel <ardb@...nel.org>, Chao Yu
<chao@...nel.org>, "Darrick J. Wong" <djwong@...nel.org>, Geert
Uytterhoeven <geert@...ux-m68k.org>, Kent Overstreet
<kent.overstreet@...ux.dev>, "Martin K. Petersen"
<martin.petersen@...cle.com>, Michael Ellerman <mpe@...erman.id.au>,
Vinicius Peixoto <vpeixoto@...amp.dev>, WangYuli
<wangyuli@...sls0nwwnnilyahiblcmlmlcaoki5s.yundunwaf1.com>
Subject: Re: [GIT PULL] CRC updates for 6.14
On Thu, 23 Jan 2025 13:16:03 -0800
Eric Biggers <ebiggers@...nel.org> wrote:
> On Thu, Jan 23, 2025 at 08:58:10PM +0000, David Laight wrote:
...
> > For a small memory footprint it might be worth considering 4 bits at a time.
> > So a 16 word (64 byte) lookup table.
> > Thinks....
> > You can xor a data byte onto the crc 'accumulator' and then do two separate
> > table lookups for each of the high nibbles and xor both onto it before the rotate.
> > That is probably a reasonable compromise.
>
> Yes, you can do less than a byte at a time (currently one of the choices is even
> one *bit* at a time!), but I think byte-at-a-time is small enough already.
I used '1 bit at a time' for a crc64 of a 5MB file.
It was actually fast enough during a 'compile' phase (verified by a serial eeprom).
But the paired nibble one is something like:
	crc ^= (u32)*data++ << 24;
	crc ^= table[crc >> 28] ^ table1[(crc >> 24) & 15];
	crc = rol(crc, 8);
which isn't going to be significantly slower than the byte one,
where the middle line is:
	crc ^= table[crc >> 24];
especially on a multi-issue cpu,
and the table space drops from 1k to 128 bytes (two 16-entry tables).
That saves quite a lot of D-cache misses.
(Since you'll probably take them all twice when the program's working
set is reloaded!)
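A minimal sketch of that paired-nibble loop, for illustration only. It assumes
a 32-bit MSB-first CRC with polynomial 0x04c11db7 (nothing above says which CRC
is meant), and the names table_hi, table_lo, entry() and crc_nibbles() are made
up here. For the 'xor, then rol' order to work, the entry for byte b has to be
ror(T[b] ^ b, 8), where T[] is the usual byte-at-a-time table: the '^ b' cancels
the byte that the rotate would otherwise wrap round into the low bits.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY 0x04c11db7u	/* assumed: MSB-first CRC-32 polynomial */

static uint32_t table_hi[16], table_lo[16];	/* 2 * 16 * 4 = 128 bytes */

static uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* Bit-at-a-time reference: advance the crc register by 8 bits. */
static uint32_t crc_step8(uint32_t crc)
{
	for (int i = 0; i < 8; i++)
		crc = (crc << 1) ^ ((crc & 0x80000000u) ? POLY : 0);
	return crc;
}

/* Table entry for byte value b, pre-arranged for 'xor then rol(8)'. */
static uint32_t entry(uint32_t b)
{
	uint32_t t = crc_step8(b << 24) ^ b;

	return (t >> 8) | (t << 24);	/* ror(t, 8) */
}

static void nibble_tables_init(void)
{
	for (uint32_t n = 0; n < 16; n++) {
		table_hi[n] = entry(n << 4);	/* high nibble of the byte */
		table_lo[n] = entry(n);		/* low nibble of the byte */
	}
}

/* Paired-nibble update: two 16-entry lookups per data byte. */
static uint32_t crc_nibbles(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		crc ^= table_hi[crc >> 28] ^ table_lo[(crc >> 24) & 15];
		crc = rol32(crc, 8);
	}
	return crc;
}

/* Slow reference used only to cross-check the table version. */
static uint32_t crc_bitwise(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--)
		crc = crc_step8(crc ^ ((uint32_t)*data++ << 24));
	return crc;
}
```

Call nibble_tables_init() once, then crc_nibbles(seed, buf, len); comparing
against crc_bitwise() on the same input is a cheap sanity check.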
Actually you need to rol() the table[] entries in advance.
Then do:
	crc = rol(crc, 8) ^ table[] ...
to reduce the register dependency chain to 5 per byte.
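A sketch of that pre-rotated form, under the same assumptions as before: a
32-bit MSB-first CRC with polynomial 0x04c11db7, and illustrative names
(rtable_hi, rtable_lo, crc_nibbles_rol()) that are not from any real code.
Here the entry for byte b is just T[b] ^ b, so the rotate of crc no longer has
to wait for the table loads.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_POLY 0x04c11db7u	/* assumed: MSB-first CRC-32 polynomial */

static uint32_t rtable_hi[16], rtable_lo[16];	/* still 128 bytes in total */

static uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* Bit-at-a-time reference: advance the crc register by 8 bits. */
static uint32_t advance8(uint32_t crc)
{
	for (int i = 0; i < 8; i++)
		crc = (crc << 1) ^ ((crc & 0x80000000u) ? CRC_POLY : 0);
	return crc;
}

static void rot_tables_init(void)
{
	for (uint32_t n = 0; n < 16; n++) {
		uint32_t hi = n << 4, lo = n;

		/* '^ b' cancels the byte that rotl32() wraps into bits 7..0 */
		rtable_hi[n] = advance8(hi << 24) ^ hi;
		rtable_lo[n] = advance8(lo << 24) ^ lo;
	}
}

/* The rotate and both lookups all start from the same crc value. */
static uint32_t crc_nibbles_rol(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		crc = rotl32(crc, 8) ^ rtable_hi[crc >> 28]
				     ^ rtable_lo[(crc >> 24) & 15];
	}
	return crc;
}

/* Slow reference used only to cross-check the table version. */
static uint32_t crc_bitwise_ref(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--)
		crc = advance8(crc ^ ((uint32_t)*data++ << 24));
	return crc;
}
```

Call rot_tables_init() once before crc_nibbles_rol(); the result should match
crc_bitwise_ref() for any seed and buffer.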
David