Message-ID: <20250123231752.67d40550@pumpkin>
Date: Thu, 23 Jan 2025 23:17:52 +0000
From: David Laight <david.laight.linux@...il.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: Theodore Ts'o <tytso@....edu>, Linus Torvalds
 <torvalds@...ux-foundation.org>, linux-crypto@...r.kernel.org,
 linux-kernel@...r.kernel.org, Ard Biesheuvel <ardb@...nel.org>, Chao Yu
 <chao@...nel.org>, "Darrick J. Wong" <djwong@...nel.org>, Geert
 Uytterhoeven <geert@...ux-m68k.org>, Kent Overstreet
 <kent.overstreet@...ux.dev>, "Martin K. Petersen"
 <martin.petersen@...cle.com>, Michael Ellerman <mpe@...erman.id.au>,
 Vinicius Peixoto <vpeixoto@...amp.dev>, WangYuli
 <wangyuli@...sls0nwwnnilyahiblcmlmlcaoki5s.yundunwaf1.com>
Subject: Re: [GIT PULL] CRC updates for 6.14

On Thu, 23 Jan 2025 13:16:03 -0800
Eric Biggers <ebiggers@...nel.org> wrote:

> On Thu, Jan 23, 2025 at 08:58:10PM +0000, David Laight wrote:
...
> > For a small memory footprint it might be worth considering 4 bits at a time.
> > So a 16 word (64 byte) lookup table.
> > Thinks....
> > You can xor a data byte onto the crc 'accumulator' and then do two separate
> > table lookups for each of the high nibbles and xor both onto it before the rotate.
> > That is probably a reasonable compromise.  
> 
> Yes, you can do less than a byte at a time (currently one of the choices is even
> one *bit* at a time!), but I think byte-at-a-time is small enough already.

I used '1 bit at a time' for a crc64 of a 5MB file.
It was actually fast enough during a 'compile' phase (verified by a serial eeprom).

But the paired nibble one is something like:
	crc ^= *data++ << 24;
	crc ^= table[crc >> 28] ^ table1[(crc >> 24) & 15];
	crc = rol(crc, 8);
which isn't going to be significantly slower than the byte one,
where the middle line is:
	crc ^= table[crc >> 24];
especially on a multi-issue cpu,
and the table drops from 1k to 128 bytes (two 16-entry tables of 32-bit words).
That saves quite a lot of D-cache misses.
(Since you'll probably get them all twice when the program's working
set is reloaded!)
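
A minimal sketch of the paired-nibble update above, checked against a bit-at-a-time loop. Everything not in the snippet is an assumption made for illustration: a 32-bit MSB-first CRC with polynomial 0x04c11db7, the names table_hi/table_lo (standing in for table/table1), and the way the tables are built. Because the update is "xor, then rotate", each entry here holds the conventional table value xor'ed with its own index and rotated right by 8, so that the xor also clears the top byte before the rotate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0x04c11db7u	/* assumed polynomial, MSB-first */

uint32_t table_hi[16];		/* indexed by the high nibble of the top byte */
uint32_t table_lo[16];		/* indexed by the low nibble of the top byte */

static uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));	/* only used with n == 8 */
}

static uint32_t ror32(uint32_t v, unsigned int n)
{
	return (v >> n) | (v << (32 - n));	/* only used with n == 8 */
}

/* Reference: one bit at a time. */
uint32_t crc32_bitwise(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		for (int i = 0; i < 8; i++)
			crc = (crc << 1) ^ ((crc & 0x80000000u) ? CRC32_POLY : 0);
	}
	return crc;
}

/* Conventional byte-table entry: eight bit steps applied to 'top << 24'. */
static uint32_t crc32_top_byte(uint32_t top)
{
	uint32_t crc = top << 24;

	for (int i = 0; i < 8; i++)
		crc = (crc << 1) ^ ((crc & 0x80000000u) ? CRC32_POLY : 0);
	return crc;
}

/* The CRC is linear over xor, so the entry for a byte is the xor of the
 * entries for its two nibbles: 2 * 16 * 4 = 128 bytes of tables. */
void crc32_nibble_init(void)
{
	for (uint32_t n = 0; n < 16; n++) {
		table_hi[n] = ror32(crc32_top_byte(n << 4) ^ (n << 4), 8);
		table_lo[n] = ror32(crc32_top_byte(n) ^ n, 8);
	}
}

uint32_t crc32_nibble(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		crc ^= table_hi[crc >> 28] ^ table_lo[(crc >> 24) & 15];
		crc = rol32(crc, 8);
	}
	return crc;
}

/* Compare the nibble version with the bitwise reference on a short input. */
void crc32_nibble_selftest(void)
{
	static const uint8_t msg[] = "123456789";

	crc32_nibble_init();
	assert(crc32_nibble(0, msg, 9) == crc32_bitwise(0, msg, 9));
}
```

Call crc32_nibble_init() once, then crc32_nibble(seed, buf, len). The explicit (uint32_t) cast on the data byte avoids shifting a promoted int into its sign bit.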

Actually you need to rol() the table[] entries by 8 when the tables are built.
Then do:
	crc = rol(crc, 8) ^ table[] ...
which reduces the register dependency chain to 5 per byte.
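
A sketch of how the pre-rotated form might look, under the same assumptions as before (32-bit MSB-first CRC, polynomial 0x04c11db7, hypothetical names rot_hi/rot_lo, table construction chosen here for illustration). With the rotation folded into the tables, each entry is simply the conventional table value xor'ed with its index, and the rotate of crc no longer has to wait for the lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0x04c11db7u	/* assumed polynomial, MSB-first */

uint32_t rot_hi[16];		/* indexed by the high nibble of the top byte */
uint32_t rot_lo[16];		/* indexed by the low nibble of the top byte */

static uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));	/* only used with n == 8 */
}

/* Eight bit-at-a-time steps applied to 'top << 24'. */
static uint32_t crc32_top_byte(uint32_t top)
{
	uint32_t crc = top << 24;

	for (int i = 0; i < 8; i++)
		crc = (crc << 1) ^ ((crc & 0x80000000u) ? CRC32_POLY : 0);
	return crc;
}

/* Reference: conventional shift-and-xor, one byte per iteration. */
uint32_t crc32_rot_reference(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		crc = (crc << 8) ^ crc32_top_byte(crc >> 24);
	}
	return crc;
}

/* The '^ index' term cancels the byte that rol32() wraps into bits 0..7. */
void crc32_rot_init(void)
{
	for (uint32_t n = 0; n < 16; n++) {
		rot_hi[n] = crc32_top_byte(n << 4) ^ (n << 4);
		rot_lo[n] = crc32_top_byte(n) ^ n;
	}
}

uint32_t crc32_rot(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*data++ << 24;
		/* The rotate and both lookups all read the same crc value,
		 * so they can be issued in parallel. */
		crc = rol32(crc, 8) ^ rot_hi[crc >> 28] ^ rot_lo[(crc >> 24) & 15];
	}
	return crc;
}

/* Compare the table version with the reference on a short input. */
void crc32_rot_selftest(void)
{
	static const uint8_t msg[] = "123456789";

	crc32_rot_init();
	assert(crc32_rot(0, msg, 9) == crc32_rot_reference(0, msg, 9));
}
```

The exact length of the dependency chain depends on the target instruction set and on how the compiler schedules the loads, so the sketch only shows where the parallelism comes from.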

	David
