linux-kernel - Re: [GIT PULL] CRC updates for 6.14

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250123181818.GA2117666@google.com>
Date: Thu, 23 Jan 2025 18:18:18 +0000
From: Eric Biggers <ebiggers@...nel.org>
To: Theodore Ts'o <tytso@....edu>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org,
	Ard Biesheuvel <ardb@...nel.org>, Chao Yu <chao@...nel.org>,
	"Darrick J. Wong" <djwong@...nel.org>,
	Geert Uytterhoeven <geert@...ux-m68k.org>,
	Kent Overstreet <kent.overstreet@...ux.dev>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	Michael Ellerman <mpe@...erman.id.au>,
	Vinicius Peixoto <vpeixoto@...amp.dev>,
	WangYuli <wangyuli@...sls0nwwnnilyahiblcmlmlcaoki5s.yundunwaf1.com>
Subject: Re: [GIT PULL] CRC updates for 6.14

On Thu, Jan 23, 2025 at 09:07:44AM -0500, Theodore Ts'o wrote:
> On Wed, Jan 22, 2025 at 11:46:18PM -0800, Eric Biggers wrote:
> > 
> > Actually, I'm tempted to just provide slice-by-1 (a.k.a. byte-by-byte) as the
> > only generic CRC32 implementation.  The generic code has become increasingly
> > irrelevant due to the arch-optimized code existing.  The arch-optimized code
> > tends to be 10 to 100 times faster on long messages.
> 
> Yeah, that's my intuition as well; I would think the CPU's that
> don't have a CRC32 optimization instruction(s) would probably be the
> most sensitive to dcache thrashing.
> 
> But given that Geert ran into this on m68k (I assume), maybe we could
> have him benchmark the various crc32 generic implementation to see if
> we is the best for him?  That is, assuming that he cares (which he
> might not. :-).

FWIW, benchmarking the CRC library functions is easy now; just enable
CONFIG_CRC_KUNIT_TEST=y and CONFIG_CRC_BENCHMARK=y.

But, it's just a traditional benchmark that calls the functions in a loop, and
doesn't account for dcache thrashing.  It's exactly the sort of benchmark I
mentioned doesn't tell the whole story about the drawbacks of using a huge
table.  So focusing only on microbenchmarks of slice-by-n generally leads to a
value n > 1 seeming optimal --- potentially as high as n=16 depending on the
CPU, but really old CPUs like m68k should need much less.  So the rationale of
choosing "slice-by-1" in the kernel would be to consider the reduced dcache use
and code size, and the fact that arch-optimized code is usually used instead
these days anyway, to be more important than microbenchmark results.  (And also
the other CRC variants in the kernel like CRC64, CRC-T10DIF, CRC16, etc. already
just have slice-by-1, so this would make CRC32 consistent with that.)

- Eric