Message-ID: <913e23f9-d039-4de1-a0d3-d1067dcda8ac@hogyros.de>
Date: Tue, 5 Aug 2025 13:49:31 +0900
From: Simon Richter <Simon.Richter@...yros.de>
To: Eric Biggers <ebiggers@...nel.org>,
Christophe Leroy <christophe.leroy@...roup.eu>
Cc: linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org,
Ard Biesheuvel <ardb@...nel.org>, "Jason A . Donenfeld" <Jason@...c4.com>,
linux-mips@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
sparclinux@...r.kernel.org
Subject: Crypto use cases (was: Remove PowerPC optimized MD5 code)
Hi,
On 8/5/25 07:59, Eric Biggers wrote:
>> md5sum uses the kernel's MD5 code:
> What? That's crazy. Userspace MD5 code would be faster and more
> reliable. No need to make syscalls, transfer data to and from the
> kernel, have an external dependency, etc. Is this the coreutils md5sum?
> We need to get this reported and fixed.
The userspace API (AF_ALG) allows zero-copy transfers from userspace and,
AFAIK, can also operate directly on files without the data ever being
transferred to userspace (so we save one copy).
Userspace requests are also where asynchronous hardware offload units
get to chew through large blocks of data while the CPU does something
else:
$ time dd if=test.bin of=/dev/zero bs=1G # warm up caches
real 0m1.541s
user 0m0.000s
sys 0m0.732s

$ time gzip -9 <test.bin >test.bin.gz # compress with the CPU
real 2m57.789s
user 2m55.986s
sys 0m1.508s

$ time ./gzfht_test test.bin # compress with NEST unit
real 0m3.207s
user 0m0.584s
sys 0m2.487s

$ time gzip -d <test.bin.nx.gz >test.bin.nx # decompress with CPU
real 1m0.103s
user 0m57.990s
sys 0m1.878s

$ time ./gunz_test test.bin.gz # decompress with NEST unit
real 0m2.722s
user 0m0.200s
sys 0m1.872s
That's why I object to measuring the general usefulness of hardware
crypto units by the standards of fscrypt, which has the artificial
limitation of never submitting blocks larger than 4 KiB. There are other
use cases without that limitation, where the per-request overhead is
negligible because it is incurred only once per few gigabytes of data.
That's why I suggested replacing the priority field (cra_priority) with
"speed" and "overhead" fields, and calculating a priority for each
application as size/speed + overhead, where the smallest number wins.
Here, size is the typical request size the application expects to use.
For fscrypt and IPsec that is on the small side, so the CPU would always
be selected unless a low-overhead offload engine were available.
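A minimal sketch of the proposed selection rule, written in Python for
brevity rather than kernel C. The names Impl, cost and select, the units
(bytes per microsecond, microseconds) and the example numbers are all
hypothetical; none of this is part of the kernel crypto API.

```python
from dataclasses import dataclass


@dataclass
class Impl:
    """Hypothetical descriptor carrying the proposed "speed" and
    "overhead" fields instead of a single static priority."""
    name: str
    speed: float     # sustained throughput, bytes per microsecond
    overhead: float  # fixed cost per request, microseconds


def cost(impl: Impl, size: float) -> float:
    """Estimated time for one request of `size` bytes: size/speed + overhead."""
    return size / impl.speed + impl.overhead


def select(impls: list, size: float) -> Impl:
    """The implementation with the smallest estimated cost wins."""
    return min(impls, key=lambda impl: cost(impl, size))


# Illustrative numbers only, not measurements.
IMPLS = [
    Impl("cpu", speed=1_000.0, overhead=0.0),        # ~1 GB/s, no setup cost
    Impl("offload", speed=10_000.0, overhead=50.0),  # 10x faster, 50 us per request
]

for size in (4096, 10**9):
    print(size, "->", select(IMPLS, size).name)
```

With these made-up numbers the crossover is at roughly 55 kB: a 4 KiB
fscrypt-sized request stays on the CPU, while a gigabyte-sized userspace
request goes to the offload engine.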
This probably needs some adjustment to allow selecting a low-power
implementation (e.g. on mobile, I'd want to use offloading for fscrypt
even if it is slower) and to model request batching, which reduces the
overhead on a busy system, but it should be a good start.
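One possible way to model batching, again as a hypothetical Python
sketch: if `batch` requests share a single submission, the fixed overhead
is divided among them. The amortization model, the names and the numbers
are assumptions, and the low-power preference is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Impl:
    name: str
    speed: float     # bytes per microsecond
    overhead: float  # fixed cost per submission, microseconds


def batched_cost(impl: Impl, size: float, batch: int = 1) -> float:
    """Estimated per-request time when `batch` requests share one submission.

    Hypothetical model: the fixed overhead is amortized across the batch,
    so it shrinks on a busy system where many requests are queued."""
    if batch < 1:
        raise ValueError("batch must be >= 1")
    return size / impl.speed + impl.overhead / batch


def select(impls: list, size: float, batch: int = 1) -> Impl:
    """Smallest estimated per-request cost wins."""
    return min(impls, key=lambda impl: batched_cost(impl, size, batch))


# Illustrative numbers only, not measurements.
IMPLS = [
    Impl("cpu", speed=1_000.0, overhead=0.0),
    Impl("offload", speed=10_000.0, overhead=50.0),
]

# A lone 4 KiB request stays on the CPU...
print(select(IMPLS, 4096, batch=1).name)
# ...but with enough 4 KiB requests batched together, offload wins.
print(select(IMPLS, 4096, batch=32).name)
```

With these illustrative numbers, about 14 or more batched 4 KiB requests
tip the choice towards the offload engine.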
Simon