[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <MW5PR84MB18427894BEF615BD08F4603EAB259@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
Date: Thu, 13 Oct 2022 21:48:05 +0000
From: "Elliott, Robert (Servers)" <elliott@....com>
To: "Jason A. Donenfeld" <Jason@...c4.com>
CC: "herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"davem@...emloft.net" <davem@...emloft.net>,
"tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
"ap420073@...il.com" <ap420073@...il.com>,
"ardb@...nel.org" <ardb@...nel.org>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit
> -----Original Message-----
> From: Jason A. Donenfeld <Jason@...c4.com>
> Sent: Wednesday, October 12, 2022 7:36 PM
> To: Elliott, Robert (Servers) <elliott@....com>
> Cc: herbert@...dor.apana.org.au; davem@...emloft.net;
> tim.c.chen@...ux.intel.com; ap420073@...il.com; ardb@...nel.org; linux-
> crypto@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit
>
> On Wed, Oct 12, 2022 at 04:59:21PM -0500, Robert Elliott wrote:
> > Use a common macro name (FPU_BYTES) for the limit of the number of bytes
> > processed within kernel_fpu_begin and kernel_fpu_end rather than
> > using SZ_4K (which is a signed value), or a magic value of 4096U.
>
> Not sure I like this very much. The whole idea is that this is variable
> per algorithm, since not all algorithms have the same performance
> characteristics.
Good point.
I noticed the powerpc aes, sha1, and sha256 modules include explanations
of how their macros serving a similar purpose were calculated.
arch/powerpc/crypto/aes-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
* between preempt_disable() and preempt_enable(). e500 cores can issue two
* instructions per clock cycle using one 32/64 bit unit (SU1) and one 32
* bit unit (SU2). One of these can be a memory access that is executed via
* a single load and store unit (LSU). XTS-AES-256 takes ~780 operations per
* 16 byte block or 25 cycles per byte. Thus 768 bytes of input data
* will need an estimated maximum of 20,000 cycles. Headroom for cache misses
* included. Even with the low end model clocked at 667 MHz this equals to a
* critical time window of less than 30us. The value has been chosen to
* process a 512 byte disk block in one or a large 1400 bytes IPsec network
* packet in two runs.
#define MAX_BYTES 768
and arch/powerpc/crypto/sha1-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
* between preempt_disable() and preempt_enable(). SHA1 takes ~1000
* operations per 64 bytes. e500 cores can issue two arithmetic instructions
* per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
* Thus 2KB of input data will need an estimated maximum of 18,000 cycles.
* Headroom for cache misses included. Even with the low end model clocked
* at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 2048
arch/powerpc/crypto/sha256-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
* between preempt_disable() and preempt_enable(). SHA256 takes ~2,000
* operations per 64 bytes. e500 cores can issue two arithmetic instructions
* per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
* Thus 1KB of input data will need an estimated maximum of 18,000 cycles.
* Headroom for cache misses included. Even with the low end model clocked
* at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 1024
Perhaps we should declare a time goal like "30 us," measure the actual
speed of each algorithm with a tcrypt speed test, and calculate the
nominal value assuming some slow x86 CPU core speed?
That could be further adjusted at run-time based on the supposed
minimum CPU frequency (e.g., as reported in
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq).
If values less than 4 KiB are necessary (e.g., like the powerpc
values), that will require changes in all the modules using
skcipher_walk too.
> So in that sense, it's better to put this close to
> where it's actually used, rather than somewhere at the top of the file.
> When you do that, it makes it seem like "FPU_BYTES" is some universal
> constant, which of course it isn't.
>
> Instead, declare this as an untyped enum value within the function.
Many of these modules use the same value for both an _update and
a _finit function (usually pretty close to each other). Is it
important to avoid replication?
Powered by blists - more mailing lists