lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <MW5PR84MB18427894BEF615BD08F4603EAB259@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
Date:   Thu, 13 Oct 2022 21:48:05 +0000
From:   "Elliott, Robert (Servers)" <elliott@....com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
CC:     "herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
        "ap420073@...il.com" <ap420073@...il.com>,
        "ardb@...nel.org" <ardb@...nel.org>,
        "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit



> -----Original Message-----
> From: Jason A. Donenfeld <Jason@...c4.com>
> Sent: Wednesday, October 12, 2022 7:36 PM
> To: Elliott, Robert (Servers) <elliott@....com>
> Cc: herbert@...dor.apana.org.au; davem@...emloft.net;
> tim.c.chen@...ux.intel.com; ap420073@...il.com; ardb@...nel.org; linux-
> crypto@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit
> 
> On Wed, Oct 12, 2022 at 04:59:21PM -0500, Robert Elliott wrote:
> > Use a common macro name (FPU_BYTES) for the limit of the number of bytes
> > processed within kernel_fpu_begin and kernel_fpu_end rather than
> > using SZ_4K (which is a signed value), or a magic value of 4096U.
> 
> Not sure I like this very much. The whole idea is that this is variable
> per algorithm, since not all algorithms have the same performance
> characteristics. 

Good point.

I noticed the powerpc aes, sha1, and sha256 modules include explanations
of how their macros serving a similar purpose were calculated.

arch/powerpc/crypto/aes-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
 * between preempt_disable() and preempt_enable(). e500 cores can issue two
 * instructions per clock cycle using one 32/64 bit unit (SU1) and one 32
 * bit unit (SU2). One of these can be a memory access that is executed via
 * a single load and store unit (LSU). XTS-AES-256 takes ~780 operations per
 * 16 byte block or 25 cycles per byte. Thus 768 bytes of input data
 * will need an estimated maximum of 20,000 cycles. Headroom for cache misses
 * included. Even with the low end model clocked at 667 MHz this equals to a
 * critical time window of less than 30us. The value has been chosen to
 * process a 512 byte disk block in one or a large 1400 bytes IPsec network
 * packet in two runs.
#define MAX_BYTES 768

and arch/powerpc/crypto/sha1-spe-glue.c:
 * MAX_BYTES defines the number of bytes that are allowed to be processed
 * between preempt_disable() and preempt_enable(). SHA1 takes ~1000
 * operations per 64 bytes. e500 cores can issue two arithmetic instructions
 * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
 * Thus 2KB of input data will need an estimated maximum of 18,000 cycles.
 * Headroom for cache misses included. Even with the low end model clocked
 * at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 2048

arch/powerpc/crypto/sha256-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
 * between preempt_disable() and preempt_enable(). SHA256 takes ~2,000
 * operations per 64 bytes. e500 cores can issue two arithmetic instructions
 * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
 * Thus 1KB of input data will need an estimated maximum of 18,000 cycles.
 * Headroom for cache misses included. Even with the low end model clocked
 * at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 1024

Perhaps we should declare a time goal like "30 us," measure the actual
speed of each algorithm with a tcrypt speed test, and calculate the
nominal value assuming some slow x86 CPU core speed?

That could be further adjusted at run-time based on the supposed
minimum CPU frequency (e.g., as reported in
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq).

If values less than 4 KiB are necessary (e.g., like the powerpc
values), that will require changes in all the modules using 
skcipher_walk too.

> So in that sense, it's better to put this close to
> where it's actually used, rather than somewhere at the top of the file.
> When you do that, it makes it seem like "FPU_BYTES" is some universal
> constant, which of course it isn't.
> 
> Instead, declare this as an untyped enum value within the function. 

Many of these modules use the same value for both an _update and
a _finit function (usually pretty close to each other). Is it
important to avoid replication?


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ