Message-ID: <CAMj1kXHHm+L=qE5opDXhjoWZt+1eKXFeGVS=OdvyF0VNFZivCA@mail.gmail.com>
Date: Fri, 25 Nov 2022 09:59:17 +0100
From: Ard Biesheuvel <ardb@...nel.org>
To: Herbert Xu <herbert@...dor.apana.org.au>
Cc: "Jason A. Donenfeld" <Jason@...c4.com>, Robert Elliott <elliott@....com>,
	davem@...emloft.net, tim.c.chen@...ux.intel.com, ap420073@...il.com,
	David.Laight@...lab.com, ebiggers@...nel.org,
	linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption

On Fri, 25 Nov 2022 at 09:41, Herbert Xu <herbert@...dor.apana.org.au> wrote:
>
> On Wed, Nov 16, 2022 at 12:13:51PM +0100, Jason A. Donenfeld wrote:
> > On Tue, Nov 15, 2022 at 10:13:28PM -0600, Robert Elliott wrote:
> > > +/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> > > +static const unsigned int bytes_per_fpu = 337 * 1024;
> > > +
> >
> > Use an enum for constants like this:
> >
> >     enum { BYTES_PER_FPU = ... };
> >
> > You can even make it function-local, so it's near the code that uses it,
> > which will better justify its existence.
> >
> > Also, where did you get this number? Seems kind of weird.
>
> These numbers are highly dependent on hardware and I think having
> them hard-coded is wrong.
>
> Perhaps we should try a different approach. How about just limiting
> the size to 4K, and then depending on need_resched we break out of
> the loop? Something like:
>
>	if (!len)
>		return 0;
>
>	kernel_fpu_begin();
>	for (;;) {
>		unsigned int chunk = min(len, 4096);
>
>		sha1_base_do_update(desc, data, chunk, sha1_xform);
>
>		len -= chunk;
>		data += chunk;
>
>		if (!len)
>			break;
>
>		if (need_resched()) {
>			kernel_fpu_end();
>			cond_resched();
>			kernel_fpu_begin();
>		}
>	}
>	kernel_fpu_end();

On arm64, this is implemented in an assembler macro 'cond_yield', so we
don't need to preserve/restore the SIMD state at all if the yield is not
going to result in a call to schedule(). For example, the SHA3 code
keeps 400 bytes of state in registers, which we don't want to save and
reload unless needed. (5f6cb2e617681 'crypto: arm64/sha512-ce - simplify
NEON yield')

So the asm routines call cond_yield, and return early if a yield is
required, with the number of blocks or bytes left to process as the
return value. The C wrapper just calls the asm routine in a loop until
the return value becomes 0 (sketched below).

That way, we don't need magic values at all, and the yield occurs as
soon as the asm inner loop observes the yield condition, so the latency
should be much lower as well.

Note that this is only used in shash implementations, given that they
are the only ones that may receive unbounded inputs.
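
For illustration, here is a minimal sketch of that C wrapper pattern;
the routine names below are hypothetical stand-ins, not the actual
kernel symbols:

	#include <linux/linkage.h>
	#include <crypto/sha2.h>
	#include <asm/neon.h>

	/*
	 * Hypothetical asm routine: processes up to 'blocks' SHA-256
	 * blocks, checking the cond_yield condition between blocks, and
	 * returns the number of blocks still left to process if it had
	 * to bail out early for a pending reschedule.
	 */
	asmlinkage int sha256_blocks_asm(struct sha256_state *st,
					 const u8 *src, int blocks);

	static void sha256_do_blocks(struct sha256_state *st,
				     const u8 *src, int blocks)
	{
		while (blocks) {
			int rem;

			kernel_neon_begin();
			rem = sha256_blocks_asm(st, src, blocks);
			kernel_neon_end();	/* schedule() may run here */

			/* advance past the blocks that were consumed */
			src += (blocks - rem) * SHA256_BLOCK_SIZE;
			blocks = rem;
		}
	}

Because the asm routine reports how much work is left, the wrapper
needs no hardware-dependent tuning constant: the SIMD critical section
simply ends wherever the inner loop observed the yield condition.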