Message-ID: <2cdb57f2cdbd49e9bb1034d01d054bb7@AcuMS.aculab.com>
Date: Mon, 20 Apr 2020 08:32:10 +0000
From: David Laight <David.Laight@...LAB.COM>
To: "'Jason A. Donenfeld'" <Jason@...c4.com>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ebiggers@...gle.com" <ebiggers@...gle.com>,
"ardb@...nel.org" <ardb@...nel.org>
CC: "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH crypto-stable] crypto: arch/lib - limit simd usage to
PAGE_SIZE chunks
From: Jason A. Donenfeld
> Sent: 20 April 2020 08:57
>
> The initial Zinc patchset, after some mailing list discussion, contained
> code to ensure that kernel_fpu_enable would not be kept on for more than
> a PAGE_SIZE chunk, since it disables preemption. The choice of PAGE_SIZE
> isn't totally scientific, but it's not a bad guess either, and it's
> what's used in both the x86 poly1305 and blake2s library code already.
> Unfortunately it appears to have been left out of the final patchset
> that actually added the glue code. So, this commit adds back the
> PAGE_SIZE chunking.
>
...
> ---
> Eric, Ard - I'm wondering if this was in fact just an oversight in Ard's
> patches, or if there was actually some later discussion in which we
> concluded that the PAGE_SIZE chunking wasn't required, perhaps because
> of FPU changes. If that's the case, please do let me know, in which case
> I'll submit a _different_ patch that removes the chunking from x86 poly
> and blake. I can't find any emails that would indicate that, but I might
> be mistaken.
Maybe kernel_fpu_begin() should be passed the address of a location
where the address of an fpu save area buffer can be written.
Then the pre-emption code can allocate the buffer and save the
state into it.
However that doesn't solve the problem for non-preemptive kernels.
They may need a cond_resched() in the loop if it might take 1ms (or so).
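
As a rough illustration of the chunking being discussed, here is a
minimal userspace sketch. The stub_*() functions are stand-ins (my
assumption, not kernel code) for kernel_fpu_begin(), kernel_fpu_end()
and cond_resched(); CHUNK_SIZE stands in for PAGE_SIZE and is assumed
to be 4096; simd_work() is a hypothetical worker that just sums bytes.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE 4096u	/* stand-in for PAGE_SIZE, assumed 4 KiB */

/* Counters so the behaviour of the loop can be observed. */
static unsigned int fpu_sections;
static unsigned int resched_points;

/* Userspace stubs for kernel_fpu_begin()/kernel_fpu_end()/cond_resched(). */
static void stub_fpu_begin(void) { fpu_sections++; }
static void stub_fpu_end(void) { }
static void stub_cond_resched(void) { resched_points++; }

/* Hypothetical SIMD worker; here it only sums the bytes it is given. */
static unsigned long simd_work(const unsigned char *src, size_t len)
{
	unsigned long sum = 0;

	while (len--)
		sum += *src++;
	return sum;
}

/*
 * Process the buffer in CHUNK_SIZE pieces so that the fpu section
 * (which disables preemption in the kernel) stays short, and offer a
 * reschedule point between chunks for non-preemptive kernels.
 */
static unsigned long process_chunked(const unsigned char *src, size_t len)
{
	unsigned long sum = 0;

	assert(src != NULL || len == 0);

	while (len) {
		size_t todo = len < CHUNK_SIZE ? len : CHUNK_SIZE;

		stub_fpu_begin();
		sum += simd_work(src, todo);
		stub_fpu_end();

		src += todo;
		len -= todo;
		if (len)
			stub_cond_resched();
	}
	return sum;
}
```

With a 10000 byte buffer this enters the fpu section three times and
passes two reschedule points.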
kernel_fpu_begin() ought also to be passed a parameter saying which
fpu features are required, and return which are allocated.
On x86 this could be used to check for AVX512 (etc) which may be
available in an ISR unless the ISR interrupted code inside a
kernel_fpu_begin() section (etc).
It would also allow optimisations if only 1 or 2 fpu registers are
needed (eg for some of the crypto functions) rather than the whole
fpu register set.
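
Something like the following is what I have in mind, as a userspace
sketch only. Everything here is hypothetical: the FPU_FEAT_* bits,
sketch_fpu_begin(), sketch_fpu_end() and pick_impl() are names made up
for illustration and are not existing kernel interfaces. The caller
asks for a set of features, gets back the subset it may use, and falls
back to a scalar path when nothing is granted (for example because it
interrupted another fpu section).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; not real kernel constants. */
enum {
	FPU_FEAT_NONE   = 0,
	FPU_FEAT_SSE2   = 1u << 0,
	FPU_FEAT_AVX2   = 1u << 1,
	FPU_FEAT_AVX512 = 1u << 2,
};

/* Pretend CPU without AVX512, and a flag for "fpu section active". */
static unsigned int cpu_features = FPU_FEAT_SSE2 | FPU_FEAT_AVX2;
static bool fpu_in_use;

/*
 * Hypothetical begin: request features, return the granted subset.
 * Returns 0 if an fpu section is already active or nothing matches.
 */
static unsigned int sketch_fpu_begin(unsigned int wanted)
{
	unsigned int granted;

	if (fpu_in_use)
		return FPU_FEAT_NONE;

	granted = wanted & cpu_features;
	if (granted)
		fpu_in_use = true;
	return granted;
}

/* Hypothetical end: only valid after a begin that granted something. */
static void sketch_fpu_end(void)
{
	assert(fpu_in_use);
	fpu_in_use = false;
}

/* Pick the best granted implementation, or FPU_FEAT_NONE for scalar. */
static unsigned int pick_impl(unsigned int wanted)
{
	unsigned int got = sketch_fpu_begin(wanted);
	unsigned int impl = FPU_FEAT_NONE;

	if (got & FPU_FEAT_AVX512)
		impl = FPU_FEAT_AVX512;
	else if (got & FPU_FEAT_AVX2)
		impl = FPU_FEAT_AVX2;
	else if (got & FPU_FEAT_SSE2)
		impl = FPU_FEAT_SSE2;

	/* The real work would happen here, before leaving the section. */
	if (got)
		sketch_fpu_end();
	return impl;
}
```

A real version would also need to say which registers get saved, which
is where the saving for "only 1 or 2 registers" would come from; the
sketch leaves that out.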
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)