lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMzpN2hMFioZR0ERS8B2cMy4DrNObQYcaO=q=B3HycA1qfvpDQ@mail.gmail.com>
Date:   Fri, 15 Jun 2018 14:33:18 -0400
From:   Brian Gerst <brgerst@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     "Jason A. Donenfeld" <Jason@...c4.com>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Andy Lutomirski <luto@...capital.net>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

On Fri, Jun 15, 2018 at 12:25 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Fri, 15 Jun 2018, Jason A. Donenfeld wrote:
>> In a loop this looks like:
>>
>> for (thing) {
>>   kernel_fpu_begin();
>>   encrypt(thing);
>>   kernel_fpu_end();
>> }
>>
>> This is obviously very bad, because begin() and end() are slow, so
>> WireGuard does the obvious:
>>
>> kernel_fpu_begin();
>> for (thing)
>>   encrypt(thing);
>> kernel_fpu_end();
>>
>> This is fine and well, and the crypto API I'm working on will enable
>
> It might be fine crypto performance wise, but it's a total nightmare
> latency wise because kernel_fpu_begin() disables preemption. We've seen
> latencies in the larger millisecond range due to processing large data sets
> with kernel FPU.
>
> If you want to go there then we really need a better approach which allows
> kernel FPU usage in preemptible context and in case of preemption a way to
> stash the preempted FPU context and restore it when the task gets scheduled
> in again. Just using the existing FPU stuff and moving the loops inside the
> begin/end section and keeping preemption disabled for arbitrary time spans
> is not going to fly.

One optimization that can be done is to delay restoring the user FPU
state until we exit to userspace.  That way the FPU is saved and
restored only once no matter how many times
kernel_fpu_begin()/kernel_fpu_end() are called.

--
Brian Gerst

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ