lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <650FC457-7E4C-473A-9E5F-EAFC74F6444B@amacapital.net>
Date:   Wed, 12 Sep 2018 08:47:19 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Andy Lutomirski <luto@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        kvm@...r.kernel.org, "Jason A. Donenfeld" <Jason@...c4.com>,
        Rik van Riel <riel@...riel.com>
Subject: Re: [RFC PATCH 10/10] x86/fpu: defer FPU state load until return to userspace


> On Sep 12, 2018, at 6:33 AM, Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> 
> From: Rik van Riel <riel@...riel.com>
> 
> Defer loading of FPU state until return to userspace. This gives
> the kernel the potential to skip loading FPU state for tasks that
> stay in kernel mode, or for tasks that end up with repeated
> invocations of kernel_fpu_begin.
> 
> It also increases the chances that a task's FPU state will remain
> valid in the FPU registers until it is scheduled back in, allowing
> us to skip restoring that task's FPU state altogether.
> 
> 

> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -101,14 +101,14 @@ void __kernel_fpu_begin(void)
> 
>    kernel_fpu_disable();
> 
> -    if (fpu->initialized) {
> +    __cpu_invalidate_fpregs_state();
> +
> +    if (!test_and_set_thread_flag(TIF_LOAD_FPU)) {

Since the already-TIF_LOAD_FPU path is supposed to be fast here, use test_thread_flag() instead. test_and_set operations do unconditional RMW operations and are always full barriers, so they’re slow.

Also, on top of this patch, there should be lots of cleanups available. In particular, all the fpu state accessors could probably be reworked to take TIF_LOAD_FPU into account, which would simplify the callers and maybe even the mess of variables tracking whether the state is in regs.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ