Message-ID: <20191127140754.GB3812@zn.tnic>
Date:   Wed, 27 Nov 2019 15:07:54 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Barret Rhoden <brho@...gle.com>,
        Josh Bleecher Snyder <josharian@...il.com>,
        "Rik van Riel\"" <riel@...riel.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, ian@...s.com
Subject: Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

On Wed, Nov 27, 2019 at 01:42:43PM +0100, Sebastian Andrzej Siewior wrote:
> The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the
				 ^
				 to

> context that is currently loaded. It never changed during the lifetime
> of a task and remained stable/constant.
> 
> Since we deferred loading the FPU registers on return to userland, the

Drop those "we"s :)

> content of fpu_fpregs_owner_ctx may change during preemption and must
> not be cached.
> This went unnoticed for some time; in particular, gcc-9 is able to
> cache that load in copy_fpstate_to_sigframe() and reuse it in the
> retry loop:
> 
>   copy_fpstate_to_sigframe()
>     load fpu_fpregs_owner_ctx and save on stack
>     fpregs_lock()
>     copy_fpregs_to_sigframe() /* failed */
>     fpregs_unlock()
>          *** PREEMPTION, another task uses the FPU, changes fpu_fpregs_owner_ctx ***
> 
>     fault_in_pages_writeable() /* succeeds, retry */
> 
>     fpregs_lock()
> 	__fpregs_load_activate()
> 	  fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
>     copy_fpregs_to_sigframe() /* succeeds, random FPU content */
> 
> This is a comparison of the assembly generated by gcc-9, without vs
> with this patch:
> 
> | # arch/x86/kernel/fpu/signal.c:173:      if (!access_ok(buf, size))
> |        cmpq    %rdx, %rax      # tmp183, _4
> |        jb      .L190   #,
> |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |-#APP
> |-# 512 "arch/x86/include/asm/fpu/internal.h" 1
> |-       movq %gs:fpu_fpregs_owner_ctx,%rax      #, pfo_ret__
> |-# 0 "" 2
> |-#NO_APP
> |-       movq    %rax, -88(%rbp) # pfo_ret__, %sfp
> …
> |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |-       movq    -88(%rbp), %rcx # %sfp, pfo_ret__
> |-       cmpq    %rcx, -64(%rbp) # pfo_ret__, %sfp
> |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |+#APP
> |+# 512 "arch/x86/include/asm/fpu/internal.h" 1
> |+       movq %gs:fpu_fpregs_owner_ctx(%rip),%rax        # fpu_fpregs_owner_ctx, pfo_ret__
> |+# 0 "" 2
> |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |+#NO_APP
> |+       cmpq    %rax, -64(%rbp) # pfo_ret__, %sfp
> 
> Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching
> fpu_fpregs_owner_ctx across preemption points.
> 
> Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")

Or

a352a3b7b792 ("x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD")

maybe, which adds the fpregs_unlock()?

> ---
> 
> There is no Signed-off-by here. Could this please be verified by the
> reporter?

Not the reporter, but I just tested it successfully too:

Tested-by: Borislav Petkov <bp@...e.de>

> Also I would like to add
> 	Debugged-by: Ian Lance Taylor

Yes, pls. CCed.

> 
> but I lack the complete address, and I'm not sure if he wants that.
> Also, please send a Reported-by line since I'm not sure who started
> this.
> 
>  arch/x86/include/asm/fpu/internal.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 4c95c365058aa..44c48e34d7994 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
>  
>  static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
>  {
> -	return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> +	return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> }

And to add one more data point from IRC: this is also documented:

/*
 * this_cpu_read() makes gcc load the percpu variable every time it is
 * accessed while this_cpu_read_stable() allows the value to be cached.
							^^^^^^^^^^^^^^^

 * this_cpu_read_stable() is more efficient and can be used if its value
 * is guaranteed to be valid across cpus.  The current users include
 * get_current() and get_thread_info() both of which are actually
 * per-thread variables implemented as per-cpu variables and thus
 * stable for the duration of the respective task.
 */
#define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
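
To see why the "stable" flavor can go stale across a preemption point,
below is a minimal userspace sketch. It is not the kernel
implementation (the variable and helper names are made up, and the real
ops read %gs-relative percpu memory), but it relies on the same
constraint trick: the "stable" read tells gcc only about the variable's
address, not its contents, so gcc may cache the result. x86-64 only:

#include <stdio.h>

static unsigned long owner_ctx;

/*
 * Like this_cpu_read_stable(): the asm's only input is the address of
 * the variable ("p" constraint) and the asm is not volatile, so gcc
 * assumes the result cannot change and is free to cache/CSE it, even
 * across function calls and stores to the variable.
 */
static inline unsigned long read_stable(void)
{
	unsigned long ret;

	asm("movq %a1, %0" : "=r" (ret) : "p" (&owner_ctx));
	return ret;
}

/*
 * Like this_cpu_read(): the memory itself is an input ("m"
 * constraint), so gcc has to redo the load whenever that memory may
 * have changed in the meantime.
 */
static inline unsigned long read_fresh(void)
{
	unsigned long ret;

	asm("movq %1, %0" : "=r" (ret) : "m" (owner_ctx));
	return ret;
}

int main(void)
{
	unsigned long a;

	owner_ctx = 1;
	a = read_stable();

	owner_ctx = 2;	/* stands in for the preempting task */

	/* With -O2, gcc may reuse the first load here and print 1. */
	printf("stable: %lu, fresh: %lu, first: %lu\n",
	       read_stable(), read_fresh(), a);
	return 0;
}

Built with gcc -O2, the second read_stable() can fold into the first,
which is exactly the cached pfo_ret__ reuse visible in the "without"
half of the assembly diff above; read_fresh() always reloads.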


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
