linux-kernel - Re: [PATCH 04/11] x86/fpu: eager switch PKRU state

Message-ID: <76caafd5-c85d-61bb-62ec-8056cd6d95ac@linux.intel.com>
Date:   Fri, 12 Oct 2018 10:51:34 -0700
From:   Dave Hansen <dave.hansen@...ux.intel.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-kernel@...r.kernel.org
Cc:     x86@...nel.org, Andy Lutomirski <luto@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        kvm@...r.kernel.org, "Jason A. Donenfeld" <Jason@...c4.com>,
        Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH 04/11] x86/fpu: eager switch PKRU state

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@...riel.com>
> 
> While most of a task's FPU state is only needed in user space,
> the protection keys need to be in place immediately after a
> context switch.
> 
> The reason is that any accesses to userspace memory while running
> in kernel mode also need to abide by the memory permissions
> specified in the protection keys.
> 
> The "eager switch" is a preparation for loading the FPU state on return
> to userland. Instead of decoupling PKRU state from xstate I update PKRU
> within xstate on write operations by the kernel.
> 
> Signed-off-by: Rik van Riel <riel@...riel.com>
> [bigeasy: save pkru to xstate, no cache]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 20 +++++++++++++++----
>  arch/x86/include/asm/fpu/xstate.h   |  2 ++
>  arch/x86/include/asm/pgtable.h      |  6 +-----
>  arch/x86/include/asm/pkeys.h        |  2 +-
>  arch/x86/kernel/fpu/core.c          |  2 +-
>  arch/x86/mm/pkeys.c                 | 31 ++++++++++++++++++++++-------
>  include/linux/pkeys.h               |  2 +-
>  7 files changed, 46 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 16c4077ffc945..956d967ca824a 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -570,11 +570,23 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>   */
>  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
>  {
> -	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
> -		       new_fpu->initialized;
> +	bool load_fpu;
>  
> -	if (preload)
> -		__fpregs_load_activate(new_fpu, cpu);
> +	load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized;
> +	if (!load_fpu)
> +		return;

Needs comments, please.  Especially around what an uninitialized new_fpu
means.

> +	__fpregs_load_activate(new_fpu, cpu);
> +
> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> +	if (static_cpu_has(X86_FEATURE_OSPKE)) {

FWIW, you should be able to use cpu_feature_enabled() instead of an
explicit #ifdef here.
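
To illustrate the idea outside the kernel, here is an untested userspace
sketch.  Everything prefixed with model_/MODEL_ is made up for the example
and is not kernel API; the assumption is only that the feature check folds
to a compile-time false when the config option is off, so the compiler can
drop the block without an #ifdef around it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS. */
#ifndef MODEL_CONFIG_PKEYS
#define MODEL_CONFIG_PKEYS 1
#endif

/* Hypothetical runtime state: the CPUID result and the PKRU register. */
static bool model_cpu_has_ospke = true;
static uint32_t model_pkru_reg;

/*
 * Models the shape of a cpu_feature_enabled()-style check: constant
 * false when the feature is configured out, otherwise a runtime test.
 */
static inline bool model_feature_enabled_ospke(void)
{
	if (!MODEL_CONFIG_PKEYS)
		return false;	/* compile-time constant, callers fold away */
	return model_cpu_has_ospke;
}

/* Same structure as the quoted hunk, minus the #ifdef. */
static inline void model_switch_pkru(uint32_t saved_pkru)
{
	if (!model_feature_enabled_ospke())
		return;
	if (saved_pkru != model_pkru_reg)
		model_pkru_reg = saved_pkru;
}
```

Building this with -DMODEL_CONFIG_PKEYS=0 leaves model_switch_pkru() with
nothing to do, which is the effect the #ifdef was providing.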

> +		struct pkru_state *pk;
> +
> +		pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> +		if (pk->pkru != __read_pkru())
> +			__write_pkru(pk->pkru);
> +	}
> +#endif
>  }

Comments here as well, please.

I think the goal is to keep the PKRU state in the 'init state' when
possible and also to save the cost of WRPKRU.  But, it would be really
nice to be explicit.
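
If that reading is right, the behavior can be modeled in userspace roughly
like this.  It is only a sketch of my guess at the intent: the model_*
names are hypothetical, the register is a plain variable, and the counter
exists just to make the skipped writes visible.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PKRU register and a WRPKRU counter. */
static uint32_t model_pkru_reg;
static unsigned int model_wrpkru_count;

static uint32_t model_read_pkru(void)
{
	return model_pkru_reg;
}

static void model_write_pkru(uint32_t val)
{
	model_pkru_reg = val;
	model_wrpkru_count++;	/* every call here models one WRPKRU */
}

/*
 * Only write when the saved value differs from what is already in the
 * register.  A task that never changed PKRU therefore never triggers a
 * write, so the register is left untouched across the switch.
 */
static void model_restore_pkru(uint32_t saved_pkru)
{
	if (saved_pkru != model_read_pkru())
		model_write_pkru(saved_pkru);
}
```

A comment in the real function saying which of the two effects (staying in
the init state, or avoiding the write cost) is the one being relied on
would cover what I am asking for.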

> -static inline void write_pkru(u32 pkru)
> -{
> -	if (boot_cpu_has(X86_FEATURE_OSPKE))
> -		__write_pkru(pkru);
> -}
> +void write_pkru(u32 pkru);

One reason I inlined this was that it enables the PK code to be
optimized away entirely.  Putting the checks behind a function call
makes this optimization impossible.

Could you elaborate on why you chose to do this and what you think the
impact is or is not?

> diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
> index 19b137f1b3beb..b184f916319e5 100644
> --- a/arch/x86/include/asm/pkeys.h
> +++ b/arch/x86/include/asm/pkeys.h
> @@ -119,7 +119,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>  		unsigned long init_val);
>  extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>  		unsigned long init_val);
> -extern void copy_init_pkru_to_fpregs(void);
> +extern void pkru_set_init_value(void);

Could you elaborate on why the name is being changed?

> +void write_pkru(u32 pkru)
> +{
> +	struct pkru_state *pk;
> +
> +	if (!boot_cpu_has(X86_FEATURE_OSPKE))
> +		return;
> +
> +	pk = __raw_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> +	/*
> +	 * Update the PKRU value in cstate and then in the CPU. A context

"cstate"?  Did you mean xstate?

> +	 * switch between those two operation would load the new value from the
> +	 * updated xstate and then we would write (the same value) to the CPU.
> +	 */
> +	pk->pkru = pkru;
> +	__write_pkru(pkru);
> +
> +}

There's an unnecessary blank line there, right before the closing brace.

This also needs a lot more high-level context about why it is necessary.
I think you had that in the changelog, but we also need the function
commented.


> -void copy_init_pkru_to_fpregs(void)
> +void pkru_set_init_value(void)
>  {
>  	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
> +
>  	/*
>  	 * Any write to PKRU takes it out of the XSAVE 'init
>  	 * state' which increases context switch cost.  Avoid
> -	 * writing 0 when PKRU was already 0.
> +	 * writing then same value which is already written.
>  	 */

s/then/the/

> -	if (!init_pkru_value_snapshot && !read_pkru())
> +	if (init_pkru_value_snapshot == read_pkru())
>  		return;
> -	/*
> -	 * Override the PKRU state that came from 'init_fpstate'
> -	 * with the baseline from the process.
> -	 */
> +
>  	write_pkru(init_pkru_value_snapshot);
>  }

Isn't this doing some of the same work (including rdpkru()) as write_pkru()?