Message-ID: <20190128184959.GI20487@zn.tnic>
Date: Mon, 28 Jan 2019 19:49:59 +0100
From: Borislav Petkov <bp@...en8.de>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
Andy Lutomirski <luto@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>,
kvm@...r.kernel.org, "Jason A. Donenfeld" <Jason@...c4.com>,
Rik van Riel <riel@...riel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH 11/22] x86/fpu: Make get_xsave_field_ptr() and
get_xsave_addr() use feature number instead of mask
On Wed, Jan 09, 2019 at 12:47:33PM +0100, Sebastian Andrzej Siewior wrote:
> After changing the argument of __raw_xsave_addr() from a mask to number
> Dave suggested to check if it makes sense to do the same for
> get_xsave_addr(). As it turns out it does. Only get_xsave_addr() needs
> the mask to check if the requested feature is part of what is
> supported/saved and then uses the number again. The shift operation is
> cheaper compared to "find last bit set". Also, the feature number uses
> less opcode space compared to the mask :)
>
> Make get_xsave_addr()'s argument an xfeature number instead of a mask
> and fix up its callers.
> up its callers.
> As part of this use xfeature_nr and xfeature_mask consistently.
Good!
> This results in changes to the kvm code as:
> feature -> xfeature_mask
> index -> xfeature_nr
>
> Suggested-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
> arch/x86/include/asm/fpu/xstate.h | 4 ++--
> arch/x86/kernel/fpu/xstate.c | 23 +++++++++++------------
> arch/x86/kernel/traps.c | 2 +-
> arch/x86/kvm/x86.c | 28 ++++++++++++++--------------
> arch/x86/mm/mpx.c | 6 +++---
> 5 files changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 48581988d78c7..fbe41f808e5d8 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -46,8 +46,8 @@ extern void __init update_regset_xstate_info(unsigned int size,
> u64 xstate_mask);
>
> void fpu__xstate_clear_all_cpu_caps(void);
> -void *get_xsave_addr(struct xregs_state *xsave, int xstate);
> -const void *get_xsave_field_ptr(int xstate_field);
> +void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
> +const void *get_xsave_field_ptr(int xfeature_nr);
> int using_compacted_format(void);
> int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
> int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0e759a032c1c7..d288e4d271b71 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -830,15 +830,15 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
> *
> * Inputs:
> * xstate: the thread's storage area for all FPU data
> - * xstate_feature: state which is defined in xsave.h (e.g.
> - * XFEATURE_MASK_FP, XFEATURE_MASK_SSE, etc...)
> + * xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
> + * XFEATURE_SSE, etc...)
> * Output:
> * address of the state in the xsave area, or NULL if the
> * field is not present in the xsave buffer.
> */
> -void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
> +void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
> {
> - int xfeature_nr;
> + u64 xfeature_mask = 1ULL << xfeature_nr;
You can use BIT_ULL(xfeature_nr) directly where you need it in this
function...
> /*
> * Do we even *have* xsave state?
> */
> @@ -850,11 +850,11 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
> * have not enabled. Remember that pcntxt_mask is
> * what we write to the XCR0 register.
> */
> - WARN_ONCE(!(xfeatures_mask & xstate_feature),
> + WARN_ONCE(!(xfeatures_mask & xfeature_mask),
... and turn this into:
	WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr))
which is more readable than the AND of two variables, where I had to
re-focus my eyes to spot the difference. :)
Oh, and this way gcc generates better code by simply emitting a BT
directly:
# arch/x86/kernel/fpu/xstate.c:852: WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr)),
.loc 1 852 2 view .LVU258
movq xfeatures_mask(%rip), %rax # xfeatures_mask, tmp124
btq %rsi, %rax # xfeature_nr, tmp124
without first computing the shift into xfeature_mask:
# arch/x86/kernel/fpu/xstate.c:841: u64 xfeature_mask = 1ULL << xfeature_nr;
.loc 1 841 6 view .LVU249
movl %esi, %ecx # xfeature_nr, tmp120
movl $1, %ebp #, tmp105
salq %cl, %rbp # tmp120, xfeature_mask
and then testing it:
# arch/x86/kernel/fpu/xstate.c:853: WARN_ONCE(!(xfeatures_mask & xfeature_mask),
.loc 1 853 2 view .LVU256
testq %rbp, xfeatures_mask(%rip) # xfeature_mask, xfeatures_mask
movq %rdi, %rbx # xsave, xsave
Otherwise a nice cleanup!
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.