linux-kernel - Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALMp9eSv4Oar4CixipUNSUeWiThvVZEtSxc9wTYsKNizQUeMQA@mail.gmail.com>
Date:   Mon, 29 Jan 2018 10:43:22 -0800
From:   Jim Mattson <jmattson@...gle.com>
To:     KarimAllah Ahmed <karahmed@...zon.de>
Cc:     kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Asit Mallick <asit.k.mallick@...el.com>,
        Arjan Van De Ven <arjan.van.de.ven@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Dan Williams <dan.j.williams@...el.com>,
        Jun Nakajima <jun.nakajima@...el.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        David Woodhouse <dwmw@...zon.co.uk>,
        Greg KH <gregkh@...uxfoundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Ashok Raj <ashok.raj@...el.com>
Subject: Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL

On Sun, Jan 28, 2018 at 11:29 AM, KarimAllah Ahmed <karahmed@...zon.de> wrote:
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests
> that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a
> retpoline+IBPB based approach.
>
> To avoid the overhead of atomically saving and restoring the MSR_IA32_SPEC_CTRL
> for guests that do not actually use the MSR, only add_atomic_switch_msr when a
> non-zero is written to it.
>
> Cc: Asit Mallick <asit.k.mallick@...el.com>
> Cc: Arjan Van De Ven <arjan.van.de.ven@...el.com>
> Cc: Dave Hansen <dave.hansen@...el.com>
> Cc: Andi Kleen <ak@...ux.intel.com>
> Cc: Andrea Arcangeli <aarcange@...hat.com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Tim Chen <tim.c.chen@...ux.intel.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Dan Williams <dan.j.williams@...el.com>
> Cc: Jun Nakajima <jun.nakajima@...el.com>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: David Woodhouse <dwmw@...zon.co.uk>
> Cc: Greg KH <gregkh@...uxfoundation.org>
> Cc: Andy Lutomirski <luto@...nel.org>
> Signed-off-by: KarimAllah Ahmed <karahmed@...zon.de>
> Signed-off-by: Ashok Raj <ashok.raj@...el.com>
> ---
>  arch/x86/kvm/cpuid.c |  4 +++-
>  arch/x86/kvm/cpuid.h |  1 +
>  arch/x86/kvm/vmx.c   | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0099e10..dc78095 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
>  /* These are scattered features in cpufeatures.h. */
>  #define KVM_CPUID_BIT_AVX512_4VNNIW     2
>  #define KVM_CPUID_BIT_AVX512_4FMAPS     3
> +#define KVM_CPUID_BIT_SPEC_CTRL         26
>  #define KF(x) bit(KVM_CPUID_BIT_##x)
>
>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
>         /* cpuid 7.0.edx*/
>         const u32 kvm_cpuid_7_0_edx_x86_features =
> -               KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
> +               KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
> +               (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ? KF(SPEC_CTRL) : 0);

Isn't 'boot_cpu_has()' superflous here? And aren't there two bits to
pass through for existing CPUs (26 and 27)?

>
>         /* all calls to cpuid_count() should be made on the same cpu */
>         get_cpu();
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index cdc70a3..dcfe227 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
>         [CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
>         [CPUID_7_ECX]         = {         7, 0, CPUID_ECX},
>         [CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
> +       [CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>  };
>
>  static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index aa8638a..1b743a0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -920,6 +920,9 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
>  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
>                                             u16 error_code);
>  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
> +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> +                                                         u32 msr, int type);
> +
>
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -2007,6 +2010,28 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
>         m->host[i].value = host_val;
>  }
>
> +/* do not touch guest_val and host_val if the msr is not found */
> +static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
> +                                 u64 *guest_val, u64 *host_val)
> +{
> +       unsigned i;
> +       struct msr_autoload *m = &vmx->msr_autoload;
> +
> +       for (i = 0; i < m->nr; ++i)
> +               if (m->guest[i].index == msr)
> +                       break;
> +
> +       if (i == m->nr)
> +               return 1;
> +
> +       if (guest_val)
> +               *guest_val = m->guest[i].value;
> +       if (host_val)
> +               *host_val = m->host[i].value;
> +
> +       return 0;
> +}
> +
>  static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  {
>         u64 guest_efer = vmx->vcpu.arch.efer;
> @@ -3203,7 +3228,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>   */
>  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  {
> +       u64 spec_ctrl = 0;
>         struct shared_msr_entry *msr;
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>         switch (msr_info->index) {
>  #ifdef CONFIG_X86_64
> @@ -3223,6 +3250,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_TSC:
>                 msr_info->data = guest_read_tsc(vcpu);
>                 break;
> +       case MSR_IA32_SPEC_CTRL:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))

Shouldn't this conjunct be:
!(guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
guest_cpuid_has(vcpu, X86_FEATURE_STIBP))?

> +                       return 1;

What if !boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
!boot_cpu_has(X86_FEATURE_STIBP)? That should also return 1, I think.

> +
> +               /*
> +                * If the MSR is not in the atomic list yet, then it was never
> +                * written to. So the MSR value will be '0'.
> +                */
> +               read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &spec_ctrl, NULL);

Why not just add msr_ia32_spec_ctrl to struct vcpu_vmx, so that you
don't have to search the atomic switch list?

> +
> +               msr_info->data = spec_ctrl;
> +               break;
>         case MSR_IA32_SYSENTER_CS:
>                 msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
>                 break;
> @@ -3289,6 +3329,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         int ret = 0;
>         u32 msr_index = msr_info->index;
>         u64 data = msr_info->data;
> +       unsigned long *msr_bitmap;
> +
> +       /*
> +        * IBRS is not used (yet) to protect the host. Once it does, this
> +        * variable needs to be a bit smarter.
> +        */
> +       u64 host_spec_ctrl = 0;
>
>         switch (msr_index) {
>         case MSR_EFER:
> @@ -3330,6 +3377,22 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_TSC:
>                 kvm_write_tsc(vcpu, msr_info);
>                 break;
> +       case MSR_IA32_SPEC_CTRL:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +                       return 1;

This looks incomplete. As above, what if
!boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
!boot_cpu_has(X86_FEATURE_STIBP)?
If the host doesn't support MSR_IA32_SPEC_CTRL, you'll get a VMX-abort
on loading the host MSRs from the VM-exit MSR load list.

Also, what if the value being written is illegal?

/*
* Processors that support IBRS but not STIBP
* (CPUID.(EAX=07H, ECX=0):EDX[27:26] = 01b) will
* ignore attempts to set STIBP instead of causing an
* exception due to setting that reserved bit.
*/
if ((data & ~(u64)(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) ||
    ((data & SPEC_CTRL_IBRS) &&
     !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)))
return 1;

> +
> +               /*
> +                * Now we know that the guest is actually using the MSR, so
> +                * atomically load and save the SPEC_CTRL MSR and pass it
> +                * through to the guest.
> +                */
> +               add_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, msr_info->data,
> +                                     host_spec_ctrl);
> +               msr_bitmap = vmx->vmcs01.msr_bitmap;
> +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);

I assume you mean MSR_IA32_SPEC_CTRL rather than MSR_FS_BASE.

Also, what if the host and the guest support a different set of bits
in MSR_IA32_SPEC_CTRL, due to a userspace modification of the guest's
CPUID info?

> +
> +               break;
>         case MSR_IA32_CR_PAT:
>                 if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
>                         if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> --
> 2.7.4
>

Where do you preserve the guest's MSR_IA32_SPEC_CTRL value on VM-exit,
if the guest has been given permission to write the MSR?

You also have to clear the guest's MSR_IA32_SPEC_CTRL on
vmx_vcpu_reset, don't you?