linux-kernel - Re: [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted


Message-ID: <CANRm+Cx_j3n7O=KnuW9t3XW9RaUhhdboCE6TFdSwmPomi6asVw@mail.gmail.com>
Date:   Wed, 11 Apr 2018 09:24:20 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     KarimAllah Ahmed <karahmed@...zon.de>
Cc:     kvm <kvm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [PATCH v3] X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted

2018-04-10 20:15 GMT+08:00 KarimAllah Ahmed <karahmed@...zon.de>:
> The VMX-preemption timer is used by KVM as a way to set deadlines for the
> guest (i.e. timer emulation). That was safe till very recently when
> capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
> introduced. According to Intel SDM 25.5.1:
>
> """
> The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
> operates in the shutdown and wait-for-SIPI states. If the timer counts down
> to zero in any state other than the wait-for SIPI state, the logical
> processor transitions to the C0 C-state and causes a VM exit; the timer
> does not cause a VM exit if it counts down to zero in the wait-for-SIPI
> state. The timer is not decremented in C-states deeper than C2.
> """

Thanks for the patch. In addition, does this also mean we should prevent
the host from entering C-states deeper than C2, even when the intercepts
are not disabled?

Regards,
Wanpeng Li

>
> Now, once the guest issues MWAIT with a C-state deeper than
> C2 the preemption timer will never wake it up again since it stopped
> ticking! Usually this is compensated by other activities in the system that
> would wake the core from the deep C-state (and cause a VMExit). For
> example, if the host itself is ticking or it received interrupts, etc!
>
> So disable the VMX-preemption timer if MWAIT is exposed to the guest!
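
The failure mode described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not
kernel code: the enum, the function name timer_fires() and the notion of
abstract "ticks" are all hypothetical, and the only rule encoded is the
one from the SDM quote (the timer counts down in C0 to C2 and is not
decremented in deeper C-states).

```c
#include <stdint.h>

/* Hypothetical C-state labels for this model; larger value = deeper state. */
enum cstate { CSTATE_C0 = 0, CSTATE_C1 = 1, CSTATE_C2 = 2, CSTATE_C3 = 3, CSTATE_C6 = 6 };

/*
 * Model of the rule quoted from the SDM: the timer only counts down in
 * C0, C1 and C2. Returns 1 if a timer loaded with `timer` would reach
 * zero within `ticks` ticks while the core stays in `state`, else 0.
 * In this model a timer frozen in a deep C-state never expires.
 */
static int timer_fires(uint64_t timer, enum cstate state, uint64_t ticks)
{
    if (state > CSTATE_C2)
        return 0;           /* timer is not decremented: no VM exit */
    return ticks >= timer;  /* timer counts down normally */
}
```

With this model, timer_fires(100, CSTATE_C1, 1000) returns 1, while
timer_fires(100, CSTATE_C6, 1000) returns 0 no matter how long the core
waits, which is the case where only unrelated host activity would wake
the guest.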
>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Radim Krčmář <rkrcmar@...hat.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: H. Peter Anvin <hpa@...or.com>
> Cc: x86@...nel.org
> Cc: kvm@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> Signed-off-by: KarimAllah Ahmed <karahmed@...zon.de>
> ---
> v2 -> v3:
> - return -EOPNOTSUPP before any other operation in vmx_set_hv_timer
>
> v1 -> v2:
> - Drop everything .. just return -EOPNOTSUPP (pbonzini@) :D
> ---
>  arch/x86/kvm/vmx.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d2e54e7..31a4204 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11903,10 +11903,16 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
>
>  static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
>  {
> -       struct vcpu_vmx *vmx = to_vmx(vcpu);
> -       u64 tscl = rdtsc();
> -       u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
> -       u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
> +       struct vcpu_vmx *vmx;
> +       u64 tscl, guest_tscl, delta_tsc;
> +
> +       if (kvm_mwait_in_guest(vcpu->kvm))
> +               return -EOPNOTSUPP;
> +
> +       vmx = to_vmx(vcpu);
> +       tscl = rdtsc();
> +       guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
> +       delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
>
>         /* Convert to host delta tsc if tsc scaling is enabled */
>         if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
> --
> 2.7.4
>
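
For readers following the diff, the control flow of the patched function
can be sketched in plain C. This is a simplified stand-in, not the
kernel implementation: set_hv_timer_sketch() and its `timer_unsafe` flag
are hypothetical names, the flag stands in for the check added at the
top of vmx_set_hv_timer(), and TSC scaling is left out entirely. It
shows the two points the patch relies on: the early -EOPNOTSUPP return
happens before any other work, and the delta is clamped to zero when the
deadline is already in the past.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patched vmx_set_hv_timer().
 * `timer_unsafe` models the new check at the top of the function;
 * `guest_tsc_now` models the value read via kvm_read_l1_tsc().
 * On success the clamped delta is stored in *delta_out.
 */
static int set_hv_timer_sketch(int timer_unsafe,
                               uint64_t guest_deadline_tsc,
                               uint64_t guest_tsc_now,
                               uint64_t *delta_out)
{
    uint64_t later;

    /* Bail out before touching anything else, as in v3 of the patch. */
    if (timer_unsafe)
        return -EOPNOTSUPP;

    /* Equivalent of max(guest_deadline_tsc, guest_tscl) - guest_tscl. */
    later = guest_deadline_tsc > guest_tsc_now ? guest_deadline_tsc
                                               : guest_tsc_now;
    *delta_out = later - guest_tsc_now;
    return 0;
}
```

In this sketch a deadline of 150 with a current TSC of 100 yields a
delta of 50, a deadline of 50 yields 0 rather than an underflowed value,
and *delta_out is left untouched whenever -EOPNOTSUPP is returned.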