lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r13gyde8.fsf@redhat.com>
Date:   Wed, 22 Jun 2022 16:35:27 +0200
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Anirudh Rayabharam <anrayabh@...ux.microsoft.com>
Cc:     Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Ilias Stamatis <ilstam@...zon.com>,
        Maxim Levitsky <mlevitsk@...hat.com>, mail@...rudhrb.com,
        kumarpraveen@...ux.microsoft.com, wei.liu@...nel.org,
        robert.bradford@...el.com, liuwe@...rosoft.com,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

Anirudh Rayabharam <anrayabh@...ux.microsoft.com> writes:

> On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@...gle.com> writes:
>> 
>> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
>> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
>> 
>> ...
>> 
>> >> > 
>> >> > Any reason not to use the already sanitized vmcs_config?  I can't think of any
>> >> > reason why the nested path should blindly use the raw MSR values from hardware.
>> >> 
>> >> vmcs_config has the sanitized exec controls. But how do we construct MSR
>> >> values using them?
>> >
>> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then
>> > take the required-1 bits from the CPU.  And then if we wanted to avoid the redundant
>> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
>> >
>> > Hastily constructed and compile-tested only, proceed with caution :-)
>> 
>> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
>> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
>> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
>> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
>> the suggested patch and try to construct something for required-1 bits.
>
> I tried this patch today but it causes some regression which causes
> /dev/kvm to be unavailable in L1. I didn't get a chance to look into it
> closely but I am guessing it has something to do with the fact that
> vmcs_config reflects the config that L0 chose to use rather than what is
> available to use. So constructing allowed-1 MSR bits based on what bits
> are set in exec controls maybe isn't correct.

I've tried to pick it up but it's actually much harder than I think. The
patch has some minor issues ('&vmcs_config.nested' needs to be switched
to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
identify at least:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5e14e4c40007..8076352174ad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2483,8 +2483,14 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
              CPU_BASED_INVLPG_EXITING |
              CPU_BASED_RDPMC_EXITING;
 
-       opt = CPU_BASED_TPR_SHADOW |
+       opt = CPU_BASED_INTR_WINDOW_EXITING |
+             CPU_BASED_NMI_WINDOW_EXITING |
+             CPU_BASED_TPR_SHADOW |
+             CPU_BASED_USE_IO_BITMAPS |
              CPU_BASED_USE_MSR_BITMAPS |
+             CPU_BASED_MONITOR_TRAP_FLAG |
+             CPU_BASED_RDTSC_EXITING |
+             CPU_BASED_PAUSE_EXITING |
              CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
              CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
        if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
@@ -2582,6 +2588,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 #endif
        opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
              VM_EXIT_LOAD_IA32_PAT |
+             VM_EXIT_SAVE_IA32_PAT |
              VM_EXIT_LOAD_IA32_EFER |
              VM_EXIT_CLEAR_BNDCFGS |
              VM_EXIT_PT_CONCEAL_PIP |
@@ -2604,7 +2611,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
                _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
 
        min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
-       opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+       opt =
+#ifdef CONFIG_X86_64
+             VM_ENTRY_IA32E_MODE |
+#endif
+             VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
              VM_ENTRY_LOAD_IA32_PAT |
              VM_ENTRY_LOAD_IA32_EFER |
              VM_ENTRY_LOAD_BNDCFGS |

but it is 1) not sufficient because some controls are smartly filtered
out just because we don't want them for L1 -- and this doesn't mean that
L2 doesn't need them and 2) because if we add some 'opt' controls to
setup_vmcs_config() we need to filter them out somewhere else.

I'm starting to think we may just want to store raw VMX MSR values in
vmcs_config first, then sanitize them (eVMCS, vmx preemtoion timer bug,
perf_ctrl bug,..) and then do the adjust_vmx_controls() magic. 

I'm not giving up yet but don't expect something small and backportable
to stable :-) 

-- 
Vitaly

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ