Message-ID: <14eab14d368e68cb9c94c655349f94f44a9a15b4.camel@redhat.com>
Date: Thu, 01 May 2025 16:35:07 -0400
From: mlevitsk@...hat.com
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>, Borislav
Petkov <bp@...en8.de>, Paolo Bonzini <pbonzini@...hat.com>, x86@...nel.org,
Dave Hansen <dave.hansen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>,
linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 1/3] x86: KVM: VMX: Wrap GUEST_IA32_DEBUGCTL read/write
with access functions
On Tue, 2025-04-22 at 16:33 -0700, Sean Christopherson wrote:
> On Tue, Apr 15, 2025, Maxim Levitsky wrote:
> > Instead of reading and writing GUEST_IA32_DEBUGCTL vmcs field directly,
> > wrap the logic with get/set functions.
>
> Why? I know why the "set" helper is being added, but it needs to be called out.
>
> Please omit the getter entirely, it does nothing more than obfuscate a very
> simple line of code.
In this patch, yes. But in the next patch I switch to reading from 'vmx->msr_ia32_debugctl'.
Do you want me to open-code this access? I don't mind, if you insist.
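To illustrate why a getter can pay off once the backing storage changes, here
is a minimal userspace sketch. It is not KVM code: 'struct vcpu_model' and the
model_* helpers are hypothetical, and it assumes the setter keeps a cached copy
in sync with the VMCS field.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct vcpu_vmx; not the kernel's layout. */
struct vcpu_model {
	uint64_t vmcs_guest_debugctl;	/* models the GUEST_IA32_DEBUGCTL VMCS field */
	uint64_t msr_ia32_debugctl;	/* models a cached copy of the guest value */
};

/* Setter (assumption): updates the cached copy and the modeled VMCS field. */
static void model_set_guest_debugctl(struct vcpu_model *v, uint64_t data)
{
	v->msr_ia32_debugctl = data;
	v->vmcs_guest_debugctl = data;
}

/* Getter: callers do not need to know which copy is being read. */
static uint64_t model_get_guest_debugctl(const struct vcpu_model *v)
{
	return v->msr_ia32_debugctl;
}
```

With this shape, switching the getter from one field to the other touches a
single function rather than every call site.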
>
> > Also move the checks that the guest's supplied value is valid to the new
> > 'set' function.
>
> Please do this in a separate patch. There's no need to mix refactoring and
> functional changes.
I thought that it was natural to do this in the same patch. In this patch I introduce
'vmx_set_guest_debugctl', which should be used any time we set the MSR from a
guest-supplied value, and VM entry is one of these cases.
I can split this if you want.
>
> > In particular, the above change fixes a minor security issue in which L1
>
> Bug, yes. Not sure it constitutes a meaningful security issue though.
I also think so, but I wanted to mention this just in case.
>
> > hypervisor could set the GUEST_IA32_DEBUGCTL, and eventually the host's
> > MSR_IA32_DEBUGCTL
>
> No, the lack of a consistency check allows the guest to set the MSR in hardware,
> but that is not the host's value.
That's what I meant - the guest can set the real hardware MSR. Yes, after the
guest exits, the OS value is restored. I'll rephrase this in v2.
>
> > to any value by performing a VM entry to L2 with VM_ENTRY_LOAD_DEBUG_CONTROLS
> > set.
>
> Any *legal* value. Setting completely unsupported bits will result in VM-Enter
> failing with a consistency check VM-Exit.
True.
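As a rough illustration of what "legal" means here, the following userspace
sketch models the validity check from the quoted vmx_set_guest_debugctl(). It
is not KVM code: model_debugctl_valid() is a hypothetical name, and 'supported'
stands in for the result of vmx_get_supported_debugctl().

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTLMSR_LBR	(1ULL << 0)
#define DEBUGCTLMSR_BTF	(1ULL << 1)

/*
 * Returns true and stores the sanitized value through 'out' if 'data' is
 * acceptable, false if it contains unsupported bits other than BTF/LBR.
 */
static bool model_debugctl_valid(uint64_t data, uint64_t supported, uint64_t *out)
{
	uint64_t lenient = DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR;
	uint64_t invalid = data & ~supported;

	/*
	 * If BTF or LBR is unsupported, both bits are cleared from the value
	 * instead of failing, mirroring the quoted patch.
	 */
	if (invalid & lenient) {
		data &= ~lenient;
		invalid &= ~lenient;
	}

	if (invalid)
		return false;

	*out = data;
	return true;
}
```

In the nested case, a false return would correspond to the consistency check
failing on VM entry rather than the value reaching hardware.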
>
> > Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> > ---
> > arch/x86/kvm/vmx/nested.c | 15 +++++++---
> > arch/x86/kvm/vmx/pmu_intel.c | 9 +++---
> > arch/x86/kvm/vmx/vmx.c | 58 +++++++++++++++++++++++-------------
> > arch/x86/kvm/vmx/vmx.h | 3 ++
> > 4 files changed, 57 insertions(+), 28 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index e073e3008b16..b7686569ee09 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2641,6 +2641,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > struct hv_enlightened_vmcs *evmcs = nested_vmx_evmcs(vmx);
> > bool load_guest_pdptrs_vmcs12 = false;
> > + u64 new_debugctl;
> >
> > if (vmx->nested.dirty_vmcs12 || nested_vmx_is_evmptr12_valid(vmx)) {
> > prepare_vmcs02_rare(vmx, vmcs12);
> > @@ -2653,11 +2654,17 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> > if (vmx->nested.nested_run_pending &&
> > (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
> > kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
> > - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
> > + new_debugctl = vmcs12->guest_ia32_debugctl;
> > } else {
> > kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
> > - vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
> > + new_debugctl = vmx->nested.pre_vmenter_debugctl;
> > }
> > +
> > + if (CC(!vmx_set_guest_debugctl(vcpu, new_debugctl, false))) {
>
> The consistency check belongs in nested_vmx_check_guest_state(), only needs to
> check the VM_ENTRY_LOAD_DEBUG_CONTROLS case, and should be posted as a separate
> patch.
I can move it there. Can you explain why you want this, though? Is it because of the
order of checks specified in the PRM?
Currently GUEST_IA32_DEBUGCTL of the host is *written* in prepare_vmcs02.
Should I also move this write to nested_vmx_check_guest_state?
Or should I write the value blindly in prepare_vmcs02 and then check the value
of 'vmx->msr_ia32_debugctl' in nested_vmx_check_guest_state and fail if the value
contains reserved bits?
I don't like that idea much, IMHO.
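To make the two options concrete, here is a minimal userspace sketch of the
"check first, then write" split as I understand the suggestion. It is not KVM
code: the structs, the model_* functions and the MODEL_* constant are
hypothetical, and the nested_run_pending condition is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical flag standing in for VM_ENTRY_LOAD_DEBUG_CONTROLS. */
#define MODEL_ENTRY_LOAD_DEBUG_CONTROLS	(1u << 2)

/* Hypothetical subset of vmcs12. */
struct vmcs12_model {
	uint32_t vm_entry_controls;
	uint64_t guest_ia32_debugctl;
};

/* Phase 1: pure check with no side effects, only when the control is set. */
static int model_check_guest_state(const struct vmcs12_model *v12,
				   uint64_t supported)
{
	if ((v12->vm_entry_controls & MODEL_ENTRY_LOAD_DEBUG_CONTROLS) &&
	    (v12->guest_ia32_debugctl & ~supported))
		return -1;	/* consistency check failed */
	return 0;
}

/* Phase 2: pick the value to write, assuming phase 1 already passed. */
static uint64_t model_prepare_debugctl(const struct vmcs12_model *v12,
				       uint64_t pre_vmenter)
{
	if (v12->vm_entry_controls & MODEL_ENTRY_LOAD_DEBUG_CONTROLS)
		return v12->guest_ia32_debugctl;
	return pre_vmenter;
}
```

In this arrangement the write path never sees an unchecked vmcs12 value, and
the pre-VM-enter value is not re-validated.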
>
> > + *entry_failure_code = ENTRY_FAIL_DEFAULT;
> > + return -EINVAL;
> > + }
> > +
> > +static void __vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data)
> > +{
> > + vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> > +}
> > +
> > +bool vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
> > +{
> > + u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);
> > +
> > + if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
> > + kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
> > + data &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
> > + invalid &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
> > + }
> > +
> > + if (invalid)
> > + return false;
> > +
> > + if (is_guest_mode(vcpu) && (get_vmcs12(vcpu)->vm_exit_controls &
> > + VM_EXIT_SAVE_DEBUG_CONTROLS))
> > + get_vmcs12(vcpu)->guest_ia32_debugctl = data;
> > +
> > + if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
> > + (data & DEBUGCTLMSR_LBR))
> > + intel_pmu_create_guest_lbr_event(vcpu);
> > +
> > + __vmx_set_guest_debugctl(vcpu, data);
> > + return true;
>
> Return 0/-errno, not true/false.
There are plenty of functions in this file and in KVM that return a boolean,
e.g.:
static bool nested_vmx_check_eptp(struct kvm_vcpu *vcpu, u64 new_eptp)
static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
static bool nested_evmcs_handle_vmclear(struct kvm_vcpu *vcpu, gpa_t vmptr)
static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
...
I personally think that functions that emulate hardware should return boolean values
or some hardware-specific status code (e.g. a VMX failure code), because the real hardware
never returns -EINVAL and such.
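For comparison, a minimal userspace sketch of the two conventions side by
side. It is not KVM code: both model_* functions are hypothetical, and the
check is reduced to a plain mask test.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hardware-style helper: reports only whether the value was accepted. */
static bool model_set_debugctl(uint64_t *reg, uint64_t data, uint64_t supported)
{
	if (data & ~supported)
		return false;

	*reg = data;
	return true;
}

/* Kernel-style wrapper: same operation, reported as 0 or -errno. */
static int model_set_debugctl_errno(uint64_t *reg, uint64_t data,
				    uint64_t supported)
{
	return model_set_debugctl(reg, data, supported) ? 0 : -EINVAL;
}
```

Either form carries the same information here; the difference is only in how
callers are expected to test the result.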
Best regards,
Maxim Levitsky
>