Message-ID: <aThIQzni6fC1qdgj@google.com>
Date: Tue, 9 Dec 2025 08:03:15 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jim Mattson <jmattson@...gle.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 10/13] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN
On Mon, Nov 10, 2025, Yosry Ahmed wrote:
> All accesses to the VMCB12 in the guest memory are limited to
> nested_svm_vmrun(). However, the VMCB12 remains mapped until the end of
> the function execution. Unmapping right after the consistency checks is
> possible, but it becomes easy-ish to introduce bugs where 'vmcb12' is
> used after being unmapped.
>
> Move all accesses to the VMCB12 into a new helper,
> nested_svm_vmrun_read_vmcb12(), that maps the VMCB12,
> caches the needed fields, performs consistency checks, and unmaps it.
> This limits the scope of the VMCB12 mapping appropriately. It also
> slightly simplifies the cleanup path of nested_svm_vmrun().
>
> nested_svm_vmrun_read_vmcb12() returns -1 if the consistency checks
> fail, maintaining the current behavior of skipping the instructions and
> unmapping the VMCB12 (although in the opposite order).
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> ---
> arch/x86/kvm/svm/nested.c | 59 ++++++++++++++++++++++-----------------
> 1 file changed, 34 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index ddcd545ec1c3c..a48668c36a191 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1023,12 +1023,39 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
> 	return 0;
> }
>
> +static int nested_svm_vmrun_read_vmcb12(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
"read_vmcb12"() sounds like a generic helper to read a specific field. And if
the name is more specific, then I think we can drop the "vmrun" scoping. To
align with similar functions in VMX and __nested_copy_vmcb_save_to_cache(), how
about nested_svm_copy_vmcb12_to_cache()?
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct kvm_host_map map;
> +	struct vmcb *vmcb12;
> +	int ret;
> +
> +	ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
> +	if (ret)
> +		return ret;
> +
> +	vmcb12 = map.hva;
> +
> +	nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
> +	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
> +
> +	if (!nested_vmcb_check_save(vcpu) ||
> +	    !nested_vmcb_check_controls(vcpu)) {
> +		vmcb12->control.exit_code = SVM_EXIT_ERR;
> +		vmcb12->control.exit_code_hi = 0;
> +		vmcb12->control.exit_info_1 = 0;
> +		vmcb12->control.exit_info_2 = 0;
> +		ret = -1;
I don't love shoving the consistency checks in here. I get why you did it, but
it's very surprising to see (and/or easy to miss) these consistency checks. The
caller also ends up quite wonky:
	if (ret == -EINVAL) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	} else if (ret) {
		return kvm_skip_emulated_instruction(vcpu);
	}

	ret = kvm_skip_emulated_instruction(vcpu);
Ha! And it's buggy.  __kvm_vcpu_map() can return -EFAULT if creating a host
mapping fails.  Eww, and blindly using '-1' as the "failed a consistency check"
value is equally gross, as -1 is numerically -EPERM, i.e. it relies on
kvm_vcpu_map() never returning -EPERM.
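
(Illustrative userspace snippet, not kernel code, just to show the collision:
on Linux EPERM is 1, so a bare '-1' is numerically identical to -EPERM.)

	#include <errno.h>
	#include <stdio.h>

	int main(void)
	{
		/* A bare -1 aliases -EPERM exactly; -EFAULT stays distinct. */
		printf("-EPERM  == %d\n", -EPERM);	/* prints -1  */
		printf("-EFAULT == %d\n", -EFAULT);	/* prints -14 */
		return 0;
	}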
Ugh, and there's also this nastiness in nested_vmcb_check_controls():
	/*
	 * Make sure we did not enter guest mode yet, in which case
	 * kvm_read_cr0() could return L2's CR0.
	 */
	WARN_ON_ONCE(is_guest_mode(vcpu));
	return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
nested_vmcb_check_save() and nested_vmcb_check_controls() really shouldn't exist.
They just make it harder to see what KVM is checking in the "normal" flow.
Aha! And I'm fairly certain there are at least two pre-existing bugs due to KVM
doing "early" consistency checks in nested_svm_vmrun().
1. KVM doesn't clear GIF on the early #VMEXIT.  In classic APM fashion, nothing
   _requires_ GIF=0 before VMRUN:

     It is assumed that VMM software cleared GIF some time before executing
     the VMRUN instruction, to ensure an atomic state switch.

   And so VMRUN with GIF=1 that hits an "early" consistency check #VMEXIT would
   incorrectly leave GIF=1 (see the sketch below this list).
2. svm_leave_smm() is missing consistency checks on the newly loaded guest state,
   because the checks aren't performed by enter_svm_guest_mode().  I don't see
   anything that would prevent vmcb12 from being modified by the guest between
   SMI and RSM.
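
For reference, a sketch of why the move fixes the GIF bug (this is the
existing nested_svm_vmrun() error path, heavily elided, not new code): a
failure from enter_svm_guest_mode() funnels into the full #VMEXIT emulation,
and nested_svm_vmexit() clears GIF, whereas the early-check path bails before
emulating any of VMRUN's side effects.

	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, true))
		goto out_exit_err;
	...
out_exit_err:
	...
	svm->vmcb->control.exit_code = SVM_EXIT_ERR;
	...
	nested_svm_vmexit(svm);		/* clears GIF via svm_set_gif(svm, false) */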
Moving the consistency checks into enter_svm_guest_mode() would solve all three
(four?) problems. And as a bonus, nested_svm_copy_vmcb12_to_cache() can use
kvm_vcpu_map_readonly().
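
(Both entry points land in enter_svm_guest_mode(), so checks there cover VMRUN
and RSM alike; calls sketched from the current code:

	nested_svm_vmrun():	enter_svm_guest_mode(vcpu, vmcb12_gpa, true);
	svm_leave_smm():	enter_svm_guest_mode(vcpu, vmcb12_gpa, false);
)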
Compile tested only, but I think we can end up with a delta like so:
---
arch/x86/kvm/svm/nested.c | 67 ++++++++++++---------------------------
1 file changed, 20 insertions(+), 47 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7c86987fdaca..8a0df6c535b5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -372,9 +372,9 @@ static bool nested_svm_check_event_inj(struct kvm_vcpu *vcpu, u32 event_inj)
 	return true;
 }
 
-static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
-					 struct vmcb_ctrl_area_cached *control,
-					 unsigned long l1_cr0)
+static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
+				       struct vmcb_ctrl_area_cached *control,
+				       unsigned long l1_cr0)
 {
 	if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
 		return false;
@@ -408,8 +408,8 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 }
 
 /* Common checks that apply to both L1 and L2 state. */
-static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
-				     struct vmcb_save_area_cached *save)
+static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
+				   struct vmcb_save_area_cached *save)
 {
 	if (CC(!(save->efer & EFER_SVME)))
 		return false;
@@ -448,27 +448,6 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	struct vmcb_save_area_cached *save = &svm->nested.save;
-
-	return __nested_vmcb_check_save(vcpu, save);
-}
-
-static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
-
-	/*
-	 * Make sure we did not enter guest mode yet, in which case
-	 * kvm_read_cr0() could return L2's CR0.
-	 */
-	WARN_ON_ONCE(is_guest_mode(vcpu));
-	return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
-}
-
 static
 void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 					 struct vmcb_ctrl_area_cached *to,
@@ -1004,6 +983,12 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
 	nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
 
 	svm_switch_vmcb(svm, &svm->nested.vmcb02);
+
+	if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
+	    !nested_vmcb_check_controls(vcpu, &svm->nested.ctl,
+					svm->vmcb01.ptr->save.cr0))
+		return -EINVAL;
+
 	nested_vmcb02_prepare_control(svm, save->rip, save->cs.base);
 	nested_vmcb02_prepare_save(svm);
 
@@ -1025,33 +1010,24 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
 	return 0;
 }
 
-static int nested_svm_vmrun_read_vmcb12(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
+static int nested_svm_copy_vmcb12_to_cache(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_host_map map;
 	struct vmcb *vmcb12;
-	int ret;
+	int r;
 
-	ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
-	if (ret)
-		return ret;
+	r = kvm_vcpu_map_readonly(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
+	if (r)
+		return r;
 
 	vmcb12 = map.hva;
 
 	nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
 	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
 
-	if (!nested_vmcb_check_save(vcpu) ||
-	    !nested_vmcb_check_controls(vcpu)) {
-		vmcb12->control.exit_code = SVM_EXIT_ERR;
-		vmcb12->control.exit_code_hi = -1u;
-		vmcb12->control.exit_info_1 = 0;
-		vmcb12->control.exit_info_2 = 0;
-		ret = -1;
-	}
-
 	kvm_vcpu_unmap(vcpu, &map);
-	return ret;
+	return 0;
 }
 
 int nested_svm_vmrun(struct kvm_vcpu *vcpu)
@@ -1082,12 +1058,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 
 	vmcb12_gpa = svm->vmcb->save.rax;
-	ret = nested_svm_vmrun_read_vmcb12(vcpu, vmcb12_gpa);
-	if (ret == -EINVAL) {
+	if (nested_svm_copy_vmcb12_to_cache(vcpu, vmcb12_gpa)) {
 		kvm_inject_gp(vcpu, 0);
 		return 1;
-	} else if (ret) {
-		return kvm_skip_emulated_instruction(vcpu);
 	}
 
 	ret = kvm_skip_emulated_instruction(vcpu);
@@ -1919,7 +1892,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	ret = -EINVAL;
 	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
 	/* 'save' contains L1 state saved from before VMRUN */
-	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
+	if (!nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
 		goto out_free;
 
 	/*
@@ -1938,7 +1911,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	if (!(save->cr0 & X86_CR0_PG) ||
 	    !(save->cr0 & X86_CR0_PE) ||
 	    (save->rflags & X86_EFLAGS_VM) ||
-	    !__nested_vmcb_check_save(vcpu, &save_cached))
+	    !nested_vmcb_check_save(vcpu, &save_cached))
 		goto out_free;
base-commit: 01597a665f5dcf8d7cfbedf36f4e6d46d045eb4f
--