Message-ID: <byogsjz2vljtzvr7ar4wefm3mrzqxboujz2yugsszgrtkluyks@phifb333vw45>
Date: Wed, 4 Feb 2026 23:30:04 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] KVM: nSVM: Do not track EFER.SVME toggling in guest mode
On Wed, Feb 04, 2026 at 01:15:45PM -0800, Sean Christopherson wrote:
> The shortlog is *very* misleading. The changelog isn't much better. This isn't
> just removing "tracking", it's redefining guest visible behavior and effectively
> changing the KVM virtual CPU microarchitecture.
>
> On Fri, Jan 30, 2026, Yosry Ahmed wrote:
> > KVM tracks when EFER.SVME is set and cleared to initialize and tear down
> > nested state. However, it doesn't differentiate if EFER.SVME is getting
> > toggled in L1 or L2+. Toggling EFER.SVME in L2+ is inconsequential from
> > KVM's perspective, as the vCPU is still obviously using nested.
> >
> > This causes a problem if L2 sets then clears EFER.SVME without L1
> > interception, as KVM exits guest mode and tears down nested state while
> > L2 is running, executing L1 without injecting a proper #VMEXIT.
> >
> > Technically, it's not a bug as the APM states that an L1 hypervisor
> > should intercept EFER writes:
> >
> > The effect of turning off EFER.SVME while a guest is running is
> > undefined; therefore, the VMM should always prevent guests from
> > writing EFER.
> >
> > However, it would be nice if KVM handled it more gracefully.
>
> That's not sufficient justification for this change. Nothing will ever trip this
> code unless it's _trying_ to trip this code. I.e. outside of a bespoke selftest
> that is little more than "make sure the kernel doesn't explode", and future
> malicious actors, KVM's behavior is largely irrelevant.
>
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> > ---
> > arch/x86/kvm/svm/svm.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 4575a6a7d6c4e..eaf0f8053fbfb 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -208,6 +208,13 @@ static int svm_set_efer_svme(struct kvm_vcpu *vcpu, u64 old_efer, u64 new_efer)
> > if ((old_efer & EFER_SVME) == (new_efer & EFER_SVME))
> > return 0;
> >
> > + /*
> > + * An L2 guest setting or clearing EFER_SVME does not change whether or
> > + * not the vCPU can use nested from KVM's perspective.
>
> This should call out that the architectural behavior is undefined. "from KVM's
> perspective" is an obtuse way of saying "KVM is making up behavior because it
> can". E.g. something like
>
> /*
> * Architecturally, clearing EFER.SVME while a guest is running yields
> * undefined behavior, i.e. KVM has carte blanche to do anything if L1
> * doesn't intercept writes to EFER. Simply do nothing, because XYZ.
> */
>
> > + */
> > + if (is_guest_mode(vcpu))
>
> This is fine, because svm_allocate_nested() plays nice with redundant calls, but
> this is a rather scary change for something that straight up doesn't happen in
> practice. Any hypervisor that doesn't intercept EFER is broken, period.
>
> E.g. if a future change adds novel code that's guarded by the
>
> if ((old_efer & EFER_SVME) == (new_efer & EFER_SVME))
> return 0;
>
> check, then doing nothing here could result in a guest-exploitable bug. In other
> words, from a kernel safety perspective, "doing nothing" is more dangerous than
> forcibly leaving nested mode, because doing nothing deliberately puts KVM in an
> inconsistent state. Given that there's basically zero benefit in practice, I'm
> strongly inclined to keep the call to svm_leave_nested().
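>
> Concretely, sketching a purely hypothetical future change (the helper
> name is made up):
>
> 	if ((old_efer & EFER_SVME) == (new_efer & EFER_SVME))
> 		return 0;
>
> 	if (is_guest_mode(vcpu))	/* the proposed early return */
> 		return 0;
>
> 	/*
> 	 * Hypothetical code that must run whenever EFER.SVME changes;
> 	 * the early return above silently skips it for L2 toggles even
> 	 * though the vCPU's EFER still changes.
> 	 */
> 	svm_recompute_svme_state(vcpu);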
>
> All that said, I agree that pulling the rug out from under the VM is a terrible
> experience. What if we throw a triple fault at the vCPU so that L1 gets an
> immediate SHUTDOWN (not a VM-Exit, a SHUTDOWN of the L1 vCPU), instead of running
> random garbage from L2?
I am fine with this too; anything is better than pulling the rug. I will
send a v2 and probably drop patch 1 (unless you prefer that we keep it).
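
FWIW, I am thinking of pairing v2 with a selftest roughly along these
lines. This is a rough, untested sketch that assumes the usual nSVM
selftest scaffolding (e.g. generic_svm_setup() and run_guest() from
svm_util.h); with your proposed change the host side should observe a
SHUTDOWN:

	#include "kvm_util.h"
	#include "processor.h"
	#include "svm_util.h"
	#include "ucall_common.h"

	static void l2_guest_code(void)
	{
		/*
		 * Set then clear EFER.SVME without L1 intercepting EFER
		 * writes (the set may be a nop, as L2 inherits SVME=1
		 * via the VMCB). Clearing it is the undefined case.
		 */
		wrmsr(MSR_EFER, rdmsr(MSR_EFER) | EFER_SVME);
		wrmsr(MSR_EFER, rdmsr(MSR_EFER) & ~EFER_SVME);

		/* Unreachable, the L1 vCPU should SHUTDOWN first. */
		GUEST_ASSERT(0);
	}

	static void l1_guest_code(struct svm_test_data *svm)
	{
	#define L2_GUEST_STACK_SIZE 64
		unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];

		generic_svm_setup(svm, l2_guest_code,
				  &l2_guest_stack[L2_GUEST_STACK_SIZE]);

		/* Deliberately do NOT intercept EFER writes. */
		run_guest(svm->vmcb, svm->vmcb_gpa);

		/* Also unreachable. */
		GUEST_ASSERT(0);
	}

	int main(int argc, char *argv[])
	{
		struct kvm_vcpu *vcpu;
		struct kvm_vm *vm;
		vm_vaddr_t svm_gva;

		TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));

		vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
		vcpu_alloc_svm(vm, &svm_gva);
		vcpu_args_set(vcpu, 1, svm_gva);

		vcpu_run(vcpu);
		TEST_ASSERT_EQ(vcpu->run->exit_reason, KVM_EXIT_SHUTDOWN);

		kvm_vm_free(vm);
		return 0;
	}
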
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f0136dbdde6..ccd73a3be3f9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>
> if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> if (!(efer & EFER_SVME)) {
> + /*
> + * Architecturally, clearing EFER.SVME while a guest is
> + * running yields undefined behavior, i.e. KVM can do
> + * literally anything. Force the vCPU back into L1 as
> + * that is the safest option for KVM, but synthesize a
> + * triple fault (for L1!) so that KVM at least doesn't
> + * run random L2 code in the context of L1.
> + */
> + if (is_guest_mode(vcpu))
> + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> +
> svm_leave_nested(vcpu);
> /* #GP intercept is still needed for vmware backdoor */
> if (!enable_vmware_backdoor)
>
>
> > + return 0;
> > +
> > if (new_efer & EFER_SVME) {
> > r = svm_allocate_nested(svm);
> > if (r)
> > --
> > 2.53.0.rc1.225.gd81095ad13-goog
> >