Message-ID: <byogsjz2vljtzvr7ar4wefm3mrzqxboujz2yugsszgrtkluyks@phifb333vw45>
Date: Wed, 4 Feb 2026 23:30:04 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] KVM: nSVM: Do not track EFER.SVME toggling in guest
 mode

On Wed, Feb 04, 2026 at 01:15:45PM -0800, Sean Christopherson wrote:
> The shortlog is *very* misleading.  The changelog isn't much better.  This isn't
> just removing "tracking", it's redefining guest visible behavior and effectively
> changing the KVM virtual CPU microarchitecture.
> 
> On Fri, Jan 30, 2026, Yosry Ahmed wrote:
> > KVM tracks when EFER.SVME is set and cleared to initialize and tear down
> > nested state. However, it doesn't differentiate if EFER.SVME is getting
> > toggled in L1 or L2+. Toggling EFER.SVME in L2+ is inconsequential from
> > KVM's perspective, as the vCPU is still obviously using nested.
> > 
> > This causes a problem if L2 sets then clears EFER.SVME without L1
> > interception, as KVM exits guest mode and tears down nested state while
> > L2 is running, executing L1 without injecting a proper #VMEXIT.
> > 
> > Technically, it's not a bug as the APM states that an L1 hypervisor
> > should intercept EFER writes:
> > 
> > 	The effect of turning off EFER.SVME while a guest is running is
> > 	undefined; therefore, the VMM should always prevent guests from
> > 	writing EFER.
> > 
> > However, it would be nice if KVM handled it more gracefully.
> 
> That's not sufficient justification for this change.  Nothing will ever trip this
> code unless it's _trying_ to trip this code.  I.e., outside of a bespoke selftest
> that is little more than "make sure the kernel doesn't explode", and future
> malicious actors, KVM's behavior is largely irrelevant.
> 
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> > ---
> >  arch/x86/kvm/svm/svm.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 4575a6a7d6c4e..eaf0f8053fbfb 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -208,6 +208,13 @@ static int svm_set_efer_svme(struct kvm_vcpu *vcpu, u64 old_efer, u64 new_efer)
> >  	if ((old_efer & EFER_SVME) == (new_efer & EFER_SVME))
> >  		return 0;
> >  
> > +	/*
> > +	 * An L2 guest setting or clearing EFER_SVME does not change whether or
> > +	 * not the vCPU can use nested from KVM's perspective.
> 
> This should call out that the architectural behavior is undefined.  "from KVM's
> perspective" is an obtuse way of saying "KVM is making up behavior because it
> can".  E.g. something like
> 
> 	/*
> 	 * Architecturally, clearing EFER.SVME while a guest is running yields
> 	 * undefined behavior, i.e. KVM has carte blanche to do anything if L1
> 	 * doesn't intercept writes to EFER.  Simply do nothing, because XYZ.
> 	 */
> 
> > +	 */
> > +	if (is_guest_mode(vcpu))
> 
> This is fine, because svm_allocate_nested() plays nice with redundant calls, but
> this is a rather scary change for something that straight up doesn't happen in
> practice.  Any hypervisor that doesn't intercept EFER is broken, period.
> 
> E.g. if a future change adds novel code that's guarded by the
> 
> 	if ((old_efer & EFER_SVME) == (new_efer & EFER_SVME)) 
> 		return 0;
> 
> check, then doing nothing here could result in a guest-exploitable bug.  In other
> words, from a kernel safety perspective, "doing nothing" is more dangerous than
> forcibly leaving nested mode, because doing nothing deliberately puts KVM in an
> inconsistent state.  Given that there's basically zero benefit in practice, I'm
> strongly inclined to keep the call to svm_leave_nested().
> 
> All that said, I agree that pulling the rug out from under the VM is a terrible
> experience.  What if we throw a triple fault at the vCPU so that L1 gets an
> immediate SHUTDOWN (not a VM-Exit, a SHUTDOWN of the L1 vCPU), instead of running
> random garbage from L2?

I am fine with this too; anything is better than pulling the rug. I will
send a v2 and probably drop patch 1 (unless you prefer that we keep it).

> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f0136dbdde6..ccd73a3be3f9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>  
>         if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
>                 if (!(efer & EFER_SVME)) {
> +                       /*
> +                        * Architecturally, clearing EFER.SVME while a guest is
> +                        * running yields undefined behavior, i.e. KVM can do
> +                        * literally anything.  Force the vCPU back into L1 as
> +                        * that is the safest option for KVM, but synthesize a
> +                        * triple fault (for L1!) so that KVM at least doesn't
> +                        * run random L2 code in the context of L1.
> +                        */
> +                       if (is_guest_mode(vcpu))
> +                               kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> +
>                         svm_leave_nested(vcpu);
>                         /* #GP intercept is still needed for vmware backdoor */
>                         if (!enable_vmware_backdoor)
> 
> 
> > +		return 0;
> > +
> >  	if (new_efer & EFER_SVME) {
> >  		r = svm_allocate_nested(svm);
> >  		if (r)
> > -- 
> > 2.53.0.rc1.225.gd81095ad13-goog
> > 
