Date:   Thu, 17 Nov 2022 18:21:38 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Maxim Levitsky <mlevitsk@...hat.com>
Cc:     kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sandipan Das <sandipan.das@....com>,
        Daniel Sneddon <daniel.sneddon@...ux.intel.com>,
        Jing Liu <jing2.liu@...el.com>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Wyes Karny <wyes.karny@....com>,
        Borislav Petkov <bp@...en8.de>,
        Babu Moger <babu.moger@....com>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Jim Mattson <jmattson@...gle.com>, x86@...nel.org
Subject: Re: [PATCH 09/13] KVM: SVM: allow NMI window with vNMI

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> When the vNMI is enabled, the only case when the KVM will use an NMI
> window is when the vNMI injection is pending.
> 
> In this case on next IRET/RSM/STGI, the injection has to be complete
> and a new NMI can be injected.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cfec4c98bb589b..eaa30f8ace518d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
>  	++vcpu->stat.nmi_window_exits;
> -	vcpu->arch.hflags |= HF_IRET_MASK;
> +
> +	if (!is_vnmi_enabled(svm))
> +		vcpu->arch.hflags |= HF_IRET_MASK;

Ugh, HF_IRET_MASK is such a terrible name/flag.  Given that it lives with GIF
and NMI, one would naturally think that it means "IRET is intercepted", but it
really means "KVM just intercepted an IRET and is waiting for NMIs to become
unblocked".

And on a related topic, why on earth are GIF, NMI, and IRET tracked in hflags?
They are 100% SVM concepts.  IMO, this code would be much easier to follow if
they were bools in vcpu_svm with more descriptive names.

> +
>  	if (!sev_es_guest(vcpu->kvm)) {
>  		svm_clr_intercept(svm, INTERCEPT_IRET);
>  		svm->nmi_iret_rip = kvm_rip_read(vcpu);

The vNMI interaction with this logic is confusing, as nmi_iret_rip doesn't need
to be captured for the vNMI case.  SEV-ES actually has unrelated reasons for not
reading RIP vs. not intercepting IRET; they just got bundled together here for
convenience.

This is also an opportunity to clean up the SEV-ES interaction with IRET interception,
which is splattered all over the place and isn't documented anywhere.

E.g. (with an HF_IRET_MASK => awaiting_iret_completion change)

/*
 * For SEV-ES guests, KVM must not rely on IRET to detect NMI unblocking as
 * #VC->IRET in the guest will result in KVM thinking NMIs are unblocked before
 * the guest is ready for a new NMI.  Architecturally, KVM is 100% correct to
 * treat NMIs as unblocked on IRET, but the guest-host ABI for SEV-ES guests is
 * that KVM must wait for an explicit "NMI Complete" from the guest.
 */
static void svm_disable_iret_interception(struct vcpu_svm *svm)
{
	if (!sev_es_guest(svm->vcpu.kvm))
		svm_clr_intercept(svm, INTERCEPT_IRET);
}

static void svm_enable_iret_interception(struct vcpu_svm *svm)
{
	if (!sev_es_guest(svm->vcpu.kvm))
		svm_set_intercept(svm, INTERCEPT_IRET);
}

static int iret_interception(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	++vcpu->stat.nmi_window_exits;

	/*
	 * No need to wait for the IRET to complete if vNMIs are enabled as
	 * hardware will automatically process the pending NMI when NMIs are
	 * unblocked from the guest's perspective.
	 */
	if (!is_vnmi_enabled(svm)) {
		svm->awaiting_iret_completion = true;

		/*
		 * The guest's RIP is inaccessible for SEV-ES guests, just
		 * assume forward progress was made on the next VM-Exit.
		 */
		if (!sev_es_guest(vcpu->kvm))
			svm->nmi_iret_rip = kvm_rip_read(vcpu);
	}

	svm_disable_iret_interception(svm);

	kvm_make_request(KVM_REQ_EVENT, vcpu);
	return 1;
}

> @@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	if (is_vnmi_enabled(svm))
> -		return;
> -
>  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
>  		return; /* IRET will cause a vm exit */

As much as I like incremental patches, in this case I'm having a hell of a time
reviewing the code as the vNMI logic ends up being split across four patches.
E.g. in this particular case, the above requires knowing that svm_inject_nmi()
never sets HF_NMI_MASK when vNMI is enabled.

In the next version, any objection to squashing patches 7-10 into a single "Add
non-nested vNMI support" patch?

As for this code, IMO some pre-work to change the flow would help with the vNMI
case.  The GIF=0 logic overrides legacy NMI blocking, and so can be handled first.
And I vote to explicitly set INTERCEPT_IRET in the above case instead of relying
on INTERCEPT_IRET to already be set by svm_inject_nmi().

That would yield this as a final result:

static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * GIF=0 blocks NMIs irrespective of legacy NMI blocking.  No need to
	 * intercept or single-step IRET if GIF=0, just intercept STGI.
	 */
	if (!gif_set(svm)) {
		if (vgif)
			svm_set_intercept(svm, INTERCEPT_STGI);
		return;
	}

	/*
	 * NMI is blocked, either because an NMI is in service or because KVM
	 * just injected an NMI.  If KVM is waiting for an intercepted IRET to
	 * complete, single-step the IRET to wait for NMIs to become unblocked.
	 * Otherwise, intercept the guest's next IRET.
	 */
	if (svm->awaiting_iret_completion) {
		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
		svm->nmi_singlestep = true;
		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
	} else {
		svm_set_intercept(svm, INTERCEPT_IRET);
	}
}

>  
> @@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	 * Something prevents NMI from been injected. Single step over possible
>  	 * problem (IRET or exception injection or interrupt shadow)
>  	 */
> -	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> -	svm->nmi_singlestep = true;
> -	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +
> +	if (is_vnmi_enabled(svm)) {
> +		svm_set_intercept(svm, INTERCEPT_IRET);

This will break SEV-ES.  Per commit 4444dfe4050b ("KVM: SVM: Add NMI support for
an SEV-ES guest"), the hypervisor must not rely on IRET interception to detect
NMI unblocking for SEV-ES guests.  As above, I think we should provide helpers to
toggle NMI interception to reduce the probability of breaking SEV-ES.

> +	} else {
> +		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> +		svm->nmi_singlestep = true;
> +		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +	}
>  }
>  
>  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> -- 
> 2.34.3
> 
