linux-kernel - Re: [PATCH v2 5/8] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>

Message-ID: <Z6vvgGFngGjQHwps@google.com>
Date: Tue, 11 Feb 2025 16:46:56 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Chao Gao <chao.gao@...el.com>
Cc: Binbin Wu <binbin.wu@...ux.intel.com>, Yan Zhao <yan.y.zhao@...el.com>, pbonzini@...hat.com, 
	kvm@...r.kernel.org, rick.p.edgecombe@...el.com, kai.huang@...el.com, 
	adrian.hunter@...el.com, reinette.chatre@...el.com, xiaoyao.li@...el.com, 
	tony.lindgren@...el.com, isaku.yamahata@...el.com, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 5/8] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>

On Tue, Feb 11, 2025, Chao Gao wrote:
> On Tue, Feb 11, 2025 at 04:11:19PM +0800, Binbin Wu wrote:
> >
> >
> >On 2/11/2025 2:54 PM, Yan Zhao wrote:
> >> On Tue, Feb 11, 2025 at 10:54:39AM +0800, Binbin Wu wrote:
> >> > +static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
> >> > +{
> >> > +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> >> > +
> >> > +	if (vcpu->run->hypercall.ret) {
> >> > +		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
> >> > +		tdx->vp_enter_args.r11 = tdx->map_gpa_next;
> >> > +		return 1;
> >> > +	}
> >> > +
> >> > +	tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
> >> > +	if (tdx->map_gpa_next >= tdx->map_gpa_end)
> >> > +		return 1;
> >> > +
> >> > +	/*
> >> > +	 * Stop processing the remaining part if there is pending interrupt.
> >> > +	 * Skip checking pending virtual interrupt (reflected by
> >> > +	 * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a seamcall because
> >> > +	 * if guest disabled interrupt, it's OK not returning back to guest
> >> > +	 * due to non-NMI interrupt. Also it's rare to TDVMCALL_MAP_GPA
> >> > +	 * immediately after STI or MOV/POP SS.
> >> > +	 */
> >> > +	if (pi_has_pending_interrupt(vcpu) ||
> >> > +	    kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
> >> Should here also use "kvm_vcpu_has_events()" to replace
> >> "pi_has_pending_interrupt(vcpu) ||
> >>   kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending" as Sean
> >> suggested at [1]?
> >> 
> >> [1] https://lore.kernel.org/all/Z4rIGv4E7Jdmhl8P@google.com
> >
> >For TDX guests, kvm_vcpu_has_events() will check pending virtual interrupt
> >via a SEAM call.  As noted in the comments, the check for pending virtual
> >interrupt is intentionally skipped to save the SEAM call. Additionally,

Drat, I had a whole response typed up and then discovered the implementation of
tdx_protected_apic_has_interrupt() had changed.  But I think the basic gist
still holds.

The new version:

 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
-       return pi_has_pending_interrupt(vcpu);
+       u64 vcpu_state_details;
+
+       if (pi_has_pending_interrupt(vcpu))
+               return true;
+
+       vcpu_state_details =
+               td_state_non_arch_read64(to_tdx(vcpu), TD_VCPU_STATE_DETAILS_NON_ARCH);
+
+       return tdx_vcpu_state_details_intr_pending(vcpu_state_details);
 }

is much better than the old:

 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
-       return pi_has_pending_interrupt(vcpu);
+       bool ret = pi_has_pending_interrupt(vcpu);
+       union tdx_vcpu_state_details details;
+       struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+       if (ret || vcpu->arch.mp_state != KVM_MP_STATE_HALTED)
+               return true;
+
+       if (tdx->interrupt_disabled_hlt)
+               return false;
+
+       details.full = td_state_non_arch_read64(tdx, TD_VCPU_STATE_DETAILS_NON_ARCH);
+       return !!details.vmxip;
 }

because assuming the vCPU has an interrupt if it's not HALTED is all kinds of
wrong.

However, checking VMXIP for the !HLT case is also wrong.  And undesirable, as
evidenced by both this path and the EPT violation retry path wanting to avoid
checking VMXIP.

Except for the guest being stupid (non-HLT TDCALL in an interrupt shadow), having
an interrupt in RVI that is fully unmasked will be extremely rare.  Actually,
outside of an interrupt shadow, I don't think it's even possible.  I can't think
of any CPU flows that modify RVI in the middle of instruction execution.  I.e. if
RVI is non-zero, then either the interrupt has been pending since before the
TDVMCALL, or the TDVMCALL is in an STI/SS shadow.  And if the interrupt was
pending before TDVMCALL, then it _must_ be blocked, otherwise the interrupt
would have been serviced at the instruction boundary.

I am completely comfortable saying that KVM doesn't care about STI/SS shadows
outside of the HALTED case, and so unless I'm missing something, I think it makes
sense for tdx_protected_apic_has_interrupt() to not check RVI outside of the HALTED
case, because it's impossible to know if the interrupt is actually unmasked, and
statistically it's far, far more likely that it _is_ masked.
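
Purely as an illustration of the above, here is a userspace model of the check
I have in mind.  Everything in it (struct model_vcpu, model_read_intr_pending(),
the seamcalls counter) is a made-up stand-in, not KVM or TDX module code, and it
ignores details such as HLT with interrupts disabled.  The only point is that
the RVI read, and thus the SEAMCALL, happens only for a HALTED vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
enum model_mp_state { MODEL_MP_STATE_RUNNABLE, MODEL_MP_STATE_HALTED };

struct model_vcpu {
	enum model_mp_state mp_state;
	bool pi_pending;	/* models pi_has_pending_interrupt() */
	bool rvi_pending;	/* models the "interrupt pending" state detail */
	unsigned int seamcalls;	/* counts simulated SEAMCALLs */
};

/* Models the expensive read of the vCPU state details via a SEAMCALL. */
static bool model_read_intr_pending(struct model_vcpu *vcpu)
{
	vcpu->seamcalls++;
	return vcpu->rvi_pending;
}

static bool model_protected_apic_has_interrupt(struct model_vcpu *vcpu)
{
	assert(vcpu != NULL);

	/* A posted interrupt is always worth reporting, and is cheap to check. */
	if (vcpu->pi_pending)
		return true;

	/*
	 * Outside of HALTED, a non-zero RVI is almost certainly masked, and
	 * there is no way to know otherwise, so skip the SEAMCALL entirely.
	 */
	if (vcpu->mp_state != MODEL_MP_STATE_HALTED)
		return false;

	return model_read_intr_pending(vcpu);
}
```

With something like this, the MapGPA path could use the common helper without
paying for a SEAMCALL on every chunk, since the vCPU is never HALTED there.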

> >unnecessarily returning to the guest will have a performance impact.
> >
> >But according to the discussion thread above, it seems that Sean prioritized
> >code readability (i.e. reuse the common helper to make TDX code less special)
> >over performance considerations?
> 
> To mitigate the performance impact, we can cache the "pending interrupt" status
> on the first read, similar to how guest RSP/RBP are cached to avoid VMREADs for
> normal VMs. This optimization can be done in a separate patch or series.
> 
> And, future TDX modules will report the status via registers.
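
For reference, the caching idea quoted above could look roughly like the sketch
below.  This is a userspace model under stated assumptions: struct cached_vcpu,
cached_intr_pending() and cached_vcpu_enter_guest() are hypothetical names, and
the invalidate-on-entry policy is my guess at how it would mirror the lazy
register caching used for normal VMs, not something taken from an actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration; not the kernel's real vcpu_tdx. */
struct cached_vcpu {
	bool intr_pending_hw;	/* models the value held by the TDX module */
	bool intr_pending_cache;/* last value read since the previous entry */
	bool cache_valid;	/* true once the value was read after an exit */
	unsigned int seamcalls;	/* counts simulated SEAMCALLs */
};

/* Return the pending-interrupt status, doing the slow read at most once. */
static bool cached_intr_pending(struct cached_vcpu *vcpu)
{
	assert(vcpu != NULL);

	if (!vcpu->cache_valid) {
		vcpu->seamcalls++;	/* the simulated SEAMCALL */
		vcpu->intr_pending_cache = vcpu->intr_pending_hw;
		vcpu->cache_valid = true;
	}
	return vcpu->intr_pending_cache;
}

/* The cached value is stale once the guest runs again, so drop it. */
static void cached_vcpu_enter_guest(struct cached_vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->cache_valid = false;
}
```

Repeated queries within one exit then cost a single read, and the next guest
entry forces a fresh one.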