linux-kernel - Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH

Message-ID: <Z4k9seeAK09VAKiz@google.com>
Date: Thu, 16 Jan 2025 09:11:13 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Jim Mattson <jmattson@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit

On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> On Wed, Jan 15, 2025 at 9:27 PM Jim Mattson <jmattson@...gle.com> wrote:
> > On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> > > Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> > > nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> > > more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> > > guest-physical translations, so only flushing linear and combined
> > > translations (i.e. guest-generated translations) is needed.

No, using KVM_REQ_TLB_FLUSH_CURRENT is correct.  From *L1's* perspective, VPID
is enabled, and so VM-Entry/VM-Exit are NOT architecturally guaranteed to flush
TLBs, and thus KVM is not required to FLUSH_GUEST.

E.g. if KVM is using shadow paging (no EPT whatsoever), and L1 has modified the
PTEs used to map L2 but has not yet flushed TLBs for L2's VPID, then KVM is allowed
to retain its old, "stale" SPTEs that map L2 because architecturally they aren't
guaranteed to be visible to L2.

But because L1 and L2 share TLB entries *in hardware*, KVM needs to ensure the
hardware TLBs are flushed.  Without EPT, KVM will use different CR3s for L1 and
L2, but Intel's ASID tag doesn't include the CR3 address, only the PCID, which
KVM always pulls from guest CR3, i.e. could be the same for L1 and L2.
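
To illustrate the aliasing described above, here is a minimal toy model, not KVM or hardware code. It assumes a simplified tag of (VPID, PCID) for linear translations when EPT is not in use. The names tlb_tag, make_tag(), tags_alias() and the example CR3 values are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With CR4.PCIDE=1, the PCID lives in CR3 bits 11:0. */
#define CR3_PCID_MASK 0xfffull

/*
 * Hypothetical, simplified tag for a linear translation without EPT.
 * Note what is absent: the page-table root address held in CR3.
 */
struct tlb_tag {
	uint16_t vpid;
	uint16_t pcid;
};

/* Build the tag from the hardware VPID and the guest's CR3 value. */
static inline struct tlb_tag make_tag(uint16_t vpid, uint64_t guest_cr3)
{
	struct tlb_tag tag = {
		.vpid = vpid,
		.pcid = (uint16_t)(guest_cr3 & CR3_PCID_MASK),
	};
	return tag;
}

/* Two contexts can hit each other's TLB entries iff their tags match. */
static inline bool tags_alias(struct tlb_tag a, struct tlb_tag b)
{
	return a.vpid == b.vpid && a.pcid == b.pcid;
}

/* Example: same hardware VPID, different roots, same PCID => aliasing. */
static inline void example_alias(void)
{
	struct tlb_tag l1 = make_tag(1, 0x1000ull | 0x5);
	struct tlb_tag l2 = make_tag(1, 0x2000ull | 0x5);

	assert(tags_alias(l1, l2));
}
```

In this model the different root addresses (0x1000 vs. 0x2000) never reach the tag, so only a hardware flush keeps L1's entries from being used while L2 runs.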

Specifically, the synchronization of shadow roots in kvm_vcpu_flush_tlb_guest()
is not required in this scenario.

  static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
  {
	++vcpu->stat.tlb_flush;

	if (!tdp_enabled) {
		/*
		 * A TLB flush on behalf of the guest is equivalent to
		 * INVPCID(all), toggling CR4.PGE, etc., which requires
		 * a forced sync of the shadow page tables.  Ensure all the
		 * roots are synced and the guest TLB in hardware is clean.
		 */
		kvm_mmu_sync_roots(vcpu);
		kvm_mmu_sync_prev_roots(vcpu);
	}

	kvm_x86_call(flush_tlb_guest)(vcpu);

	/*
	 * Flushing all "guest" TLB is always a superset of Hyper-V's fine
	 * grained flushing.
	 */
	kvm_hv_vcpu_purge_flush_tlb(vcpu);
  }
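
As a rough sketch of the difference that matters here, the toy model below contrasts a "current" flush (hardware TLB only) with a "guest" flush (hardware TLB plus a forced shadow-root sync when TDP is disabled). This is not kernel code: struct toy_vcpu, its counters, and the toy_* helpers are hypothetical stand-ins, and the "current" variant is an assumption about the shape of the real helper rather than a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vCPU state relevant to this sketch. */
struct toy_vcpu {
	bool tdp_enabled;	/* false => shadow paging */
	unsigned int hw_flushes;	/* hardware TLB flushes performed */
	unsigned int root_syncs;	/* forced shadow-root syncs performed */
};

/* Models FLUSH_CURRENT: flush the hardware TLB, leave SPTEs alone. */
static inline void toy_flush_tlb_current(struct toy_vcpu *vcpu)
{
	assert(vcpu);
	vcpu->hw_flushes++;
}

/*
 * Models FLUSH_GUEST: with shadow paging, also resync the shadow
 * roots, because the guest itself asked for its TLB to be flushed.
 */
static inline void toy_flush_tlb_guest(struct toy_vcpu *vcpu)
{
	assert(vcpu);
	if (!vcpu->tdp_enabled)
		vcpu->root_syncs++;
	vcpu->hw_flushes++;
}
```

With tdp_enabled == false, toy_flush_tlb_guest() performs a root sync that toy_flush_tlb_current() skips. That sync is the extra work that is not required on a nested transition where L1 has not requested a flush.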