Message-ID: <CAJD7tkZQQUqh1GG5RpfYFT4-jK-CV7H+z9p2rTudLsrBe3WgbA@mail.gmail.com>
Date: Thu, 16 Jan 2025 10:24:02 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Jim Mattson <jmattson@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit
On Thu, Jan 16, 2025 at 9:11 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > On Wed, Jan 15, 2025 at 9:27 PM Jim Mattson <jmattson@...gle.com> wrote:
> > > On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> > > > Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> > > > nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> > > > more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> > > > guest-physical translations, so only flushing linear and combined
> > > > translations (i.e. guest-generated translations) is needed.
>
> No, using KVM_REQ_TLB_FLUSH_CURRENT is correct. From *L1's* perspective, VPID
> is enabled, and so VM-Entry/VM-Exit are NOT architecturally guaranteed to flush
> TLBs, and thus KVM is not required to FLUSH_GUEST.
>
> E.g. if KVM is using shadow paging (no EPT whatsoever), and L1 has modified the
> PTEs used to map L2 but has not yet flushed TLBs for L2's VPID, then KVM is allowed
> to retain its old, "stale" SPTEs that map L2 because architecturally they aren't
> guaranteed to be visible to L2.
>
> But because L1 and L2 share TLB entries *in hardware*, KVM needs to ensure the
> hardware TLBs are flushed. Without EPT, KVM will use different CR3s for L1 and
> L2, but Intel's ASID tag doesn't include the CR3 address, only the PCID, which
> KVM always pulls from guest CR3, i.e. could be the same for L1 and L2.
>
> Specifically, the synchronization of shadow roots in kvm_vcpu_flush_tlb_guest()
> is not required in this scenario.
Aha, I was looking at vmx_flush_tlb_guest(), not
kvm_vcpu_flush_tlb_guest(), so I missed the synchronization. Yeah, I
think we could end up unnecessarily synchronizing (or dropping) the
shadow page tables in this case.
Do you think it's worth expanding the comment in
nested_vmx_transition_tlb_flush()?
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2ed454186e59c..43d34e413d016 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1239,6 +1239,11 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
* does not have a unique TLB tag (ASID), i.e. EPT is disabled and
* KVM was unable to allocate a VPID for L2, flush the current context
* as the effective ASID is common to both L1 and L2.
+ *
+ * Note that while TLB_FLUSH_GUEST would also be correct, as only
+ * linear mappings need to be flushed, it would unnecessarily
+ * synchronize the MMU even though no TLB flush is architecturally
+ * required from L1's perspective.
*/
if (!nested_has_guest_tlb_tag(vcpu))
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);