Message-ID: <Z4qbDBduEYWEwjkS@google.com>
Date: Fri, 17 Jan 2025 10:01:48 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Jim Mattson <jmattson@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit

On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> On Thu, Jan 16, 2025 at 4:35 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > > On Thu, Jan 16, 2025 at 2:35 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > > How about:
> > > >
> > > >          * Note, only the hardware TLB entries need to be flushed, as VPID is
> > > >          * fully enabled from L1's perspective, i.e. there's no architectural
> > > >          * TLB flush.
> > >
> > > I hate to bikeshed, but I want to explicitly call out that we do not
> > > need to synchronize the MMU.
> >
> > Why?  Honest question, I want to understand what's unclear.  My hesitation to
> > talk about synchronizing MMUs is that it brings things into play that aren't
> > super relevant to this specific code, and might even add confusion.  Specifically,
> > kvm_vcpu_flush_tlb_guest() does NOT synchronize MMUs when EPT/TDP is enabled, but
> > the fact that this path is reachable if and only if EPT is enabled is completely
> > coincidental.
> 
> Personally, the main thing that was unclear to me and I wanted to add
> a comment to clarify was why we use KVM_REQ_TLB_FLUSH_GUEST in the
> first two cases but KVM_REQ_TLB_FLUSH_CURRENT in the last one.
>
> Here's my understanding:
> 
> In the first case (i.e. !nested_cpu_has_vpid(vmcs12)), the flush is
> architecturally required from L1's perspective, so we need to flush
> guest-generated TLB entries (and potentially synchronize KVM's MMU).
> 
> In the second case, KVM does not track the history of VPID12, so the
> flush *may* be architecturally required from L1's perspective, so we
> do the same thing.
> 
> In the last case though, the flush is NOT architecturally required
> from L1's perspective, it's just an artifact of KVM's potential
> failure to allocate a dedicated VPID for L2 despite L1 asking for it.
> 
> So ultimately, I don't want to specifically call out synchronizing
> MMUs, as much as I want to call out why this case uses
> KVM_REQ_TLB_FLUSH_CURRENT and not KVM_REQ_TLB_FLUSH_GUEST like the
> others. I only suggested calling out the MMU synchronization since
> it's effectively the only difference between the two in this case.

Yep.  I suspect the issue is lack of documentation for TLB_FLUSH_GUEST and
TLB_FLUSH_CURRENT.  I'm not entirely sure where it would be best to document
them.  I guess maybe where they are #defined?
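For example, documentation at the #define site might read something like the
sketch below.  The bit values and comment wording here are purely illustrative,
not the real arch/x86/include/asm/kvm_host.h definitions:

```c
#include <assert.h>

/*
 * TLB_FLUSH_GUEST: a flush of the guest's TLB, from the guest's
 * perspective, is architecturally required.  Does NOT cover
 * guest-physical mappings, i.e. TLB entries associated with an EPT
 * root.  May sync KVM's MMU as a side effect under shadow paging.
 *
 * TLB_FLUSH_CURRENT: KVM needs to flush the hardware TLB(s) for the
 * current pCPU+context, e.g. because KVM modified its own page tables,
 * or is switching between shadow CR3s that share a hardware VPID.
 */
#define KVM_REQ_TLB_FLUSH_GUEST		(1u << 0)	/* illustrative bit */
#define KVM_REQ_TLB_FLUSH_CURRENT	(1u << 1)	/* illustrative bit */
```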

TLB_FLUSH_GUEST is used when a flush of the guest's TLB, from the guest's
perspective, is architecturally required.  The one oddity with TLB_FLUSH_GUEST
is that it does NOT include guest-physical mappings, i.e. TLB entries that are
associated with an EPT root.

TLB_FLUSH_CURRENT is used when KVM needs to flush the hardware TLB(s) for the
current CPU+context.  The most "obvious" case is when KVM has modified its own
page tables.  More subtle cases include changing which physical CPU the vCPU is
running on, and this case, where KVM switches the shadow CR3 while VPID is
enabled in the host (i.e. hardware won't flush TLBs on the CR3 write) and the
L1 vs. L2 shadow CR3s are not tagged (i.e. use the same hardware VPID).
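The three nested-transition cases discussed above, and which request each one
ends up using, might be modeled as a small decision function.  This is a
simplified sketch with boolean stand-ins for the vmcs12/VPID state, not the
actual nested_vmx_transition_tlb_flush() code:

```c
#include <assert.h>
#include <stdbool.h>

enum flush_req { FLUSH_NONE, FLUSH_GUEST, FLUSH_CURRENT };

/* Simplified model of the nested VM-Enter/VM-Exit TLB-flush decision;
 * the parameters are stand-ins, not real KVM fields. */
static enum flush_req transition_flush(bool l1_enabled_vpid,
				       bool vpid12_may_have_changed,
				       bool l2_has_own_hw_vpid)
{
	/* Case 1: !nested_cpu_has_vpid(vmcs12) -- the transition is an
	 * architectural TLB flush from L1's perspective. */
	if (!l1_enabled_vpid)
		return FLUSH_GUEST;

	/* Case 2: KVM doesn't track the history of VPID12, so the flush
	 * *may* be architecturally required; err on the safe side. */
	if (vpid12_may_have_changed)
		return FLUSH_GUEST;

	/* Case 3: no architectural flush, but if KVM failed to allocate
	 * a dedicated VPID for L2, the shared hardware tag forces a
	 * flush of the current hardware context only. */
	if (!l2_has_own_hw_vpid)
		return FLUSH_CURRENT;

	return FLUSH_NONE;
}
```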

The use of TLB_FLUSH_GUEST *may* result in an MMU sync, but that's a side effect
of an architectural guest TLB flush occurring, not the other way around.
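That ordering, MMU sync as a side effect of the architectural guest flush and
only under shadow paging, could be sketched like so.  The counters and helper
names are stand-ins for kvm_mmu_sync_roots() and the hardware flush, not the
real kvm_vcpu_flush_tlb_guest():

```c
#include <assert.h>
#include <stdbool.h>

static int mmu_syncs, hw_flushes;

/* Stand-ins for kvm_mmu_sync_roots() and a hardware TLB flush. */
static void sync_mmu_roots(void)     { mmu_syncs++; }
static void flush_hardware_tlb(void) { hw_flushes++; }

/* Sketch of TLB_FLUSH_GUEST handling: the MMU sync happens only when
 * TDP/EPT is disabled, as a consequence of the architectural guest
 * TLB flush, never the other way around. */
static void flush_tlb_guest(bool tdp_enabled)
{
	if (!tdp_enabled)
		sync_mmu_roots();	/* shadow paging: resync KVM's MMU */
	flush_hardware_tlb();		/* always flush guest linear mappings */
}
```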
