Message-ID: <CAJD7tkaa1cqUeUUKNdQADBqXH-G9h=5Liv+wj=5gitgbdO9Tsw@mail.gmail.com>
Date: Fri, 17 Jan 2025 10:20:38 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Jim Mattson <jmattson@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit

On Fri, Jan 17, 2025 at 10:01 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > On Thu, Jan 16, 2025 at 4:35 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > >
> > > On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > > > On Thu, Jan 16, 2025 at 2:35 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > > > How about:
> > > > >
> > > > >          * Note, only the hardware TLB entries need to be flushed, as VPID is
> > > > >          * fully enabled from L1's perspective, i.e. there's no architectural
> > > > >          * TLB flush from L1's perspective.
> > > >
> > > > I hate to bikeshed, but I want to explicitly call out that we do not
> > > > need to synchronize the MMU.
> > >
> > > Why?  Honest question, I want to understand what's unclear.  My hesitation to
> > > talk about synchronizing MMUs is that it brings things into play that aren't
> > > super relevant to this specific code, and might even add confusion.  Specifically,
> > > kvm_vcpu_flush_tlb_guest() does NOT synchronize MMUs when EPT/TDP is enabled, but
> > > the fact that this path is reachable if and only if EPT is enabled is completely
> > > coincidental.
> >
> > Personally, the main thing that was unclear to me and I wanted to add
> > a comment to clarify was why we use KVM_REQ_TLB_FLUSH_GUEST in the
> > first two cases but KVM_REQ_TLB_FLUSH_CURRENT in the last one.
> >
> > Here's my understanding:
> >
> > In the first case (i.e. !nested_cpu_has_vpid(vmcs12)), the flush is
> > architecturally required from L1's perspective, so we need to flush
> > guest-generated TLB entries (and potentially synchronize KVM's MMU).
> >
> > In the second case, KVM does not track the history of VPID12, so the
> > flush *may* be architecturally required from L1's perspective, and we
> > do the same thing.
> >
> > In the last case though, the flush is NOT architecturally required
> > from L1's perspective, it's just an artifact of KVM's potential
> > failure to allocate a dedicated VPID for L2 despite L1 asking for it.
> >
> > So ultimately, I don't want to specifically call out synchronizing
> > MMUs, as much as I want to call out why this case uses
> > KVM_REQ_TLB_FLUSH_CURRENT and not KVM_REQ_TLB_FLUSH_GUEST like the
> > others. I only suggested calling out the MMU synchronization since
> > it's effectively the only difference between the two in this case.
>
> Yep.  I suspect the issue is lack of documentation for TLB_FLUSH_GUEST and
> TLB_FLUSH_CURRENT.  I'm not entirely sure where it would be best to document
> them.  I guess maybe where they are #defined?

I guess at the #define we can just mention that they result in calling
kvm_vcpu_flush_tlb_{guest/current}() before entering the guest, if
anything.

The specific documentation about what they do could be above the
functions themselves, and describing the potential MMU sync is
naturally part of documenting kvm_vcpu_flush_tlb_guest() (kinda
already there).

The flush_tlb_guest() callback is documented in kvm_host.h, but not
flush_tlb_current(). I was going to suggest just documenting that. But
kvm_vcpu_flush_tlb_guest() does not just call flush_tlb_guest(); it
also potentially synchronizes the MMU. So documenting only the
callbacks does not paint a full picture.

FTR, I initially confused myself because all kvm_vcpu_flush_tlb_*()
functions are more-or-less thin wrappers around the per-vendor
callbacks -- except kvm_vcpu_flush_tlb_guest().

>
> TLB_FLUSH_GUEST is used when a flush of the guest's TLB, from the guest's
> perspective, is architecturally required.  The one oddity with TLB_FLUSH_GUEST
> is that it does NOT include guest-physical mappings, i.e. TLB entries that are
> associated with an EPT root.

The way I think about this is how it's documented above the per-vendor
callback. It flushes translations created by the guest. The guest does
not (directly) create guest-physical translations, only linear and
combined translations.

>
> TLB_FLUSH_CURRENT is used when KVM needs to flush the hardware TLB(s) for the
> current CPU+context.  The most "obvious" case is for when KVM has modified its
> page tables.  More subtle cases are things like changing which physical CPU the
> vCPU is running on, and this case where KVM is switching the shadow CR3, VPID is
> enabled in the host (i.e. hardware won't flush TLBs), and the L1 vs. L2 shadow
> CR3s are not tagged (i.e. use the same hardware VPID).
>
> The use of TLB_FLUSH_GUEST *may* result in an MMU sync, but that's a side effect
> of an architectural guest TLB flush occurring, not the other way around.

Agreed.
