Message-ID: <Z8fxGYt6OmTlOd5i@google.com>
Date: Wed, 5 Mar 2025 06:37:13 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Maxim Levitsky <mlevitsk@...hat.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
	Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 12/13] KVM: nSVM: Service local TLB flushes before
 nested transitions

On Tue, Mar 04, 2025 at 10:03:51PM -0500, Maxim Levitsky wrote:
> On Mon, 2025-03-03 at 22:18 +0000, Yosry Ahmed wrote:
> > On Fri, Feb 28, 2025 at 09:20:18PM -0500, Maxim Levitsky wrote:
> > > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > > KVM does not track TLB flush requests for L1 vs. L2. Hence, service
> > > > local flushes that target the current context before switching to a
> > > > new one. Since ASIDs are tracked per-VMCB, service the flushes before
> > > > every VMCB switch.
> > > > 
> > > > This is conceptually similar to how nVMX calls
> > > > kvm_service_local_tlb_flush_requests() in
> > > > nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(), with the
> > > > following differences:
> > > > 
> > > > 1. nVMX tracks the current VPID based on is_guest_mode(), so local TLB
> > > >    flushes are serviced before enter_guest_mode() and
> > > >    leave_guest_mode(). On the other hand, nSVM tracks the current ASID
> > > >    based on the current VMCB, so the TLB flushes are serviced before a
> > > >    VMCB switch.
> > > > 
> > > > 2. nVMX only enters and leaves guest mode in
> > > >    nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(). Other paths
> > > >    like vmx_set_nested_state() and vmx_leave_nested() call into these
> > > >    two functions. On the other hand, nSVM open codes the switch in
> > > >    functions like svm_set_nested_state() and svm_leave_nested(), so
> > > >    servicing the flush in svm_switch_vmcb() is probably most reliable.
> > > > 
> > > > Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> > > > ---
> > > >  arch/x86/kvm/svm/svm.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index 5e7b1c9bfa605..6daa7efa9262b 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -1421,6 +1421,12 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > > >  
> > > >  void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
> > > >  {
> > > > +	/*
> > > > +	 * ASIDs are tracked per-VMCB. Perform any pending TLB flushes for the
> > > > +	 * current VMCB before switching to a new one.
> > > > +	 */
> > > > +	kvm_service_local_tlb_flush_requests(&svm->vcpu);
> > > > +
> > > >  	svm->current_vmcb = target_vmcb;
> > > >  	svm->vmcb = target_vmcb->ptr;
> > > >  }
> > > 
> > > Note that another difference between SVM and VMX is that this code only
> > > sets tlb_ctl in the current vmcb; the actual flush can happen much later,
> > > when we do a VM entry with this vmcb. E.g. if we are now in L2, the flush
> > > will happen when we enter L2 again.
> > 
> > Right, but I think the internal implementation of the TLB flushes is not
> > relevant in this specific instance. Do you think it would be useful to
> > mention that here?
> 
> I am not sure, to be honest. I just mentioned this because in theory there
> can be a difference: we might think that we flushed the TLB when in fact we
> haven't yet.
> 
> I am trying my best to think about what hidden problems might lurk around and surface later.
> 
> Not directly related to the above, but I am thinking:
> I really like the way SVM flush works because it ensures that redundant flushes don't cost anything.
> 
> I wonder if we can make VMX code emulate this,
> by having an emulated 'tlb_control' field and then doing the flush (INVEPT) on VM entry.

I briefly thought about this while working on this series. One way to do
this is to make TLB flush requests (e.g. KVM_REQ_TLB_FLUSH) when
emulating INVVPID and INVEPT instead of doing the flush right away. Then
we service the flushes once before running the guest. Redundant
flushes are coalesced.

But I did not look closely at whether the existing TLB flush requests
satisfy all the use cases or whether we would need to create new ones. Not
sure if the complexity would be worth it, but something to think about.
