Message-ID: <Z8fwZ-94duaK4c2p@google.com>
Date: Wed, 5 Mar 2025 06:34:15 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Maxim Levitsky <mlevitsk@...hat.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
	Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02
 on nested entry

On Tue, Mar 04, 2025 at 10:01:25PM -0500, Maxim Levitsky wrote:
> On Mon, 2025-03-03 at 22:14 +0000, Yosry Ahmed wrote:
> > On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote:
> > > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > > TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
> > > > L2. This is unnecessary because it should always be
> > > > TLB_CONTROL_DO_NOTHING at this point.
> > > > 
> > > > The flow for setting TLB_CONTROL is as follows:
> > > > 1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
> > > > TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
> > > > 2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
> > > > TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
> > > > 3. In svm_vcpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
> > > > guest is run.
> > > > 
> > > > Hence, TLB_CONTROL is reset after each run and there is no need to do it
> > > > again on every nested transition to L2.
> > > > 
> > > > There is a TODO in nested_svm_transition_tlb_flush() about this reset
> > > > crushing pending TLB flushes. Remove it, as the reset is not really
> > > > crushing anything as explained above.
> > > 
> > > I am not sure that we don't crush a pending tlb request: 
> > > 
> > > svm_flush_tlb_asid can also be called by KVM_REQ_TLB_FLUSH
> > > and set the flush request in both vmcbs, thus later the nested_svm_exit_tlb_flush
> > > can crush this request.
> > 
> > How so?
> > 
> > nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request.
> > svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests.
> 
> I am probably missing something but:
> 
> Suppose KVM_REQ_TLB_FLUSH is raised and then processed while ordinary L1 entry is happening,
> but nested state is allocated.
> 
> (KVM_REQ_TLB_FLUSH can be raised anytime MMU wants a 'big hammer flush everything')
> 
> In this case svm_flush_tlb_all will call svm_flush_tlb_asid on both vmcbs (see patch 8),
> and that will set TLB_CONTROL_FLUSH_ASID in both vmcbs.
> In particular it will be set in vmcb02.
> 
> Later, maybe even hours later in theory, L1 issues VMRUN, we reach nested_vmcb02_prepare_control,
> and crush the value (TLB_CONTROL_FLUSH_ASID) set in vmcb02.
> 
> I think that this is what the removed comment referred to.

When KVM_REQ_TLB_FLUSH is raised, we do not call svm_flush_tlb_all()
immediately. We only call svm_flush_tlb_all() when the request is
serviced in vcpu_enter_guest():

	/*
	 * Note, the order matters here, as flushing "all" TLB entries
	 * also flushes the "current" TLB entries, i.e. servicing the
	 * flush "all" will clear any request to flush "current".
	 */
	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
		kvm_vcpu_flush_tlb_all(vcpu);

	kvm_service_local_tlb_flush_requests(vcpu);
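
To illustrate the raise-now, service-later behavior, here is a toy model
(not kernel code). All toy_* names are hypothetical and exist only for
this sketch; the only thing borrowed from KVM is the test-and-clear
behavior of kvm_check_request().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: the toy_* names are hypothetical, not KVM's real types. */
enum toy_req { TOY_REQ_TLB_FLUSH = 1u << 0 };

struct toy_vcpu {
	unsigned int requests;	/* pending request bits */
	int flushes_done;	/* how many times the flush actually ran */
};

/* Raising a request only records it; no flush happens here. */
static void toy_make_request(struct toy_vcpu *v, unsigned int req)
{
	v->requests |= req;
}

/* Test-and-clear, in the spirit of kvm_check_request(). */
static bool toy_check_request(struct toy_vcpu *v, unsigned int req)
{
	if (!(v->requests & req))
		return false;
	v->requests &= ~req;
	return true;
}

/* Stand-in for the servicing step in vcpu_enter_guest(). */
static void toy_enter_guest(struct toy_vcpu *v)
{
	if (toy_check_request(v, TOY_REQ_TLB_FLUSH))
		v->flushes_done++;
}

/* Raise once, enter the guest twice: exactly one flush is performed. */
int toy_demo(void)
{
	struct toy_vcpu v = { 0 };

	toy_make_request(&v, TOY_REQ_TLB_FLUSH);
	assert(v.flushes_done == 0);	/* nothing flushed at raise time */
	toy_enter_guest(&v);
	toy_enter_guest(&v);
	return v.flushes_done;
}
```

In this model the flush callback runs only from toy_enter_guest(), never
from toy_make_request(), which is the point being made about
KVM_REQ_TLB_FLUSH above.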

IIUC, vcpu_enter_guest() will be called for L2 after
nested_vmcb02_prepare_control() is called. My understanding is that the
sequence of events is as follows:

- L1 executes VMRUN, which is trapped and emulated by L0.

- KVM handles the VM-exit in L0 by calling
  vmrun_interception() to set up VMCB02 in preparation for running L2.
  This will call nested_svm_vmrun() -> enter_svm_guest_mode() ->
  nested_vmcb02_prepare_control() (setting tlb_ctl to
  TLB_CONTROL_DO_NOTHING).

- Execution will pick up after the VMRUN instruction in L0, eventually
  getting to the loop in vcpu_run(), and calling vcpu_enter_guest()
  for L2. At this point any pending TLB flush requests (e.g.
  KVM_REQ_TLB_FLUSH) will be handled, and svm_flush_tlb_*() functions
  may be called to set tlb_ctl to a non-zero value (e.g.
  TLB_CONTROL_FLUSH_ASID).

- A little bit later in svm_vcpu_run() -> pre_svm_run(), tlb_ctl may be
  upgraded to TLB_CONTROL_FLUSH_ALL_ASID if a new ASID is allocated.
 
- After the guest is run, svm_vcpu_run() resets tlb_ctl to
  TLB_CONTROL_DO_NOTHING.

So nested_vmcb02_prepare_control() setting tlb_ctl to
TLB_CONTROL_DO_NOTHING should have no effect because tlb_ctl is set
after that anyway before L2 is run, and reset back to
TLB_CONTROL_DO_NOTHING after L2 is run.
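
A toy model of that ordering, in case it helps (not kernel code). The
toy_* names and enum values are hypothetical and chosen for this sketch
only. It assumes tlb_ctl is already TLB_CONTROL_DO_NOTHING when the
nested entry starts, and it does not model anything that touches VMCB02
outside of this sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Values are illustrative, not the real TLB_CONTROL_* encodings. */
enum toy_tlb_ctl {
	TOY_DO_NOTHING,
	TOY_FLUSH_ASID,
	TOY_FLUSH_ALL_ASID,
};

struct toy_vmcb02 {
	enum toy_tlb_ctl tlb_ctl;
	bool flush_pending;		/* stands in for a pending KVM_REQ_TLB_* */
	bool need_new_asid;		/* stands in for pre_svm_run() picking a new ASID */
	enum toy_tlb_ctl seen_by_hw;	/* tlb_ctl at the moment the guest "runs" */
};

/* nested_vmcb02_prepare_control() step; the reset is optional for comparison. */
static void toy_prepare_control(struct toy_vmcb02 *v, bool reset)
{
	if (reset)
		v->tlb_ctl = TOY_DO_NOTHING;
}

/* vcpu_enter_guest() step: service a pending flush request. */
static void toy_service_requests(struct toy_vmcb02 *v)
{
	if (v->flush_pending) {
		v->flush_pending = false;
		v->tlb_ctl = TOY_FLUSH_ASID;
	}
}

/* svm_vcpu_run() step: possible upgrade, run the guest, reset afterwards. */
static void toy_vcpu_run(struct toy_vmcb02 *v)
{
	if (v->need_new_asid) {
		v->need_new_asid = false;
		v->tlb_ctl = TOY_FLUSH_ALL_ASID;
	}
	v->seen_by_hw = v->tlb_ctl;	/* what the "hardware" consumes */
	v->tlb_ctl = TOY_DO_NOTHING;	/* reset after the run */
}

/* One nested entry, following the order described above. */
enum toy_tlb_ctl toy_nested_entry(struct toy_vmcb02 *v, bool reset)
{
	toy_prepare_control(v, reset);
	toy_service_requests(v);
	toy_vcpu_run(v);
	return v->seen_by_hw;
}
```

Under the stated assumption, toy_nested_entry() returns the same value
with reset set to true or false, and tlb_ctl is TOY_DO_NOTHING again
once it returns.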

I tried to clarify this in the commit log, but I don't think it is clear
enough. I will try to add more details in the next version.

Please correct me if I am wrong.