[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z8fwZ-94duaK4c2p@google.com>
Date: Wed, 5 Mar 2025 06:34:15 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Maxim Levitsky <mlevitsk@...hat.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02
on nested entry
On Tue, Mar 04, 2025 at 10:01:25PM -0500, Maxim Levitsky wrote:
> On Mon, 2025-03-03 at 22:14 +0000, Yosry Ahmed wrote:
> > On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote:
> > > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > > TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
> > > > L2. This is unnecessary because it should always be
> > > > TLB_CONTROL_DO_NOTHING at this point.
> > > >
> > > > The flow for setting TLB_CONTROL is as follows:
> > > > 1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
> > > > TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
> > > > 2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
> > > > TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
> > > > 3. In svm_cpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
> > > > guest is run.
> > > >
> > > > Hence, TLB_CONTROL is reset after each run and there is no need to do it
> > > > again on every nested transition to L2.
> > > >
> > > > There is a TODO in nested_svm_transition_tlb_flush() about this reset
> > > > crushing pending TLB flushes. Remove it, as the reset is not really
> > > > crushing anything as explained above.
> > >
> > > I am not sure that we don't crush a pending tlb request:
> > >
> > > svm_flush_tlb_asid can also be called by KVM_REQ_TLB_FLUSH
> > > and set the flush request in both vmcbs, thus later the nested_svm_exit_tlb_flush
> > > can crush this request.
> >
> > How so?
> >
> > nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request.
> > svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests.
>
> I am probably missing something but:
>
> Suppose KVM_REQ_TLB_FLUSH is raised and then processed while ordinary L1 entry is happening,
> but nested state is allocated.
>
> (KVM_REQ_TLB_FLUSH can be raised anytime MMU wants a 'big hammer flush everything')
>
> In this case svm_flush_tlb_all will call svm_flush_tlb_asid on both vmcbs (see patch 8),
> and that will set TLB_CONTROL_FLUSH_ASID in both vmcbs.
> In particular it will be set in vmcb02.
>
> Later, maybe even hours later in theory, L1 issues VMRUN, we reach nested_vmcb02_prepare_control,
> and crush the value (TLB_CONTROL_FLUSH_ASID) set in vmcb02.
>
> I think that this is what the removed comment referred to.
When KVM_REQ_TLB_FLUSH is raised, we do not call svm_flush_tlb_all()
immediately. We only call svm_flush_tlb_all() when the request is
serviced in vcpu_enter_guest():
/*
* Note, the order matters here, as flushing "all" TLB entries
* also flushes the "current" TLB entries, i.e. servicing the
* flush "all" will clear any request to flush "current".
*/
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
kvm_vcpu_flush_tlb_all(vcpu);
kvm_service_local_tlb_flush_requests(vcpu);
IIUC, vcpu_enter_guest() will be called for L2 after
nested_vmcb02_prepare_control() is called. My understanding is that the
sequence of events is as follows:
- L1 executes VMRUN, which is trapped and emulated by L0.
- KVM executes handles the VM-exit in L0 by calling
vmrun_interception() to setup VMCB02 in prepartion for running L2.
This will call nested_svm_vmrun() -> enter_svm_guest_mode() ->
nested_vmcb02_prepare_control() (setting tlb_ctl to
TLB_CONTROL_DO_NOTHING).
- Execution will pick up after the VMRUN instruction in L0, eventually
getting to the loop in vcpu_run(), and calling vcpu_enter_guest()
for L2. At this point any pending TLB flush requests (e.g.
KVM_REQ_TLB_FLUSH) will be handled, and svm_flush_tlb_*() functions
may be called to set tlb_ctl to a non-zero value (e.g.
TLB_CONTROL_FLUSH_ASID).
- A little bit later in svm_vcpu_run() -> pre_svm_run(), tlb_ctl may be
upgraded to TLB_CONTROL_FLUSH_ALL_ASID if a new ASID is allocated.
- After the guest is run, svm_cpu_run() resets tlb_ctl to TLB_CONTROL_DO_NOTHING.
So nested_vmcb02_prepare_control() setting tlb_ctl to
TLB_CONTROL_DO_NOTHING should have no effect because tlb_ctl is set
after that anyway before L2 is run, and reset back to
TLB_CONTROL_DO_NOTHING after L2 is run.
I tried to clarify this in the commit log, but I don't think it is clear
enough. I will try to add more details in the next version.
Please correct me if I am wrong.
Powered by blists - more mailing lists