lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7ae57680-e657-48d3-56c1-bb818d73dd38@linux.microsoft.com>
Date:   Thu, 9 Mar 2023 18:58:45 +0100
From:   Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
To:     alexander.grest@...rosoft.com
Cc:     Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, Vitaly Kuznetsov <vkuznets@...hat.com>,
        Tianyu Lan <ltykernel@...il.com>,
        Michael Kelley <mikelley@...rosoft.com>,
        Sean Christopherson <seanjc@...gle.com>
Subject: Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V

@Alex,
do you know the answer to Sean's question below about ASID handling in Hyper-V?
Any other comments about how we're intending to fix things are also welcome.

Jeremi

On 08/03/2023 20:11, Sean Christopherson wrote:
> On Wed, Mar 08, 2023, Jeremi Piotrowski wrote:
>> On 08/03/2023 01:39, Sean Christopherson wrote:
>>> On Wed, Mar 08, 2023, Paolo Bonzini wrote:
>>>> On Tue, Mar 7, 2023 at 6:36 PM Sean Christopherson <seanjc@...gle.com> wrote:
>>>>> Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
>>>>> hyper-v: Remote TLB flush for SVM") or fix the thing properly straitaway.  KVM
>>>>> doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
>>>>> happens to get lucky and not run afoul of the underlying bugs.
>>>>
>>>> I don't think it's about luck---the legacy MMU's zapping/invalidation
>>>> seems to invoke the flush hypercall correctly:
>>>
>>> ...for the paths that Jeremi has exercised, and for which a stale TLB entry is
>>> fatal to L2.  E.g. kvm_unmap_gfn_range() does not have a range-based TLB flush
>>> in its path and fully relies on the buggy kvm_flush_remote_tlbs().
>>>
>>
>> Why do you say "buggy kvm_flush_remote_tlbs"? kvm_flush_remote_tlbs calls the
>> hypercall that is needed, I don't see how this might be an issue of a missing
>> "range-based TLB flush".
> 
> Doh, I forgot that the arch hook in kvm_flush_remote_tlbs() leads to the Hyper-V
> hook.
> 
> svm_flush_tlb_current is very much broken, but in practice it doesn't matter outside
> of the direct call from kvm_mmu_load(), because in all other paths KVM will flow
> through a Hyper-V flush if KVM actually modifies its MMU in any ways.  E.g. the
> request from kvm_mmu_new_pgd() when force_flush_and_sync_on_reuse=true is neutered,
> but that exists only as a safeguard against MMU bugs.  And for things like
> kvm_invalidate_pcid() and kvm_post_set_cr4(), my understanding is that Hyper-V
> will still flush the bare metal TLB, it's only Hyper-V's shadow page tables that
> are stale.
> 
> Depending on how Hyper-V handles ASIDs, pre_svm_run() may also be broken.  If
> Hyper-V tracks and rebuilds only the current ASID, then bumping the ASID won't
> rebuild potentially stale page tables.  But I'm guessing Hyper-V ignores the ASID
> since the hypercall takes only the root PA.
> 
> The truly problematic case is kvm_mmu_load(), where KVM relies on the flush to
> force Hyper-V to rebuild shadow page tables for an old, possibly stale nCR3.  This
> affects only the TDP MMU because of an explicit optimization in the TDP MMU.  So
> in practice we could squeak by with something like this:
> 
> 	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb)
> 		hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa);
> 	else
> 		static_call(kvm_x86_flush_tlb_current)(vcpu);
> 
> but I'm not convinced that avoiding a hypercall in svm_flush_tlb_current() just
> to avoid overhead when running an L3 (nested VM from L1 KVM's perspective) is
> worthwhile.  The real problem there is that KVM nested SVM TLB/ASID support is an
> unoptimized mess, and I can't imagine someone running an L3 is going to be super
> concerned with performance.
> 
> I also think we should have a sanity check in the flush_tlb_all() path, i.e. WARN
> if kvm_flush_remote_tlbs() falls back.
> 
> Something like this (probably doesn't compile, likely needs #ifdefs or helpers):
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7ef4f9e3b35a..38afc5cac1c4 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3770,7 +3770,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>         svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>  }
>  
> -static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
>  {
>         struct vcpu_svm *svm = to_svm(vcpu);
>  
> @@ -3794,6 +3794,23 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
>                 svm->current_vmcb->asid_generation--;
>  }
>  
> +static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> +       if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb &&
> +           VALID_PAGE(vcpu->arch.mmu->root.hpa))
> +               hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa);
> +
> +       svm_flush_tlb_asid(vcpu);
> +}
> +
> +static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> +{
> +       if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))
> +               hv_remote_flush_tlb(vcpu->kvm);
> +
> +       svm_flush_tlb_asid(vcpu);
> +}
> +
>  static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  {
>         struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4786,10 +4803,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>         .set_rflags = svm_set_rflags,
>         .get_if_flag = svm_get_if_flag,
>  
> -       .flush_tlb_all = svm_flush_tlb_current,
> +       .flush_tlb_all = svm_flush_tlb_all,
>         .flush_tlb_current = svm_flush_tlb_current,
>         .flush_tlb_gva = svm_flush_tlb_gva,
> -       .flush_tlb_guest = svm_flush_tlb_current,
> +       .flush_tlb_guest = svm_flush_tlb_asid,
>  
>         .vcpu_pre_run = svm_vcpu_pre_run,
>         .vcpu_run = svm_vcpu_run,

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ