linux-kernel - Re: [PATCH 1/5] KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes


Message-ID: <5ee34382-b45b-2069-ea33-ef58acacaa79@oracle.com>
Date: Mon, 11 Mar 2024 18:10:50 -0700
From: Dongli Zhang <dongli.zhang@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Josh Triplett <josh@...htriplett.org>
Cc: kvm@...r.kernel.org, rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        Kevin Tian <kevin.tian@...el.com>, Yan Zhao <yan.y.zhao@...el.com>,
        Yiwei Zhang <zzyiwei@...gle.com>
Subject: Re: [PATCH 1/5] KVM: x86: Remove VMX support for virtualizing guest
 MTRR memtypes



On 3/8/24 17:09, Sean Christopherson wrote:
> Remove KVM's support for virtualizing guest MTRR memtypes, as full MTRR
> adds no value, negatively impacts guest performance, and is a maintenance
> burden due to its complexity and oddities.
> 
> KVM's approach to virtualizing MTRRs makes no sense, at all.  KVM *only*
> honors guest MTRR memtypes if EPT is enabled *and* the guest has a device
> that may perform non-coherent DMA access.  From a hardware virtualization
> perspective of guest MTRRs, there is _nothing_ special about EPT.  Legacy
> shadowing paging doesn't magically account for guest MTRRs, nor does NPT.

[snip]

>  
> -bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma)
> +bool kvm_mmu_may_ignore_guest_pat(void)
>  {
>  	/*
> -	 * If host MTRRs are ignored (shadow_memtype_mask is non-zero), and the
> -	 * VM has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is
> -	 * to honor the memtype from the guest's MTRRs so that guest accesses
> -	 * to memory that is DMA'd aren't cached against the guest's wishes.
> -	 *
> -	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
> -	 * e.g. KVM will force UC memtype for host MMIO.
> +	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
> +	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
> +	 * honor the memtype from the guest's PAT so that guest accesses to
> +	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
> +	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
> +	 * KVM _always_ ignores guest PAT (when EPT is enabled).
>  	 */
> -	return vm_has_noncoherent_dma && shadow_memtype_mask;
> +	return shadow_memtype_mask;
>  }
>  

Is there a particular reason to name this 'may_ignore_guest_pat' rather than
'may_honor_guest_pat'?

Since the outcome also depends on other conditions, e.g.,
kvm_arch_has_noncoherent_dma() in vmx_get_mt_mask(), 'may_honor_guest_pat'
would describe it equally well.

In that case, why not use 'shadow_memtype_mask' directly (without the helper),
or pick a name like "ept_enabled_for_hardware"?


Even with the change from PATCH 5/5 (quoted below), isn't there still a good
chance that the VM has non-coherent DMA?

 bool kvm_mmu_may_ignore_guest_pat(void)
 {
 	/*
-	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
+	 * When EPT is enabled (shadow_memtype_mask is non-zero), the CPU does
+	 * not support self-snoop (or is affected by an erratum), and the VM
 	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
 	 * honor the memtype from the guest's PAT so that guest accesses to
 	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
 	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
-	 * KVM _always_ ignores guest PAT (when EPT is enabled).
+	 * KVM _always_ ignores or honors guest PAT, i.e. doesn't toggle SPTE
+	 * bits in response to non-coherent device (un)registration.
 	 */
-	return shadow_memtype_mask;
+	return !static_cpu_has(X86_FEATURE_SELFSNOOP) && shadow_memtype_mask;
 }
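
To make the naming question concrete, below is a small user-space model of how
the two conditions might combine when EPT is enabled. It is only a sketch: the
model_* names are hypothetical, nothing here is kernel code, and the "honors"
rule is my assumption about what vmx_get_mt_mask() does after this series
(guest PAT is honored if the CPU has self-snoop or the VM has non-coherent
DMA), not something taken from the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model, not kernel code.  All inputs are plain
 * booleans standing in for kernel state:
 *   ept_enabled     ~ shadow_memtype_mask != 0
 *   cpu_selfsnoop   ~ static_cpu_has(X86_FEATURE_SELFSNOOP)
 *   noncoherent_dma ~ kvm_arch_has_noncoherent_dma(kvm)
 */

/* Mirrors the return expression in the PATCH 5/5 hunk quoted above. */
static bool model_may_ignore_guest_pat(bool ept_enabled, bool cpu_selfsnoop)
{
	return !cpu_selfsnoop && ept_enabled;
}

/*
 * ASSUMPTION (EPT enabled only): guest PAT is honored when the CPU
 * supports self-snoop or the VM has non-coherent DMA.
 */
static bool model_honors_guest_pat(bool cpu_selfsnoop, bool noncoherent_dma)
{
	return cpu_selfsnoop || noncoherent_dma;
}

/* Print the four EPT-enabled combinations and check the "may" invariant. */
static void model_print_table(void)
{
	printf("selfsnoop noncoherent_dma may_ignore honors\n");
	for (int ss = 0; ss <= 1; ss++) {
		for (int dma = 0; dma <= 1; dma++) {
			bool may_ignore = model_may_ignore_guest_pat(true, ss);
			bool honors = model_honors_guest_pat(ss, dma);

			/* If the helper says "never ignore", PAT must be honored. */
			assert(may_ignore || honors);
			printf("%9d %15d %10d %6d\n", ss, dma, may_ignore, honors);
		}
	}
}
```

Calling model_print_table() from a main() prints the table. In this model the
helper returning true covers both an ignored and an honored guest PAT,
depending on non-coherent DMA, which is why 'may_honor' reads as equally
applicable to me.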


Thank you very much!

Dongli Zhang