linux-kernel - Re: [PATCH 1/5] KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes


Message-ID: <ZfCL8mCmmEx5wGwv@google.com>
Date: Tue, 12 Mar 2024 10:08:02 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Dongli Zhang <dongli.zhang@...cle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Lai Jiangshan <jiangshanlai@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Josh Triplett <josh@...htriplett.org>, kvm@...r.kernel.org, 
	rcu@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Kevin Tian <kevin.tian@...el.com>, Yan Zhao <yan.y.zhao@...el.com>, 
	Yiwei Zhang <zzyiwei@...gle.com>
Subject: Re: [PATCH 1/5] KVM: x86: Remove VMX support for virtualizing guest
 MTRR memtypes

On Mon, Mar 11, 2024, Dongli Zhang wrote:
> 
> 
> On 3/8/24 17:09, Sean Christopherson wrote:
> > Remove KVM's support for virtualizing guest MTRR memtypes, as full MTRR
> > adds no value, negatively impacts guest performance, and is a maintenance
> > burden due to its complexity and oddities.
> > 
> > KVM's approach to virtualizing MTRRs makes no sense, at all.  KVM *only*
> > honors guest MTRR memtypes if EPT is enabled *and* the guest has a device
> > that may perform non-coherent DMA access.  From a hardware virtualization
> > perspective of guest MTRRs, there is _nothing_ special about EPT.  Legacy
> > shadow paging doesn't magically account for guest MTRRs, nor does NPT.
> 
> [snip]
> 
> >  
> > -bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma)
> > +bool kvm_mmu_may_ignore_guest_pat(void)
> >  {
> >  	/*
> > -	 * If host MTRRs are ignored (shadow_memtype_mask is non-zero), and the
> > -	 * VM has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is
> > -	 * to honor the memtype from the guest's MTRRs so that guest accesses
> > -	 * to memory that is DMA'd aren't cached against the guest's wishes.
> > -	 *
> > -	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
> > -	 * e.g. KVM will force UC memtype for host MMIO.
> > +	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
> > +	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
> > +	 * honor the memtype from the guest's PAT so that guest accesses to
> > +	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
> > +	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
> > +	 * KVM _always_ ignores guest PAT (when EPT is enabled).
> >  	 */
> > -	return vm_has_noncoherent_dma && shadow_memtype_mask;
> > +	return shadow_memtype_mask;
> >  }
> >  
> 
> Any special reason to use the naming 'may_ignore_guest_pat', but not
> 'may_honor_guest_pat'?

Because that name would (after this series) be either misleading or outright wrong.
If KVM returns true from the helper based solely on shadow_memtype_mask, then it's
misleading because KVM will *always* honor guest PAT for such CPUs.  I.e. that
name would yield this misleading statement:

  If the CPU supports self-snoop, KVM may honor guest PAT.

If KVM returns true iff self-snoop is NOT available (as proposed in this series),
then it's outright wrong as KVM would return false, i.e. would make this incorrect
statement:

  If the CPU supports self-snoop, KVM never honors guest PAT.

As saying that KVM may not or cannot do something is saying that KVM will never
do that thing.

And because the EPT flag is "ignore guest PAT", not "honor guest PAT", but that's
as much coincidence as it is anything else.

> Since it is also controlled by other cases, e.g., kvm_arch_has_noncoherent_dma()
> at vmx_get_mt_mask(), it can be 'may_honor_guest_pat' too?
> 
> Therefore, why not directly use 'shadow_memtype_mask' (without the API), or some
> naming like "ept_enabled_for_hardware".

Again, after this series, KVM will *always* honor guest PAT for CPUs with self-snoop,
i.e. KVM will *never* ignore guest PAT.  But for CPUs without self-snoop (or with
errata), KVM conditionally honors/ignores guest PAT.
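
The policy described above can be modeled roughly as follows.  This is a
standalone sketch, not kernel code: the function names and boolean parameters
are hypothetical stand-ins for shadow_memtype_mask, X86_FEATURE_SELFSNOOP and
kvm_arch_has_noncoherent_dma(), and the second function assumes EPT is enabled.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the helper in the quoted diff: KVM may ignore guest
 * PAT only when EPT is enabled and the CPU lacks (usable) self-snoop.
 */
bool may_ignore_guest_pat(bool ept_enabled, bool cpu_has_selfsnoop)
{
	return ept_enabled && !cpu_has_selfsnoop;
}

/*
 * Hypothetical model of the resulting behavior, assuming EPT is enabled:
 * with self-snoop, guest PAT is always honored; without it, guest PAT is
 * honored only while the VM has non-coherent DMA.
 */
bool ept_honors_guest_pat(bool cpu_has_selfsnoop, bool vm_has_noncoherent_dma)
{
	if (cpu_has_selfsnoop)
		return true;

	return vm_has_noncoherent_dma;
}
```

In this model, "may ignore" is false exactly when the answer never depends on
non-coherent device (un)registration, which is the property the helper's name
is meant to convey.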

> Even with the code from PATCH 5/5, we still have a high chance that the VM has
> non-coherent DMA?

I don't follow.  On CPUs with self-snoop, whether or not the VM has non-coherent
DMA (from VFIO!) is irrelevant.  If the CPU has self-snoop, then KVM can safely
honor guest PAT at all times.

>  bool kvm_mmu_may_ignore_guest_pat(void)
>  {
>  	/*
> -	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
> +	 * When EPT is enabled (shadow_memtype_mask is non-zero), the CPU does
> +	 * not support self-snoop (or is affected by an erratum), and the VM
>  	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
>  	 * honor the memtype from the guest's PAT so that guest accesses to
>  	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
>  	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
> -	 * KVM _always_ ignores guest PAT (when EPT is enabled).
> +	 * KVM _always_ ignores or honors guest PAT, i.e. doesn't toggle SPTE
> +	 * bits in response to non-coherent device (un)registration.
>  	 */
> -	return shadow_memtype_mask;
> +	return !static_cpu_has(X86_FEATURE_SELFSNOOP) && shadow_memtype_mask;
>  }
> 
> 
> Thank you very much!
> 
> Dongli Zhang