Message-ID: <877cbyuzdn.fsf@redhat.com>
Date: Fri, 30 Aug 2024 11:35:32 +0200
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
 <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
 Kevin Tian <kevin.tian@...el.com>, Yan Zhao <yan.y.zhao@...el.com>, Yiwei
 Zhang <zzyiwei@...gle.com>, Lai Jiangshan <jiangshanlai@...il.com>, "Paul
 E. McKenney" <paulmck@...nel.org>, Josh Triplett <josh@...htriplett.org>,
 Gerd Hoffmann <kraxel@...hat.com>
Subject: Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that
 support self-snoop

Sean Christopherson <seanjc@...gle.com> writes:

> Unconditionally honor guest PAT on CPUs that support self-snoop, as
> Intel has confirmed that CPUs that support self-snoop always snoop caches
> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
> even in the presence of aliased memtypes, thus there is no need to trust
> the guest behaves and only honor PAT as a last resort, as KVM does today.
>
> Honoring guest PAT is desirable for use cases where the guest has access
> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
> (mediated, for all intents and purposes) GPU is exposed to the guest, along
> with buffers that are consumed directly by the physical GPU, i.e. which
> can't be proxied by the host to ensure writes from the guest are performed
> with the correct memory type for the GPU.
>
> Cc: Yiwei Zhang <zzyiwei@...gle.com>
> Suggested-by: Yan Zhao <yan.y.zhao@...el.com>
> Suggested-by: Kevin Tian <kevin.tian@...el.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kvm/mmu/mmu.c |  8 +++++---
>  arch/x86/kvm/vmx/vmx.c | 10 ++++++----
>  2 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 403cd8f914cd..7fa514830628 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4622,14 +4622,16 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>  bool kvm_mmu_may_ignore_guest_pat(void)
>  {
>  	/*
> -	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
> +	 * When EPT is enabled (shadow_memtype_mask is non-zero), the CPU does
> +	 * not support self-snoop (or is affected by an erratum), and the VM
>  	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
>  	 * honor the memtype from the guest's PAT so that guest accesses to
>  	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
>  	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
> -	 * KVM _always_ ignores guest PAT (when EPT is enabled).
> +	 * KVM _always_ ignores or honors guest PAT, i.e. doesn't toggle SPTE
> +	 * bits in response to non-coherent device (un)registration.
>  	 */
> -	return shadow_memtype_mask;
> +	return !static_cpu_has(X86_FEATURE_SELFSNOOP) && shadow_memtype_mask;
>  }
>  
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 17a8e4fdf9c4..5dc4c24ae203 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7605,11 +7605,13 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  
>  	/*
>  	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
> -	 * device attached.  Letting the guest control memory types on Intel
> -	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
> -	 * the guest to behave only as a last resort.
> +	 * device attached and the CPU doesn't support self-snoop.  Letting the
> +	 * guest control memory types on Intel CPUs without self-snoop may
> +	 * result in unexpected behavior, and so KVM's (historical) ABI is to
> +	 * trust the guest to behave only as a last resort.
>  	 */
> -	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +	if (!static_cpu_has(X86_FEATURE_SELFSNOOP) &&
> +	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
>  		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>  
>  	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);

Necroposting!

Turns out that this change broke the "bochs-display" driver in QEMU, even
when the guest is modern (don't ask me 'who the hell uses bochs for
modern guests', it was basically a configuration error :-). E.g.:

$ qemu-kvm -name c10s -nodefaults -smp 4 \
    -machine q35,smm=on,accel=kvm,kernel-irqchip=split \
    -global driver=cfi.pflash01,property=secure,value=on \
    -cpu host \
    -drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/var/lib/libvirt/images/c10s.qcow2 \
    -device virtio-blk-pci,id=image2,drive=drive_image2,bootindex=3,bus=pcie.0,addr=0x8 \
    -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,readonly=on,unit=0 \
    -drive file=/tmp/OVMF_VARS.secboot.fd,if=pflash,format=raw,unit=1 \
    -device ahci,id=ahci0 -vnc :0 -device bochs-display -m 8G -monitor stdio
The failure looks like Wayland starting and failing right away. This
repeats multiple times; after a number of restarts it may appear to
work for a while, but then it crashes again. Things go back to normal when
the commit is reverted in the host kernel.
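
For anyone skimming the thread, the effect of the vmx_get_mt_mask() hunk
quoted above can be modelled in userspace roughly as follows. This is
only a sketch of that one hunk, not the kernel code: the function names
mt_mask_before()/mt_mask_after() are made up for illustration, the
constants are copied by hand from the kernel headers, and the MMIO and
other paths of the real function are ignored. The interesting case is a
VM with no non-coherent DMA device (which is presumably what the command
line above gives): there, the IPAT bit is no longer set on a CPU with
self-snoop, so guest PAT starts to matter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-copied from the kernel headers; treat the values as assumptions. */
#define MTRR_TYPE_WRBACK       6
#define VMX_EPT_MT_EPTE_SHIFT  3
#define VMX_EPT_IPAT_BIT       (1ull << 6)

/* Hypothetical model of the quoted hunk before the patch. */
static uint64_t mt_mask_before(bool noncoherent_dma)
{
	/* No non-coherent DMA: force WB and ignore guest PAT (IPAT set). */
	if (!noncoherent_dma)
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

	/* Otherwise WB in EPT, with guest PAT honored (IPAT clear). */
	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
}

/* Hypothetical model of the quoted hunk after the patch. */
static uint64_t mt_mask_after(bool has_selfsnoop, bool noncoherent_dma)
{
	/* Guest PAT is ignored only without self-snoop AND without
	 * non-coherent DMA. */
	if (!has_selfsnoop && !noncoherent_dma)
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
}
```

In this model, mt_mask_before(false) has the IPAT bit set, while
mt_mask_after(true, false) does not; that is the only difference for
such a VM.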

The CPU where this reproduces is fairly modern too (Intel(R) Xeon(R)
Silver 4410Y, Sapphire Rapids). I wish I could give additional details
about what exactly happens in the guest, but I can't find anything useful in
the logs ("WARNING: Application 'org.gnome.Shell.desktop' killed by
signal 9") and I know too little (nothing?) about how the modern Linux
graphics stack is organized :-( Cc: Gerd just in case.
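
In case someone wants to check whether another host takes the new path:
X86_FEATURE_SELFSNOOP shows up as the "ss" flag in /proc/cpuinfo, so a
whole-token match on the flags line should be enough. The helper below
is a hypothetical sketch (flags_have() is not an existing API); it only
does the token match on a string that the caller has already read, and
it deliberately avoids matching "sse", "ssse3" and friends.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'name' appears as a whole token in a cpuinfo-style
 * flags line, e.g. "flags\t\t: fpu vme ... ss ht ...". */
static bool flags_have(const char *flags, const char *name)
{
	size_t n = strlen(name);
	const char *p = flags;

	if (n == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		/* Token must start at the beginning or after a separator... */
		bool start_ok = (p == flags) || p[-1] == ' ' ||
				p[-1] == '\t' || p[-1] == ':';
		/* ...and end at a separator or the end of the string. */
		bool end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

		if (start_ok && end_ok)
			return true;
		p++;
	}
	return false;
}
```

Feeding it the "flags" line of /proc/cpuinfo and asking for "ss" tells
whether the host kernel sees self-snoop.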

-- 
Vitaly

