Message-ID: <ZtHvjzBFUbG3fcMc@google.com>
Date: Fri, 30 Aug 2024 09:13:03 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Gerd Hoffmann <kraxel@...hat.com>, Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	rcu@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Kevin Tian <kevin.tian@...el.com>, Yan Zhao <yan.y.zhao@...el.com>, 
	Yiwei Zhang <zzyiwei@...gle.com>, Lai Jiangshan <jiangshanlai@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Josh Triplett <josh@...htriplett.org>
Subject: Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

On Fri, Aug 30, 2024, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@...hat.com> writes:
> 
> > Sean Christopherson <seanjc@...gle.com> writes:
> >
> >> On Fri, Aug 30, 2024, Vitaly Kuznetsov wrote:
> >>> Gerd Hoffmann <kraxel@...hat.com> writes:
> >>> 
> >>> >> Necroposting!
> >>> >> 
> >>> >> Turns out that this change broke "bochs-display" driver in QEMU even
> >>> >> when the guest is modern (don't ask me 'who the hell uses bochs for
> >>> >> modern guests', it was basically a configuration error :-). E.g:
> >>> >
> >>> > qemu stdvga (the default display device) is affected too.
> >>> >
> >>> 
> >>> So far, I was only able to verify that the issue has nothing to do with
> >>> OVMF and multi-vcpu, it reproduces very well with
> >>> 
> >>> $ qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split -name guest=c10s
> >>> -cpu host -smp 1 -m 16384 -drive file=/var/lib/libvirt/images/c10s-bios.qcow2,if=none,id=drive-ide0-0-0
> >>> -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
> >>> -vnc :0 -device VGA -monitor stdio --no-reboot
> >>> 
> >>> Comparing traces of working and broken cases, I couldn't find anything
> >>> suspicious but I may have missed something of course. For now, it seems
> >>> like a userspace misbehavior resulting in a segfault.
> >>
> >> Guest userspace?
> >>
> >
> > Yes? :-) As Gerd described, video memory is "mapped into userspace so
> > the wayland / X11 display server can software-render into the buffer"
> > and it seems that wayland gets something unexpected in this memory and
> > crashes. 
> 
> Also, I don't know if it helps or not, but out of two hunks in
> 377b2f359d1f, it is the vmx_get_mt_mask() one which brings the
> issue. I.e. the following is enough to fix things:
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f18c2d8c7476..733a0c45d1a6 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7659,13 +7659,11 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  
>         /*
>          * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
> -        * device attached and the CPU doesn't support self-snoop.  Letting the
> -        * guest control memory types on Intel CPUs without self-snoop may
> -        * result in unexpected behavior, and so KVM's (historical) ABI is to
> -        * trust the guest to behave only as a last resort.
> +        * device attached.  Letting the guest control memory types on Intel
> +        * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
> +        * the guest to behave only as a last resort.
>          */
> -       if (!static_cpu_has(X86_FEATURE_SELFSNOOP) &&
> -           !kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +       if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
>                 return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>  
>         return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);

Hmm, that suggests the guest kernel maps the buffer as WC.  And looking at the
bochs driver, IIUC, the kernel mappings via ioremap() are UC-, not WC.  So it
could be that userspace doesn't play nice with WC, but could it also be that the
QEMU backend doesn't play nice with WC (on Intel)?
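
To make the effect of that hunk concrete, here is a rough userspace model of the
check with and without the self-snoop condition.  This is only a sketch: the
function names are hypothetical, the constant values are assumed to match the
kernel's current definitions, and the MMIO and other special cases in the real
vmx_get_mt_mask() are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's definitions; not taken from this thread. */
#define MTRR_TYPE_WRBACK       6
#define VMX_EPT_MT_EPTE_SHIFT  3
#define VMX_EPT_IPAT_BIT       (1ull << 6)

/* Hypothetical name: the check as it stands with 377b2f359d1f applied. */
uint64_t mt_mask_with_selfsnoop_check(bool has_selfsnoop, bool has_noncoherent_dma)
{
	/* Force WB and ignore guest PAT only without self-snoop and without non-coherent DMA. */
	if (!has_selfsnoop && !has_noncoherent_dma)
		return ((uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

	/* EPT memtype is WB with IPAT clear, so the guest's PAT picks the effective type. */
	return (uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
}

/* Hypothetical name: the check with the quoted hunk applied on top. */
uint64_t mt_mask_without_selfsnoop_check(bool has_noncoherent_dma)
{
	/* Self-snoop is no longer consulted; only non-coherent DMA lets guest PAT through. */
	if (!has_noncoherent_dma)
		return ((uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

	return (uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
}

/* True if the mask tells the CPU to ignore the guest's PAT for this mapping. */
bool guest_pat_ignored(uint64_t mt_mask)
{
	return (mt_mask & VMX_EPT_IPAT_BIT) != 0;
}
```

On a self-snoop host with no non-coherent DMA, the first variant leaves IPAT
clear, so a guest WC mapping really ends up WC, whereas the second sets IPAT and
the same mapping is forced to WB.  That is why the hunk fixing things points at
WC (or some other non-WB type) in the guest.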

Given that this is a purely synthetic device, is there any reason to use UC or WC?
I.e. can the bochs driver configure its VRAM buffers to be WB?  It doesn't look
super easy (the DRM/TTM code has so. many. layers), but it appears doable.  Since
the device only exists in VMs, it's possible the bochs driver has never run on
Intel CPUs with WC memtype.

The one thing that confuses and concerns me is that this broke in the first place.
KVM has honored guest PAT on SVM since forever, which is why I/we had decent
confidence KVM could honor guest PAT on VMX without breaking anything.  SVM (NPT)
has an explicitly documented special "WC+" memtype, where guest=WC && host=WB == WC+,
and WC+ accesses snoop caches on all CPUs.

But per Intel engineers, Intel CPUs with self-snoop are supposed to snoop caches
on all processors too.

I assume this same setup works fine on AMD/SVM?  If so, we probably need to do
more digging before fudging around this in the guest.
