linux-kernel - Re: [5.2 regression] x86/fpu changes cause crashes in KVM guest

Message-ID: <CANRm+CxWbkr0=DB7DBdaQOsTTt0XS5vSk_BRL2iFeAAm81H8Bg@mail.gmail.com>
Date:   Fri, 19 Jul 2019 16:59:25 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Thomas Lambertz <mail@...maslambertz.de>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Rik van Riel <riel@...riel.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krcmar <rkrcmar@...hat.com>, kvm <kvm@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [5.2 regression] x86/fpu changes cause crashes in KVM guest

Cc kvm ml,
On Thu, 18 Jul 2019 at 08:08, Thomas Lambertz <mail@...maslambertz.de> wrote:
>
> Since kernel 5.2, I've been experiencing strange issues in my Windows 10
> QEMU/KVM guest.
> Via bisection, I have tracked down that the issue lies in the FPU state
> handling changes.
> Kernels before 8ff468c29e9a9c3afe9152c10c7b141343270bf3 work great, the
> ones afterwards are affected.
> Sometimes the state seems to be restored incorrectly in the guest.
>
> I have managed to reproduce it relatively cleanly on a Linux guest
> (ubuntu-server 18.04, but that should not matter, since it occurred on
> Windows as well).
>
> To reproduce the issue, you need prime95 (or mprime), from
> https://www.mersenne.org/download/ .
> This is just a stress test for the FPU, which helps reproduce the error
> much quicker.
>
> - Run it in the guest as 'Benchmark Only', and choose the '(2) Small
> FFTs' torture test. Give it the maximum number of cores (for me 10).
> - On the host, run the same test. To keep my PC usable, I limited it to
> 5 cores. I do this to put some pressure on the system.
> - Repeatedly focus and unfocus the QEMU window.
>
> With this config, errors in the guest usually occur within 30 seconds.
> Without the refocusing, it takes ~5 min on average, but the variance of
> this time is quite large.
>
> The error messages are either
>      "FATAL ERROR: Rounding was ......., expected less than 0.4"
> or
>      "FATAL ERROR: Resulting sum was ....., expected: ......",
> suggesting that something in the calculation has gone wrong.
>
> On the host, no errors are ever observed!

I found that it is caused by commit 5f409e20b (x86/fpu: Defer FPU state
load until return to userspace) and can only be reproduced when
CONFIG_PREEMPT is enabled. Why does that commit restore the QEMU
userspace FPU context to hardware before VM entry?
https://lkml.org/lkml/2017/11/14/945
Actually, I suspect that commit f775b13eedee2 (x86,kvm: move qemu/guest
FPU switching out to vcpu_run) incorrectly saves the guest FPU state
held in the XSAVE area into the QEMU userspace FPU buffer. However, Rik
replied in https://lkml.org/lkml/2017/11/14/891, "The scheduler will save
the guest fpu context when a vCPU thread is preempted, and restore it
when it is scheduled back in." But I can't find any scheduler code that
does this. In addition, the change below fixes the mprime errors (I am
still not sure it is correct):

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58305cf..18f928e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3306,6 +3306,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

     kvm_x86_ops->vcpu_load(vcpu, cpu);

+    if (test_thread_flag(TIF_NEED_FPU_LOAD))
+        switch_fpu_return();
+
     /* Apply any externally detected TSC adjustments (due to suspend) */
     if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
         adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@@ -7990,10 +7993,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
     trace_kvm_entry(vcpu->vcpu_id);
     guest_enter_irqoff();

-    fpregs_assert_state_consistent();
-    if (test_thread_flag(TIF_NEED_FPU_LOAD))
-        switch_fpu_return();
-
     if (unlikely(vcpu->arch.switch_db_regs)) {
         set_debugreg(0, 7);
         set_debugreg(vcpu->arch.eff_db[0], 0);