lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANRm+CznfORBCuap5nCoZij-hnOaGmUV42R4i8Hm3G8v3f-YKQ@mail.gmail.com>
Date:   Mon, 4 Dec 2017 10:15:49 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Rik van Riel <riel@...hat.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, kvm <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Radim Krcmar <rkrcmar@...hat.com>,
        Christian Borntraeger <borntraeger@...ibm.com>
Subject: Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run

2017-11-14 13:12 GMT+08:00 Rik van Riel <riel@...hat.com>:
> Currently, every time a VCPU is scheduled out, the host kernel will
> first save the guest FPU/xstate context, then load the qemu userspace
> FPU context, only to then immediately save the qemu userspace FPU
> context back to memory. When scheduling in a VCPU, the same extraneous
> FPU loads and saves are done.
>
> This could be avoided by moving from a model where the guest FPU is
> loaded and stored with preemption disabled, to a model where the
> qemu userspace FPU is swapped out for the guest FPU context for
> the duration of the KVM_RUN ioctl.
>
> This is done under the VCPU mutex, which is also taken when other
> tasks inspect the VCPU FPU context, so the code should already be
> safe for this change. That should come as no surprise, given that
> s390 already has this optimization.
>
> No performance changes were detected in quick ping-pong tests on
> my 4 socket system, which is expected since an FPU+xstate load is
> on the order of 0.1us, while ping-ponging between CPUs is on the
> order of 20us, and somewhat noisy.
>
> There may be other tests where performance changes are noticeable.

The kvm/queue has the below splatting:

[  650.866212] Bad FPU state detected at kvm_put_guest_fpu+0x7d/0x210
[kvm], reinitializing FPU registers.
[  650.866232] ------------[ cut here ]------------
[  650.866241] WARNING: CPU: 2 PID: 2583 at arch/x86/mm/extable.c:103
ex_handler_fprestore+0x5f/0x70
[  650.866473]  libahci wmi hid pinctrl_sunrisepoint video pinctrl_intel
[  650.866496] CPU: 2 PID: 2583 Comm: qemu-system-x86 Not tainted 4.14.0+ #7
[  650.866500] Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS
1.4.9 09/12/2016
[  650.866503] task: ffff97a095a28000 task.stack: ffffa71c8585c000
[  650.866509] RIP: 0010:ex_handler_fprestore+0x5f/0x70
[  650.866512] RSP: 0018:ffffa71c8585fc28 EFLAGS: 00010282
[  650.866519] RAX: 000000000000005b RBX: ffffa71c8585fc68 RCX: 0000000000000006
[  650.866522] RDX: 0000000000000000 RSI: ffffffffb4d35333 RDI: 0000000000000282
[  650.866526] RBP: 000000000000000d R08: 00000000fddae359 R09: 0000000000000000
[  650.866529] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  650.866532] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
[  650.866536] FS:  00007f6f8f22c700(0000) GS:ffff97a09ca00000(0000)
knlGS:0000000000000000
[  650.866540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  650.866543] CR2: 00007f6f993f3000 CR3: 00000003d4aae005 CR4: 00000000003626e0
[  650.866547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  650.866550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  650.866554] Call Trace:
[  650.866559]  fixup_exception+0x32/0x40
[  650.866567]  do_general_protection+0xa0/0x1b0
[  650.866574]  general_protection+0x22/0x30
[  650.866595] RIP: 0010:kvm_put_guest_fpu+0x7d/0x210 [kvm]
[  650.866599] RSP: 0018:ffffa71c8585fd18 EFLAGS: 00010246
[  650.866605] RAX: 00000000ffffffff RBX: ffff97a095a30000 RCX: 0000000000000001
[  650.866608] RDX: 00000000ffffffff RSI: 00000000f7d5d46a RDI: ffff97a095a30b80
[  650.866611] RBP: 0000000000000000 R08: 00000000fddae359 R09: ffff97a095a28968
[  650.866615] R10: 0000000000000000 R11: 00000000e8d39b88 R12: ffff97a095a31bc0
[  650.866618] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
[  650.866650]  ? kvm_put_guest_fpu+0x27/0x210 [kvm]
[  650.866670]  kvm_vcpu_reset+0x1be/0x250 [kvm]
[  650.866689]  kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
[  650.866707]  kvm_vm_ioctl+0x31a/0x820 [kvm]
[  650.866712]  ? __lock_acquire+0x809/0x1410
[  650.866721]  ? __lock_acquire+0x809/0x1410
[  650.866734]  do_vfs_ioctl+0x9f/0x6c0
[  650.866743]  ? __fget+0x108/0x1f0
[  650.866752]  SyS_ioctl+0x74/0x80
[  650.866757]  ? do_syscall_64+0xc4/0x3d0
[  650.866764]  do_syscall_64+0x8a/0x3d0
[  650.866769]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  650.866781]  entry_SYSCALL64_slow_path+0x25/0x25
[  650.866785] RIP: 0033:0x7f6f973a0f07
[  650.866788] RSP: 002b:00007f6f8f22b968 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  650.866795] RAX: ffffffffffffffda RBX: 000000000000ae41 RCX: 00007f6f973a0f07
[  650.866798] RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 000000000000000b
[  650.866802] RBP: 0000000000000000 R08: 0000558248e26a40 R09: 000055824b58e280
[  650.866805] R10: 0000558249593f40 R11: 0000000000000246 R12: 000055824b55ec90
[  650.866808] R13: 00007ffd274d79ff R14: 00007f6f8f22c9c0 R15: 000055824b58e280
[  650.867014] ---[ end trace 2c5d6cfaba0ee1b3 ]---

Regards,
Wanpeng Li

>
> Signed-off-by: Rik van Riel <riel@...hat.com>
> Suggested-by: Christian Borntraeger <borntraeger@...ibm.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 13 +++++++++++++
>  arch/x86/kvm/x86.c              | 29 ++++++++++++-----------------
>  2 files changed, 25 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c73e493adf07..92e66685249e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
>         struct kvm_mmu_memory_cache mmu_page_cache;
>         struct kvm_mmu_memory_cache mmu_page_header_cache;
>
> +       /*
> +        * QEMU userspace and the guest each have their own FPU state.
> +        * In vcpu_run, we switch between the user and guest FPU contexts.
> +        * While running a VCPU, the VCPU thread will have the guest FPU
> +        * context.
> +        *
> +        * Note that while the PKRU state lives inside the fpu registers,
> +        * it is switched out separately at VMENTER and VMEXIT time. The
> +        * "guest_fpu" state here contains the guest FPU context, with the
> +        * host PRKU bits.
> +        */
> +       struct fpu user_fpu;
>         struct fpu guest_fpu;
> +
>         u64 xcr0;
>         u64 guest_supported_xcr0;
>         u32 guest_xstate_size;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 03869eb7fcd6..59912b20a830 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2917,7 +2917,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>         srcu_read_unlock(&vcpu->kvm->srcu, idx);
>         pagefault_enable();
>         kvm_x86_ops->vcpu_put(vcpu);
> -       kvm_put_guest_fpu(vcpu);
>         vcpu->arch.last_host_tsc = rdtsc();
>  }
>
> @@ -6908,7 +6907,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         preempt_disable();
>
>         kvm_x86_ops->prepare_guest_switch(vcpu);
> -       kvm_load_guest_fpu(vcpu);
>
>         /*
>          * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
> @@ -7095,6 +7093,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>
>         vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>
> +       kvm_load_guest_fpu(vcpu);
> +
>         for (;;) {
>                 if (kvm_vcpu_running(vcpu)) {
>                         r = vcpu_enter_guest(vcpu);
> @@ -7132,6 +7132,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>                 }
>         }
>
> +       kvm_put_guest_fpu(vcpu);
> +
>         srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>
>         return r;
> @@ -7663,32 +7665,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
>         vcpu->arch.cr0 |= X86_CR0_ET;
>  }
>
> +/* Swap (qemu) user FPU context for the guest FPU context. */
>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -       if (vcpu->guest_fpu_loaded)
> -               return;
> -
> -       /*
> -        * Restore all possible states in the guest,
> -        * and assume host would use all available bits.
> -        * Guest xcr0 would be loaded later.
> -        */
> -       vcpu->guest_fpu_loaded = 1;
> -       __kernel_fpu_begin();
> +       preempt_disable();
> +       copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
>         /* PKRU is separately restored in kvm_x86_ops->run.  */
>         __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
>                                 ~XFEATURE_MASK_PKRU);
> +       preempt_enable();
>         trace_kvm_fpu(1);
>  }
>
> +/* When vcpu_run ends, restore user space FPU context. */
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -       if (!vcpu->guest_fpu_loaded)
> -               return;
> -
> -       vcpu->guest_fpu_loaded = 0;
> +       preempt_disable();
>         copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
> -       __kernel_fpu_end();
> +       copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
> +       preempt_enable();
>         ++vcpu->stat.fpu_reload;
>         trace_kvm_fpu(0);
>  }
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ