Message-ID: <CANRm+CxN8F_Wvb3w6e0rxXz6AwsnhE_Ugf5coJAMvqrGz0H+-A@mail.gmail.com>
Date:   Mon, 25 Dec 2017 18:08:45 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     linux-kernel@...r.kernel.org, kvm <kvm@...r.kernel.org>
Subject: Re: [PATCH 0/4] KVM: nVMX: prepare_vmcs02 optimizations

2017-12-25 18:07 GMT+08:00 Wanpeng Li <kernellwp@...il.com>:
> 2017-12-21 20:43 GMT+08:00 Paolo Bonzini <pbonzini@...hat.com>:
>> That's about 800-1000 clock cycles more that can easily be peeled off
>> by saving about 60 VMWRITEs on every exit.
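As a quick sanity check on the figures in the paragraph above, the implied average cost of one VMWRITE can be worked out directly. This assumes, as a simplification, that the whole 800-1000 cycle saving comes from the roughly 60 VMWRITEs removed per exit:

```python
# Figures quoted in the cover letter: skipping ~60 VMWRITEs per exit
# saves ~800-1000 clock cycles.
VMWRITES_SAVED = 60
SAVED_CYCLES_LOW, SAVED_CYCLES_HIGH = 800, 1000


def cycles_per_vmwrite(saved_cycles: float, vmwrites: int) -> float:
    """Implied average cost of one VMWRITE, assuming the whole saving
    comes from the VMWRITEs that were removed (a simplification)."""
    return saved_cycles / vmwrites


low = cycles_per_vmwrite(SAVED_CYCLES_LOW, VMWRITES_SAVED)
high = cycles_per_vmwrite(SAVED_CYCLES_HIGH, VMWRITES_SAVED)
print(f"~{low:.1f} to ~{high:.1f} cycles per VMWRITE")
```

That works out to roughly 13 to 17 cycles per VMWRITE, which is why trimming dozens of them from every nested exit adds up.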
>>
>> My numbers so far have been collected on a Haswell system vs. the
>> Broadwell that Jim used for his KVM Forum talk, and I am now down
>> from 22000 (compared to 18000 that Jim gave as the baseline) to 14000.
>> (Also, the guest is running 4.14, so it didn't have the XSETBV and DEBUGCTL
>> patches; those remove two ancillary exits to L1, each costing about 1000
>> cycles on my machine.)  So we are probably pretty close to VMware's
>> 6500 cycles on Broadwell.
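The cycle counts in the paragraph above can be tallied as follows. The "projected" figure is only an estimate. It assumes that the two ancillary exits (about 1000 cycles each) add linearly to the measured cost, so that a guest with the XSETBV and DEBUGCTL patches would simply avoid them. The Haswell and Broadwell numbers come from different machines, so they are deliberately not mixed here.

```python
# Cycle figures quoted in the cover letter (Haswell, measured by the author).
BEFORE = 22000            # before this series
AFTER = 14000             # with this series applied
ANCILLARY_EXITS = 2       # exits to L1 that the 4.14 guest still takes
COST_PER_ANCILLARY = 1000 # approximate cost of each such exit

saved = BEFORE - AFTER
saved_pct = 100 * saved / BEFORE

# Assumption: the ancillary exits are additive, so removing them
# subtracts their cost from the measured figure.
projected = AFTER - ANCILLARY_EXITS * COST_PER_ANCILLARY

print(f"saved {saved} cycles ({saved_pct:.1f}%)")
print(f"projected without the ancillary exits: ~{projected} cycles")
```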
>>
>> After these patches there may still be some low-hanging fruit; the remaining
>> large deltas between non-nested and nested workloads with lots of vmexits are:
>>
>>    4.80%  vmx_set_cr3
>>    4.35%  native_read_msr
>>    3.73%  vmcs_load
>>    3.65%  update_permission_bitmask
>>    2.49%  _raw_spin_lock
>>    2.37%  sync_vmcs12
>>    2.20%  copy_shadow_to_vmcs12
>>    1.19%  kvm_load_guest_fpu
>>
>> There is a large cost associated with resetting the MMU.  Making that smarter
>> could probably be worth a 10-15% improvement; not easy, but actually even
>> more worthwhile than that on SMP nested guests because that's where the
>> spinlock contention comes from.
>>
>> The MSR accesses are probably also interesting, but I haven't tried to see
>> what they are about.  One somewhat crazy idea in that area is to set
>> CR4.FSGSBASE at vcpu_load/sched_in and clear it at vcpu_put/sched_out.
>> Then we could skip the costly setup of the FS/GS/kernelGS base MSRs.
>> However, the cost of writes to CR4 might make it less appealing for
>> userspace exits; I haven't benchmarked it.
>>
>> Paolo
>>
>> Paolo Bonzini (4):
>>   KVM: VMX: split list of shadowed VMCS field to a separate file
>>   KVM: nVMX: track dirty state of non-shadowed VMCS fields
>>   KVM: nVMX: move descriptor cache handling to prepare_vmcs02_full
>>   KVM: nVMX: move other simple fields to prepare_vmcs02_full
>>
>>  arch/x86/kvm/vmx.c               | 301 +++++++++++++++++++--------------------
>>  arch/x86/kvm/vmx_shadow_fields.h |  71 +++++++++
>>  2 files changed, 214 insertions(+), 158 deletions(-)
>>  create mode 100644 arch/x86/kvm/vmx_shadow_fields.h
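The idea behind "track dirty state of non-shadowed VMCS fields" and prepare_vmcs02_full can be illustrated with a toy model. This is a simplified sketch, not the patch's code. The class, method, and field names below (NestedVcpu, l1_vmwrite, SHADOWED_FIELDS, and so on) are hypothetical. With VMCS shadowing, L1's VMWRITEs to shadowed fields do not exit to L0, while VMWRITEs to other fields trap. The trap is L0's opportunity to mark vmcs12 dirty, so the expensive full copy only needs to run when such a write has happened since the last nested entry.

```python
from dataclasses import dataclass, field

# Hypothetical set of fields L1 can write without exiting (shadowed).
SHADOWED_FIELDS = {"GUEST_RIP", "GUEST_RSP", "GUEST_RFLAGS"}


@dataclass
class NestedVcpu:
    vmcs12: dict = field(default_factory=dict)  # L1's view of the VMCS
    vmcs02: dict = field(default_factory=dict)  # what L0 runs L2 with
    dirty_vmcs12: bool = True  # first entry must copy everything
    full_syncs: int = 0        # how often the slow path ran

    def l1_vmwrite(self, name, value):
        """Model of an L1 VMWRITE. Only non-shadowed fields trap to L0,
        so only those can mark vmcs12 dirty."""
        self.vmcs12[name] = value
        if name not in SHADOWED_FIELDS:
            self.dirty_vmcs12 = True

    def prepare_vmcs02(self):
        # Fast path: shadowed fields may change without an exit,
        # so they are copied on every nested entry.
        for name in SHADOWED_FIELDS:
            if name in self.vmcs12:
                self.vmcs02[name] = self.vmcs12[name]
        # Slow path (analogue of prepare_vmcs02_full): only when a
        # trapped VMWRITE touched a non-shadowed field.
        if self.dirty_vmcs12:
            for name, value in self.vmcs12.items():
                if name not in SHADOWED_FIELDS:
                    self.vmcs02[name] = value
            self.full_syncs += 1
            self.dirty_vmcs12 = False


vcpu = NestedVcpu()
vcpu.l1_vmwrite("GUEST_RIP", 0x1000)
vcpu.l1_vmwrite("GUEST_CS_BASE", 0)
vcpu.prepare_vmcs02()                 # first entry: full sync
vcpu.l1_vmwrite("GUEST_RIP", 0x1004)  # shadowed field only
vcpu.prepare_vmcs02()                 # fast path, no full sync
print(vcpu.full_syncs)
```

In this model, repeated exits that touch only shadowed fields never pay for the full copy. That is the kind of per-exit saving the series is after.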
>
> I observe an L1 (latest kvm/queue) panic and an L0 (latest kvm/queue)
> call trace; I'm not sure whether they are caused by this patchset.

It can be reproduced reliably by running kvm-unit-tests in L1.

Regards,
Wanpeng Li

>
> L1:
>
> [  114.941243] BUG: unable to handle kernel paging request at ffffa6e6831dfbbe
> [  114.943423] IP: native_load_gdt+0x0/0x10
> [  114.944249] PGD 42ed2e067 P4D 42ed2e067 PUD 42ed2f067 PMD 3e7fc7067 PTE 800000040c09d163
> [  114.945911] Oops: 0009 [#1] SMP
> [  114.946615] Modules linked in: kvm_intel kvm irqbypass
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> binfmt_misc i2c_piix4 aes_x86_64 joydev input_leds crypto_simd
> serio_raw glue_helper cryptd mac_hid parport_pc ppdev lp parport
> autofs4 hid_generic usbhid hid floppy psmouse pata_acpi
> [  114.952293] CPU: 13 PID: 11077 Comm: qemu-system-x86 Not tainted 4.15.0-rc3+ #6
> [  114.954108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
> [  114.957213] RIP: 0010:native_load_gdt+0x0/0x10
> [  114.958360] RSP: 0018:ffffa6e6831dfbb0 EFLAGS: 00010286
> [  114.959868] RAX: 000000000000007f RBX: ffff8a78c9620000 RCX: 00000000c0000102
> [  114.961933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa6e6831dfbbe
> [  114.964147] RBP: ffffa6e6831dfc00 R08: 000000000000002c R09: 0000000000000001
> [  114.966190] R10: ffffa6e6831dfc18 R11: 0000000000000000 R12: 0000000000000000
> [  114.968008] R13: 0000000000000000 R14: ffff8a78ebe86000 R15: ffff8a78c9620000
> [  114.969456] FS:  00007fd452168700(0000) GS:ffff8a78ef540000(0000) knlGS:0000000000000000
> [  114.977682] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  114.979207] CR2: ffffa6e6831dfbbe CR3: 0000000429932005 CR4: 0000000000162ee0
> [  114.980691] Call Trace:
> [  114.981209]  load_fixmap_gdt+0x30/0x40
> [  114.982005]  __vmx_load_host_state.part.82+0xfc/0x190 [kvm_intel]
> [  114.983367]  ? __gfn_to_pfn_memslot+0x2ed/0x3c0 [kvm]
> [  114.984520]  ? vmx_switch_vmcs+0x26/0x40 [kvm_intel]
> [  114.985548]  vmx_switch_vmcs+0x26/0x40 [kvm_intel]
> [  114.986995]  nested_vmx_vmexit+0x86/0x770 [kvm_intel]
> [  114.988640]  ? enter_vmx_non_root_mode+0x720/0x10e0 [kvm_intel]
> [  114.990613]  ? enter_vmx_non_root_mode+0x720/0x10e0 [kvm_intel]
> [  114.992607]  ? vmx_handle_exit+0xb22/0x1530 [kvm_intel]
> [  114.994394]  vmx_handle_exit+0xb22/0x1530 [kvm_intel]
> [  114.996226]  ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
> [  114.998113]  ? vmx_vcpu_run+0x3ae/0x4b0 [kvm_intel]
> [  114.999803]  kvm_arch_vcpu_ioctl_run+0x9ed/0x15e0 [kvm]
> [  115.001595]  ? file_update_time+0x60/0x110
> [  115.003001]  ? kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
> [  115.004539]  kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
> [  115.005982]  do_vfs_ioctl+0x9f/0x5e0
> [  115.006981]  ? vfs_write+0x14f/0x1a0
> [  115.007732]  SyS_ioctl+0x74/0x80
> [  115.008393]  entry_SYSCALL_64_fastpath+0x1e/0x81
> [  115.009345] RIP: 0033:0x7fd470412f07
> [  115.010141] RSP: 002b:00007fd452167978 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  115.011742] RAX: ffffffffffffffda RBX: 00007fd4757c8001 RCX: 00007fd470412f07
> [  115.013280] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000e
> [  115.014900] RBP: 0000000000000001 R08: 000055844b5c1650 R09: 0000000000000001
> [  115.016487] R10: 00007fd452167790 R11: 0000000000000246 R12: 0000000000000000
> [  115.017971] R13: 000055844b5abe40 R14: 00007fd4757c7000 R15: 000055844c277530
> [  115.023574] RIP: native_load_gdt+0x0/0x10 RSP: ffffa6e6831dfbb0
> [  115.024812] CR2: ffffa6e6831dfbbe
> [  115.025485] ---[ end trace 3d70820b36036f21 ]---
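For reading the L1 oops above: on x86, the number printed after "Oops:" for a page fault is the hardware page-fault error code. A small decoder using the bit layout from the Intel SDM's page-fault exception description shows what 0009 means. Bits 0 (P) and 3 (RSVD) are set, so this was a fault on a present page caused by a reserved bit set in a paging-structure entry. Bits 1 and 2 are clear, so it was a supervisor-mode read. The decoder itself is an illustrative helper, not a kernel interface.

```python
# x86 page-fault error code bits (Intel SDM Vol. 3, page-fault exception).
PF_BITS = {
    0: "P: protection violation on a present page",
    1: "W/R: write access",
    2: "U/S: user-mode access",
    3: "RSVD: reserved bit set in a paging-structure entry",
    4: "I/D: instruction fetch",
    5: "PK: protection-key violation",
}


def decode_pf_error_code(code: int) -> list:
    """Return descriptions of the error-code bits that are set."""
    return [desc for bit, desc in PF_BITS.items() if code & (1 << bit)]


for desc in decode_pf_error_code(0x0009):  # value from "Oops: 0009"
    print(desc)
```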
>
>
> L0:
>
> [  149.013514] WARNING: CPU: 2 PID: 2073 at arch/x86/kvm/vmx.c:6376 handle_desc+0x2d/0x40 [kvm_intel]
> [  149.013517] Modules linked in: binfmt_misc nls_iso8859_1
> snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
> snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm
> x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul
> snd_seq_midi snd_seq_midi_event pcbc snd_rawmidi snd_seq aesni_intel
> aes_x86_64 crypto_simd cryptd snd_seq_device joydev glue_helper
> input_leds snd_timer snd mei_me shpchp mei soundcore wmi_bmof lpc_ich
> mac_hid kvm_intel kvm irqbypass parport_pc ppdev lp parport autofs4
> hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops drm e1000e ahci ptp libahci pps_core
> wmi video
> [  149.013687] CPU: 2 PID: 2073 Comm: qemu-system-x86 Tainted: G        W        4.15.0-rc3+ #1
> [  149.013690] Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
> [  149.013696] RIP: 0010:handle_desc+0x2d/0x40 [kvm_intel]
> [  149.013699] RSP: 0018:ffffa2c341a07ca0 EFLAGS: 00010246
> [  149.013705] RAX: ffffffffc04d5160 RBX: 000000000000002f RCX: 0000000000000001
> [  149.013709] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff95befbb48000
> [  149.013712] RBP: ffffa2c341a07ca0 R08: 000000008a728c6a R09: f1ee3e3400000000
> [  149.013715] R10: 0000000000000000 R11: 0000000000000001 R12: ffff95befbaa0000
> [  149.013718] R13: 0000000000000000 R14: 0000000000000000 R15: ffff95befbb48000
> [  149.013721] FS:  00007fee79ffb700(0000) GS:ffff95bf0d800000(0000) knlGS:0000000000000000
> [  149.013724] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  149.013727] CR2: ffffffffffffe000 CR3: 00000003fba1d004 CR4: 00000000001626e0
> [  149.013730] Call Trace:
> [  149.013737]  vmx_handle_exit+0xbd/0xe20 [kvm_intel]
> [  149.013752]  ? kvm_arch_vcpu_ioctl_run+0xcea/0x1c20 [kvm]
> [  149.013773]  kvm_arch_vcpu_ioctl_run+0xd66/0x1c20 [kvm]
> [  149.013800]  kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
> [  149.013813]  ? kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
> [  149.013821]  ? __fget+0xfc/0x210
> [  149.013826]  ? __fget+0xfc/0x210
> [  149.013835]  do_vfs_ioctl+0xa4/0x6a0
> [  149.013840]  ? __fget+0x11d/0x210
> [  149.013850]  SyS_ioctl+0x79/0x90
> [  149.013859]  entry_SYSCALL_64_fastpath+0x1f/0x96
> [  149.013862] RIP: 0033:0x7fee94e26f07
> [  149.013865] RSP: 002b:00007fee79ffa8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  149.013871] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fee94e26f07
> [  149.013874] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
> [  149.013877] RBP: 00005558752c6960 R08: 0000000000000000 R09: 0000000000000001
> [  149.013881] R10: 0000000000000058 R11: 0000000000000246 R12: 0000000000000000
> [  149.013884] R13: 00007fee974f9000 R14: 0000000000000000 R15: 00005558752c6960
> [  149.013900] Code: 44 00 00 f6 87 f1 03 00 00 08 55 48 89 e5 74 1b
> 45 31 c0 31 c9 31 f6 ba 10 00 00 00 e8 2d 0e f8 ff 85 c0 0f 94 c0 0f
> b6 c0 5d c3 <0f> ff eb e1 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00
> 0f 1f
> [  149.014080] ---[ end trace 53c0bffb9d8f6939 ]---
>
> Regards,
> Wanpeng Li
