lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri,  1 May 2020 21:32:24 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Sean Christopherson <sean.j.christopherson@...el.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH 00/10] KVM: x86: Misc anti-retpoline optimizations

A smattering of optimizations geared toward avoiding retpolines, though
IMO most of the patches are worthwhile changes irrespective of retpolines.
I can split this up into separate patches if desired, outside of the
obvious combos there are no dependencies.

I was mainly coming at this from a nVMX angle.  On a Haswell, this reduces
the best case latency for a nested VMX roundtrip by ~750 cycles, though I
doubt that much benefit will be realized in practice.


The CR0 and CR4 caching changes in particular are keepers, if only because
they get rid of the awful cache vs. decache naming.  And the CR4 change
can eliminate multiple of VMREADs in the nested VMX paths.

Ditto for the CR3 validation patch; it's arguably more readable and can
avoid a VMREAD.

I like the L1 TSC offset patch because VMX and SVM have inverted logic for
how they track the L1 offset, and because math is hard.

I think I like the TDP level change even if retpolines didn't exist?  The
nested EPT behavior is a bit scary, but it was already scary, this just
makes it more obvious.

The RIP/RSP accessors are definitely obsoleted by static calls, but IMO
the noise is worth the benefit unless static calls are imminent.

The DR6 change is gratuitous, though I do like not having to dive into the
VMX code when I inevitably forget the VMX implementations are nops.

Sean Christopherson (10):
  KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'
  KVM: nVMX: Unconditionally validate CR3 during nested transitions
  KVM: x86: Make kvm_x86_ops' {g,s}et_dr6() hooks optional
  KVM: x86: Split guts of kvm_update_dr7() to separate helper
  KVM: nVMX: Avoid retpoline when writing DR7 during nested transitions
  KVM: VMX: Add proper cache tracking for CR4
  KVM: VMX: Add proper cache tracking for CR0
  KVM: VMX: Add anti-retpoline accessors for RIP and RSP
  KVM: VMX: Move nested EPT out of kvm_x86_ops.get_tdp_level() hook
  KVM: x86/mmu: Capture TDP level when updating CPUID

 arch/x86/include/asm/kvm_host.h |  7 +--
 arch/x86/kvm/cpuid.c            |  3 +-
 arch/x86/kvm/kvm_cache_regs.h   | 10 +++--
 arch/x86/kvm/mmu/mmu.c          |  6 +--
 arch/x86/kvm/svm/nested.c       |  2 +-
 arch/x86/kvm/svm/svm.c          | 21 ---------
 arch/x86/kvm/vmx/nested.c       | 49 +++++++++++----------
 arch/x86/kvm/vmx/vmx.c          | 77 +++++++++++++--------------------
 arch/x86/kvm/vmx/vmx.h          | 30 +++++++++++++
 arch/x86/kvm/x86.c              | 26 ++++-------
 arch/x86/kvm/x86.h              | 14 ++++++
 11 files changed, 125 insertions(+), 120 deletions(-)

-- 
2.26.0

Powered by blists - more mailing lists