Message-Id: <20230322013731.102955-1-minipli@grsecurity.net>
Date:   Wed, 22 Mar 2023 02:37:25 +0100
From:   Mathias Krause <minipli@...ecurity.net>
To:     kvm@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org,
        Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Mathias Krause <minipli@...ecurity.net>
Subject: [PATCH v4 0/6] KVM: MMU: performance tweaks for heavy CR0.WP users

v3: https://lore.kernel.org/kvm/20230201194604.11135-1-minipli@grsecurity.net/

This series is the fourth iteration of resurrecting the missing pieces of
Paolo's previous attempt[1] to avoid needlessly unloading the MMU roots.

It incorporates Sean's feedback on v3 and is rebased on top of
kvm-x86/next, namely commit d8708b80fa0e ("KVM: Change return type of
kvm_arch_vm_ioctl() to "int"").

The performance gap between TDP and the legacy MMU still exists and is
especially noticeable under grsecurity, which implements kernel W^X by
toggling CR0.WP very frequently. This series tries to fix that needless
performance loss.
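
For context, a CR0.WP based kernel W^X scheme briefly drops write
protection around writes to otherwise read-only kernel data, roughly
like below (a minimal illustration only, not grsecurity's actual code;
the helper name is made up and mainline's native_write_cr0() would
additionally pin CR0.WP):

  /* needs <asm/special_insns.h>, <linux/irqflags.h>, <linux/string.h> */
  static void rare_write(void *dst, const void *src, size_t len)
  {
          unsigned long flags, cr0;

          local_irq_save(flags);
          cr0 = native_read_cr0();
          native_write_cr0(cr0 & ~X86_CR0_WP);  /* allow writes to RO mappings */
          memcpy(dst, src, len);
          native_write_cr0(cr0);                /* restore write protection */
          local_irq_restore(flags);
  }

Every such pair of CR0 writes is a potential VM exit, which is why the
toggle needs to be as cheap as possible.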

Patch 1 is a v3 of [3], addressing Sean's feedback.

Patch 2 implements Sean's feedback[2] on Paolo's original approach and
skips unloading the MMU roots for CR0.WP toggling under TDP.
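
The core idea, sketched below in simplified form (not the literal patch;
see patch 2 for the precise condition and comments): with TDP enabled,
CR0.WP has no effect on the TDP SPTEs, so a pure WP toggle only needs the
MMU's metadata refreshed via kvm_init_mmu() instead of a full root unload.

  void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0,
                        unsigned long cr0)
  {
          /* Fast path: a lone CR0.WP flip doesn't affect the TDP roots. */
          if (tdp_enabled && (cr0 ^ old_cr0) == X86_CR0_WP) {
                  kvm_init_mmu(vcpu);
                  return;
          }

          /* ... existing slow path, kvm_mmu_reset_context() et al ... */
  }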

Patch 3 further micro-optimizes this for non-paging guests -- anyone still
running MS Singularity? ;)
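
Sketched as a delta to the fast path above (again simplified, not the
literal diff): without paging CR0.WP has no effect at all, so a pure WP
toggle can be ignored entirely, TDP or not.

  if ((cr0 ^ old_cr0) == X86_CR0_WP) {
          if (!(cr0 & X86_CR0_PG))
                  return;
          if (tdp_enabled) {
                  kvm_init_mmu(vcpu);
                  return;
          }
  }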

Sean suggested another change on top of v2 of this series (and then a
more elaborate version on top of v3): skip intercepting CR0.WP writes
completely for VMX[4]. That turned out to be yet another performance
boost and is implemented in patch 6.
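
The VMX side, roughly (a sketch; the helper name is illustrative and the
real patch differs in detail): bits cleared in CR0_GUEST_HOST_MASK are
"guest owned" and can be written by the guest without a VM exit. With EPT
enabled, CR0.WP no longer needs to be intercepted, so it can be handed to
the guest alongside CR0.TS; shadow paging still has to observe WP changes
and keeps intercepting them.

  static inline unsigned long vmx_guest_owned_cr0_bits(void)
  {
          unsigned long bits = X86_CR0_TS | X86_CR0_WP;

          /* Shadow paging must observe CR0.WP changes, keep intercepting. */
          if (!enable_ept)
                  bits &= ~X86_CR0_WP;

          return bits;
  }

  /* used along the lines of:
   *   vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx_guest_owned_cr0_bits());
   */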

Patches 2 and 6 are the most important ones, as they bring the big
performance gains.

I used 'ssdd 10 50000' from rt-tests[5] as a micro-benchmark, running on a
grsecurity L1 VM. The table below shows the results (runtime in seconds,
lower is better):

                              legacy     TDP    shadow
    kvm-x86/next@...08b        8.43s    9.45s    70.3s
    + patches 1-3              5.39s    5.63s    70.2s
    + patches 4-6              3.51s    3.47s    67.8s

I re-ran the benchmarks (again) and the numbers changed a little from the
ones in v3. The baseline got better, which is likely due to the rebase onto
a v6.3-rc2 based tree (it was v6.2-rc3 based before).

Patches 1, 4 and 5 didn't change from v3, aside from minor changelog tweaks.

Patches 2 and 6 have been rewritten.

Patch 3 is new to this series.

Bonus rant^Wbug report:

Actually re-running the benchmarks took me a while because my VM was
constantly crashing on me with a #GP during scheduling. Looking a little
closer, I noticed it was for an IA32_PRED_CMD MSR write, which was puzzling,
as the VM's kernel didn't change for my tests (I built it more than a month
ago!), so the old test runs should have triggered that code path (and #GP)
too! Digging through some kernel code showed it's all tied to the x86
feature flag X86_FEATURE_USE_IBPB, which gets set when X86_FEATURE_IBPB is,
i.e. when the CPU supports IBPB.

*head-scratching pause* 

Foolish me probably did a system update of the host and got a microcode
update that added IBPB support to my CPU. Yayyy... NOT! As this implies
announcing IBPB support to the VM as well and, in turn, makes the guest
kernel try to use it, I'm doomed to hit that bug. *bummer*

Something must not be right with KVM / QEMU / the kernel, as this guest
behaviour clearly shouldn't cause KVM to inject a #GP into the guest.

The offending call chain in the guest kernel is:
  -> switch_mm_irqs_off()
     -> cond_mitigation()
        -> indirect_branch_prediction_barrier()
           -> alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB))
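
For reference, that last step boils down to roughly the following in the
guest kernel (paraphrased from arch/x86/include/asm/nospec-branch.h; the
WRMSR is only patched in once X86_FEATURE_USE_IBPB is set, which is why
runs on the old microcode never reached it):

  static inline void indirect_branch_prediction_barrier(void)
  {
          /* Becomes a WRMSR to IA32_PRED_CMD only if USE_IBPB is set. */
          alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
                                X86_FEATURE_USE_IBPB);
  }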

So far I'm working around this by passing 'clearcpuid=ibpb' on the kernel
command line. But this should probably be fixed; that's why I'm mentioning
it. It's just too late here to debug this further today. Well, for some
definition of "today."


Thanks,
Mathias

[1] https://lore.kernel.org/kvm/20220217210340.312449-1-pbonzini@redhat.com/
[2] https://lore.kernel.org/kvm/YhATewkkO%2Fl4P9UN@google.com/
[3] https://lore.kernel.org/kvm/YhAB1d1%2FnQbx6yvk@google.com/
[4] https://lore.kernel.org/kvm/Y8cTMnyBzNdO5dY3@google.com/
[5] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git


Mathias Krause (5):
  KVM: x86: Do not unload MMU roots when only toggling CR0.WP with TDP
    enabled
  KVM: x86: Ignore CR0.WP toggles in non-paging mode
  KVM: x86: Make use of kvm_read_cr*_bits() when testing bits
  KVM: x86/mmu: Fix comment typo
  KVM: VMX: Make CR0.WP a guest owned bit

Paolo Bonzini (1):
  KVM: x86/mmu: Avoid indirect call for get_cr3

 arch/x86/kvm/kvm_cache_regs.h  |  2 +-
 arch/x86/kvm/mmu/mmu.c         | 31 ++++++++++++++++++++-----------
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 arch/x86/kvm/mmu/spte.c        |  2 +-
 arch/x86/kvm/pmu.c             |  4 ++--
 arch/x86/kvm/vmx/nested.c      |  4 ++--
 arch/x86/kvm/vmx/vmx.c         |  6 +++---
 arch/x86/kvm/vmx/vmx.h         | 18 ++++++++++++++++++
 arch/x86/kvm/x86.c             | 18 ++++++++++++++++++
 9 files changed, 66 insertions(+), 21 deletions(-)

-- 
2.39.2
