lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230201194604.11135-4-minipli@grsecurity.net>
Date:   Wed,  1 Feb 2023 20:46:01 +0100
From:   Mathias Krause <minipli@...ecurity.net>
To:     kvm@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org,
        Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Mathias Krause <minipli@...ecurity.net>
Subject: [PATCH v3 3/6] KVM: x86: Do not unload MMU roots when only toggling CR0.WP

There is no need to unload the MMU roots for a direct MMU role when only
CR0.WP has changed -- the paging structures are still valid, only the
permission bitmap needs to be updated.

One heavy user of toggling CR0.WP is grsecurity's KERNEXEC feature to
implement kernel W^X.

The optimization brings a huge performance gain for this case as the
following micro-benchmark running 'ssdd 10 50000' from rt-tests[1] on a
grsecurity L1 VM shows (runtime in seconds, lower is better):

                       legacy     TDP    shadow
kvm.git/queue          11.55s   13.91s    75.2s
kvm.git/queue+patch     7.32s    7.31s    74.6s

For legacy MMU this is ~36% faster, for TTP MMU even ~47% faster. Also
TDP and legacy MMU now both have around the same runtime which vanishes
the need to disable TDP MMU for grsecurity.

Shadow MMU sees no measurable difference and is still slow, as expected.

[1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git

Co-developed-by: Sean Christopherson <seanjc@...gle.com>
Signed-off-by: Mathias Krause <minipli@...ecurity.net>
---
v2: handle the CR0.WP case directly in kvm_post_set_cr0() and only for
the direct MMU role -- Sean

I re-ran the benchmark and it's even faster than with my patch, as the
critical path is now the first one handled and is now inline. Thanks a
lot for the suggestion, Sean!

 arch/x86/kvm/x86.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..f09bfc0a3cc1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -902,6 +902,15 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
 
 void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
 {
+	/*
+	 * Toggling just CR0.WP doesn't invalidate page tables per se, only the
+	 * permission bits.
+	 */
+	if (vcpu->arch.mmu->root_role.direct && (cr0 ^ old_cr0) == X86_CR0_WP) {
+		kvm_init_mmu(vcpu);
+		return;
+	}
+
 	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
 		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_async_pf_hash_reset(vcpu);
-- 
2.39.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ