linux-kernel - [PATCH 10/12] DO NOT MERGE: KVM: x86/mmu: Always send !PRESENT faults down the fast path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <20220423034752.1161007-11-seanjc@google.com>
Date:   Sat, 23 Apr 2022 03:47:50 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Ben Gardon <bgardon@...gle.com>,
        David Matlack <dmatlack@...gle.com>,
        Venkatesh Srinivas <venkateshs@...gle.com>,
        Chao Peng <chao.p.peng@...ux.intel.com>
Subject: [PATCH 10/12] DO NOT MERGE: KVM: x86/mmu: Always send !PRESENT faults
 down the fast path

Posted for posterity, and to show that it's possible to funnel indirect
page faults down the fast path.

Not-signed-off-by: Sean Christopherson <seanjc@...gle.com>
---
 arch/x86/kvm/mmu/mmu.c | 44 +++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 744c06bd7017..7ba88907d032 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3006,26 +3006,25 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 		return false;
 
 	/*
-	 * #PF can be fast if:
-	 *
-	 * 1. The shadow page table entry is not present and A/D bits are
-	 *    disabled _by KVM_, which could mean that the fault is potentially
-	 *    caused by access tracking (if enabled).  If A/D bits are enabled
-	 *    by KVM, but disabled by L1 for L2, KVM is forced to disable A/D
-	 *    bits for L2 and employ access tracking, but the fast page fault
-	 *    mechanism only supports direct MMUs.
-	 * 2. The shadow page table entry is present, the access is a write,
-	 *    and no reserved bits are set (MMIO SPTEs cannot be "fixed"), i.e.
-	 *    the fault was caused by a write-protection violation.  If the
-	 *    SPTE is MMU-writable (determined later), the fault can be fixed
-	 *    by setting the Writable bit, which can be done out of mmu_lock.
+	 * Unconditionally send !PRESENT page faults (except for emulated MMIO)
+	 * through the fast path.  There are two scenarios where the fast path
+	 * can resolve the fault.  The first is if the fault is spurious, i.e.
+	 * a different vCPU has faulted in the page, which applies to all MMUs.
+	 * The second scenario is if KVM marked the SPTE !PRESENT for access
+	 * tracking (due to lack of EPT A/D bits), in which case KVM can fix
+	 * the fault after logging the access.
 	 */
 	if (!fault->present)
-		return !kvm_ad_enabled();
+		return true;
 
 	/*
-	 * Note, instruction fetches and writes are mutually exclusive, ignore
-	 * the "exec" flag.
+	 * Skip the fast path if the fault is due to a protection violation and
+	 * the access isn't a write.  Write-protection violations can be fixed
+	 * by KVM, e.g. if memory is write-protected for dirty logging, but all
+	 * other protection violations are in the domain of a third party, i.e.
+	 * either the primary MMU or the guest's page tables, and thus are
+	 * extremely unlikely to be resolved by KVM.  Note, instruction fetches
+	 * and writes are mutually exclusive, ignore the "exec" flag.
 	 */
 	return fault->write;
 }
@@ -3041,12 +3040,13 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	/*
 	 * Theoretically we could also set dirty bit (and flush TLB) here in
 	 * order to eliminate unnecessary PML logging. See comments in
-	 * set_spte. But fast_page_fault is very unlikely to happen with PML
-	 * enabled, so we do not do this. This might result in the same GPA
-	 * to be logged in PML buffer again when the write really happens, and
-	 * eventually to be called by mark_page_dirty twice. But it's also no
-	 * harm. This also avoids the TLB flush needed after setting dirty bit
-	 * so non-PML cases won't be impacted.
+	 * set_spte. But a write-protection violation that can be fixed outside
+	 * of mmu_lock is very unlikely to happen with PML enabled, so we don't
+	 * do this. This might result in the same GPA to be logged in the PML
+	 * buffer again when the write really happens, and eventually to be
+	 * sent to mark_page_dirty() twice, but that's a minor performance blip
+	 * and not a function issue.  This also avoids the TLB flush needed
+	 * after setting dirty bit so non-PML cases won't be impacted.
 	 *
 	 * Compare with set_spte where instead shadow_dirty_mask is set.
 	 */
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog