[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJhGHyDWsti6JpYmLhoDfxtaxWC3wFeWzM3NWubqmd=_4ENc3Q@mail.gmail.com>
Date: Thu, 26 Aug 2021 06:49:34 +0800
From: Lai Jiangshan <jiangshanlai+lkml@...il.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>,
Sergey Senozhatsky <senozhatsky@...gle.com>,
Ben Gardon <bgardon@...gle.com>
Subject: Re: [PATCH] KVM: x86/mmu: Complete prefetch for trailing SPTEs for
direct, legacy MMU
On Thu, Aug 19, 2021 at 7:57 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> Make a final call to direct_pte_prefetch_many() if there are "trailing"
> SPTEs to prefetch, i.e. SPTEs for GFNs following the faulting GFN. The
> call to direct_pte_prefetch_many() in the loop only handles the case
> where there are !PRESENT SPTEs preceding a PRESENT SPTE.
>
> E.g. if the faulting GFN is a multiple of 8 (the prefetch size) and all
> SPTEs for the following GFNs are !PRESENT, the loop will terminate with
> "start = sptep+1" and not prefetch any SPTEs.
>
> Prefetching trailing SPTEs as intended can drastically reduce the number
> of guest page faults, e.g. accessing the first byte of every 4kb page in
> a 6gb chunk of virtual memory, in a VM with 8gb of preallocated memory,
> the number of pf_fixed events observed in L0 drops from ~1.75M to <0.27M.
>
> Note, this only affects memory that is backed by 4kb pages as KVM doesn't
> prefetch when installing hugepages. Shadow paging prefetching is not
> affected as it does not batch the prefetches due to the need to process
> the corresponding guest PTE. The TDP MMU is not affected because it
> doesn't have prefetching, yet...
>
> Fixes: 957ed9effd80 ("KVM: MMU: prefetch ptes when intercepted guest #PF")
> Cc: Sergey Senozhatsky <senozhatsky@...gle.com>
> Cc: Ben Gardon <bgardon@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>
> Cc'd Ben as this highlights a potential gap with the TDP MMU, which lacks
> prefetching of any sort. For large VMs, which are likely backed by
> hugepages anyways, this is a non-issue as the benefits of holding mmu_lock
> for read likely masks the cost of taking more VM-Exits. But VMs with a
> small number of vCPUs won't benefit as much from parallel page faults,
> e.g. there's no benefit at all if there's a single vCPU.
>
> arch/x86/kvm/mmu/mmu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..daf7df35f788 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2818,11 +2818,13 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
> if (!start)
> continue;
> if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
> - break;
> + return;
> start = NULL;
> } else if (!start)
> start = spte;
> }
> + if (start)
> + direct_pte_prefetch_many(vcpu, sp, start, spte);
> }
Reviewed-by: Lai Jiangshan <jiangshanlai@...il.com>
>
> static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>
Powered by blists - more mailing lists