Message-ID: <YWZ0D9r5BOm/8f7d@google.com>
Date: Wed, 13 Oct 2021 14:52:15 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: David Matlack <dmatlack@...gle.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Suleiman Souhlal <suleiman@...gle.com>,
kvm list <kvm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: MMU: make PTE_PREFETCH_NUM tunable
On (21/10/12 09:50), David Matlack wrote:
> On Tue, Oct 12, 2021 at 2:16 AM Sergey Senozhatsky
> <senozhatsky@...omium.org> wrote:
> >
> > Turn PTE_PREFETCH_NUM into a module parameter, so that it
> > can be tuned per-VM.
>
> Module parameters do not allow tuning per VM, they affect every VM on
> the machine.
>
> If you want per-VM tuning you could introduce a VM ioctl.
ACK.
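Something along these lines, as a rough first sketch (KVM_SET_PTE_PREFETCH
and PTE_PREFETCH_MAX are made-up names for illustration, not existing UAPI):

	/* arch/x86/kvm/x86.c, in kvm_arch_vm_ioctl(), sketch */
	case KVM_SET_PTE_PREFETCH: {
		u64 val;

		r = -EFAULT;
		if (copy_from_user(&val, argp, sizeof(val)))
			break;
		r = -EINVAL;
		/* must be a power of two, see __direct_pte_prefetch() */
		if (!val || !is_power_of_2(val) || val > PTE_PREFETCH_MAX)
			break;
		kvm->arch.pte_prefetch_num = val;
		r = 0;
		break;
	}

Validating the value at the ioctl boundary would also cover the
power-of-2 concern you raise below.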
> > ---
> > arch/x86/kvm/mmu/mmu.c | 31 ++++++++++++++++++++++---------
>
> Please also update the shadow paging prefetching code in
> arch/x86/kvm/mmu/paging_tmpl.h, unless there is a good reason to
> diverge.
ACK.
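If I remember the shadow paging side correctly, the analogous fixed-size
buffer lives in the guest walker, so it would need the same treatment
(roughly):

	/* arch/x86/kvm/mmu/paging_tmpl.h */
	struct guest_walker {
		...
		pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
		...
	};

and FNAME(pte_prefetch) computes the same "& ~(PTE_PREFETCH_NUM - 1)"
prefetch window.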
> > @@ -732,7 +734,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> >
> > /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
> > r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
> > - 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> > + 1 + PT64_ROOT_MAX_LEVEL + pte_prefetch_num);
>
> There is a sampling problem. What happens if the user changes
> pte_prefetch_num while a fault is being handled?
Good catch.
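A local fix would be to snapshot the knob once per fault, e.g. (sketch):

	unsigned int prefetch_num = READ_ONCE(pte_prefetch_num);

	/* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
				       1 + PT64_ROOT_MAX_LEVEL + prefetch_num);

but then the same snapshot has to be handed down to the prefetch path so
that topup and consumption agree. Your per-vCPU suggestion below avoids
that plumbing entirely.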
> > @@ -2753,20 +2755,29 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
> > struct kvm_mmu_page *sp,
> > u64 *start, u64 *end)
> > {
> > - struct page *pages[PTE_PREFETCH_NUM];
> > + struct page **pages;
> > struct kvm_memory_slot *slot;
> > unsigned int access = sp->role.access;
> > int i, ret;
> > gfn_t gfn;
> >
> > + pages = kmalloc_array(pte_prefetch_num, sizeof(struct page *),
> > + GFP_KERNEL);
>
> This code runs with the MMU lock held.
> In general we avoid doing any dynamic memory allocation while the MMU
> lock is held. That's why the memory caches exist. You can avoid
> allocating under a lock by allocating the prefetch array when the vCPU
> is first initialized. This would also solve the module parameter
> sampling problem because you can read it once and store it in struct
> kvm_vcpu.
I'll do per-vCPU pre-allocation, thanks. GFP_KERNEL allocation is less of a
problem while kvm->mmu_lock is held for read, but more so while it is held
for write.
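Roughly like this (the field names are made up, not in the current
struct kvm_vcpu_arch):

	/* hypothetical members in struct kvm_vcpu_arch */
	struct page	**prefetch_pages;
	unsigned int	pte_prefetch_num;

	/* on the vCPU creation path, sketch */
	vcpu->arch.pte_prefetch_num = READ_ONCE(pte_prefetch_num);
	vcpu->arch.prefetch_pages =
		kcalloc(vcpu->arch.pte_prefetch_num,
			sizeof(*vcpu->arch.prefetch_pages),
			GFP_KERNEL_ACCOUNT);
	if (!vcpu->arch.prefetch_pages)
		return -ENOMEM;

plus a matching kfree() on the vCPU destroy path.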
> > static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
> > @@ -2785,10 +2798,10 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
> >
> > WARN_ON(!sp->role.direct);
> >
> > - i = (sptep - sp->spt) & ~(PTE_PREFETCH_NUM - 1);
> > + i = (sptep - sp->spt) & ~(pte_prefetch_num - 1);
>
> This code assumes pte_prefetch_num is a power of 2, which is now no
> longer guaranteed to be true.
It does. I can check that the value is a power of 2 in the ioctl handler,
as sketched below.
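The mask trick rounds down to a multiple of n only when n is a power of
two; if arbitrary values were ever allowed, the kernel's rounddown()
helper would work instead (sketch):

	/* power-of-two only: ~(n - 1) clears the low bits */
	i = (sptep - sp->spt) & ~(pte_prefetch_num - 1);

	/* for arbitrary n */
	i = rounddown(sptep - sp->spt, pte_prefetch_num);

Rejecting non-power-of-2 values via is_power_of_2() at the ioctl boundary
keeps the fast path as-is.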