lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkZ-mrhCXoR0ZONjJ7kuAuzWED8Dcjkw6mv812-3uSo_Cg@mail.gmail.com>
Date:   Tue, 26 Apr 2022 12:33:28 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Huacai Chen <chenhuacai@...nel.org>,
        Aleksandar Markovic <aleksandar.qemu.devel@...il.com>,
        Anup Patel <anup@...infault.org>,
        Atish Patra <atishp@...shpatra.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        James Morse <james.morse@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>,
        Alexandru Elisei <alexandru.elisei@....com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        linux-mips@...r.kernel.org, kvm@...r.kernel.org,
        kvm-riscv@...ts.infradead.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org, Linux-MM <linux-mm@...ck.org>,
        cgroups@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        kvmarm@...ts.cs.columbia.edu
Subject: Re: [PATCH v3 4/6] KVM: arm64/mmu: count KVM page table pages in
 pagetable stats

Thanks a lot for taking the time to look at this!

On Tue, Apr 26, 2022 at 8:58 AM Marc Zyngier <maz@...nel.org> wrote:
>
> On Tue, 26 Apr 2022 06:39:02 +0100,
> Yosry Ahmed <yosryahmed@...gle.com> wrote:
> >
> > Count the pages used by KVM in arm64 for page tables in pagetable stats.
> >
> > Account pages allocated for PTEs in pgtable init functions and
> > kvm_set_table_pte().
> >
> > Since most page table pages are freed using put_page(), add a helper
> > function put_pte_page() that checks if this is the last ref for a pte
> > page before putting it, and unaccounts stats accordingly.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@...gle.com>
> > ---
> >  arch/arm64/kernel/image-vars.h |  3 ++
> >  arch/arm64/kvm/hyp/pgtable.c   | 50 +++++++++++++++++++++-------------
> >  2 files changed, 34 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> > index 241c86b67d01..25bf058714f6 100644
> > --- a/arch/arm64/kernel/image-vars.h
> > +++ b/arch/arm64/kernel/image-vars.h
> > @@ -143,6 +143,9 @@ KVM_NVHE_ALIAS(__hyp_rodata_end);
> >  /* pKVM static key */
> >  KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
> >
> > +/* Called by kvm_account_pgtable_pages() to update pagetable stats */
> > +KVM_NVHE_ALIAS(__mod_lruvec_page_state);
>
> This cannot be right. It means that this function will be called
> directly from the EL2 code when in protected mode, and will result in
> extreme fireworks.  There is no way you can call core kernel stuff
> like this from this context.
>
> Please do not add random symbols to this list just for the sake of
> being able to link the kernel.

Excuse my ignorance, this is my first time touching kvm code. Thanks a
lot for pointing this out.

>
> > +
> >  #endif /* CONFIG_KVM */
> >
> >  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 2cb3867eb7c2..53e13c3313e9 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -152,6 +152,7 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
> >
> >       WARN_ON(kvm_pte_valid(old));
> >       smp_store_release(ptep, pte);
> > +     kvm_account_pgtable_pages((void *)childp, +1);
>
> Why the + sign?

I am following conventions in other existing stat accounting hooks
(e.g. kvm_mod_used_mmu_pages(vcpu->kvm, +1) call in
arch/x86/kvm/mmu/mmu.c), but I can certainly remove it if you think
this is better.

>
> >  }
> >
> >  static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> > @@ -326,6 +327,14 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> >       return ret;
> >  }
> >
> > +static void put_pte_page(kvm_pte_t *ptep, struct kvm_pgtable_mm_ops *mm_ops)
> > +{
> > +     /* If this is the last page ref, decrement pagetable stats first. */
> > +     if (!mm_ops->page_count || mm_ops->page_count(ptep) == 1)
> > +             kvm_account_pgtable_pages((void *)ptep, -1);
> > +     mm_ops->put_page(ptep);
> > +}
> > +
> >  struct hyp_map_data {
> >       u64                             phys;
> >       kvm_pte_t                       attr;
> > @@ -488,10 +497,10 @@ static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >
> >       dsb(ish);
> >       isb();
> > -     mm_ops->put_page(ptep);
> > +     put_pte_page(ptep, mm_ops);
> >
> >       if (childp)
> > -             mm_ops->put_page(childp);
> > +             put_pte_page(childp, mm_ops);
> >
> >       return 0;
> >  }
> > @@ -522,6 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> >       pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
> >       if (!pgt->pgd)
> >               return -ENOMEM;
> > +     kvm_account_pgtable_pages((void *)pgt->pgd, +1);
> >
> >       pgt->ia_bits            = va_bits;
> >       pgt->start_level        = KVM_PGTABLE_MAX_LEVELS - levels;
> > @@ -541,10 +551,10 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >       if (!kvm_pte_valid(pte))
> >               return 0;
> >
> > -     mm_ops->put_page(ptep);
> > +     put_pte_page(ptep, mm_ops);
> >
> >       if (kvm_pte_table(pte, level))
> > -             mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> > +             put_pte_page(kvm_pte_follow(pte, mm_ops), mm_ops);
>
> OK, I see the pattern. I don't think this workable as such. I'd rather
> the callbacks themselves (put_page, zalloc_page*) call into the
> accounting code when it makes sense, rather than spreading the
> complexity and having to special case the protected case.
>

This makes sense. I am working on moving calls to
kvm_account_pgtable_pages to callbacks in mmu.c in the next version
(stage2_memcache_zalloc_page, kvm_host_put_page, etc).


> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ