Message-ID: <CADrL8HXUo-vTL5vH7=fMVPoE0+epdmGbQT=3FXq7C2gwoPWaAQ@mail.gmail.com>
Date:   Thu, 5 Jan 2023 01:23:15 +0000
From:   James Houghton <jthoughton@...gle.com>
To:     Jane Chu <jane.chu@...cle.com>
Cc:     Mike Kravetz <mike.kravetz@...cle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Peter Xu <peterx@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        "Zach O'Keefe" <zokeefe@...gle.com>,
        Manish Mishra <manish.mishra@...anix.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 12/47] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step

On Thu, Jan 5, 2023 at 12:58 AM Jane Chu <jane.chu@...cle.com> wrote:
>
> > + * @stop_at_none determines what we do when we encounter an empty PTE. If true,
> > + * we return that PTE. If false and @sz is less than the current PTE's size,
> > + * we make that PTE point to the next level down, going until @sz is the same
> > + * as our current PTE.
> [..]
> > +int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
> > +                  struct hugetlb_pte *hpte, unsigned long addr,
> > +                  unsigned long sz, bool stop_at_none)
> > +{
> [..]
> > +     while (hugetlb_pte_size(hpte) > sz && !ret) {
> > +             pte = huge_ptep_get(hpte->ptep);
> > +             if (!pte_present(pte)) {
> > +                     if (stop_at_none)
> > +                             return 0;
> > +                     if (unlikely(!huge_pte_none(pte)))
> > +                             return -EEXIST;
>
> If 'stop_at_none' means settling down on the just encountered empty PTE,
> should the above two "if" clauses switch order?  I thought Peter has
> raised this question too, but I'm not seeing a response.

A better name for "stop_at_none" would be "dont_allocate"; it will be
changed in the next version. The idea is that with "stop_at_none" set,
the function simply does a walk and leaves the caller to deal with what
it finds. If we can't continue the walk for any reason, we just return 0.
So in this case, if we land on a non-present, non-none PTE, we can't
continue the walk, so we return 0.

Another way to justify this order: we want to ensure that calls to
this function with stop_at_none=true and sz=PAGE_SIZE will never fail,
and that gives us the order that you see. (This requirement is
documented in the comment above the definition of hugetlb_hgm_walk(),
and the guarantee makes it easier to write code that uses HGM walks.)
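
To make the ordering argument concrete, here is a small userspace model
of just the two checks, not the kernel code itself. Everything named
model_* (the PTE states, the MODEL_WALK_CONTINUE return code, the helper
functions) is hypothetical and exists only for illustration; the present
case is not modeled beyond "keep walking".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PTE states discussed in the thread. */
enum model_pte {
	MODEL_PTE_NONE,        /* empty: huge_pte_none() would be true */
	MODEL_PTE_PRESENT,     /* pte_present() would be true */
	MODEL_PTE_NONPRESENT,  /* non-present but not none */
};

/* Hypothetical return code: the walk may go on to the next level down. */
#define MODEL_WALK_CONTINUE 1

/* Order used in the patch: stop_at_none is checked first. */
static int model_check_patch_order(enum model_pte pte, bool stop_at_none)
{
	if (pte != MODEL_PTE_PRESENT) {
		if (stop_at_none)
			return 0;       /* cannot continue; caller inspects the PTE */
		if (pte != MODEL_PTE_NONE)
			return -EEXIST; /* something is already there */
	}
	return MODEL_WALK_CONTINUE;
}

/* Swapped order, as asked about in the review. */
static int model_check_swapped_order(enum model_pte pte, bool stop_at_none)
{
	if (pte != MODEL_PTE_PRESENT) {
		if (pte != MODEL_PTE_NONE)
			return -EEXIST;
		if (stop_at_none)
			return 0;
	}
	return MODEL_WALK_CONTINUE;
}

/* True if a stop_at_none walk never reports an error for any PTE state. */
static bool model_never_fails(int (*check)(enum model_pte, bool))
{
	for (int s = MODEL_PTE_NONE; s <= MODEL_PTE_NONPRESENT; s++)
		if (check((enum model_pte)s, true) < 0)
			return false;
	return true;
}

/* The guarantee holds for the patch order but not for the swapped one. */
static void model_selfcheck(void)
{
	assert(model_never_fails(model_check_patch_order));
	assert(!model_never_fails(model_check_swapped_order));
}
```

In this model the only input that tells the two orders apart is a
non-present, non-none PTE with stop_at_none set: the patch order returns
0, while the swapped order returns -EEXIST and so breaks the
"never fails" guarantee.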

> Also here below, the way 'stop_at_none' is used when HGM isn't enabled
> is puzzling.  Could you elaborate please?
>
> > +       if (!hugetlb_hgm_enabled(vma)) {
> > +               if (stop_at_none)
> > +                       return 0;
> > +               return sz == huge_page_size(hstate_vma(vma)) ? 0 : -EINVAL;
> > +       }

This is for the same reason; if "stop_at_none" is provided, we need to
guarantee that this function won't fail. If "stop_at_none" is false
and sz != huge_page_size(hstate_vma(vma)), then the caller is attempting
to use HGM without having enabled it, hence -EINVAL.
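
The HGM-disabled early return can be modeled the same way. This is a
userspace sketch, not the kernel function: model_hgm_disabled_result()
and the sizes used in the self-check (a 2 MiB hstate, a 4 KiB base page)
are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the early return taken when HGM is not enabled
 * for the VMA. hstate_sz stands in for huge_page_size(hstate_vma(vma)).
 */
static int model_hgm_disabled_result(bool stop_at_none, unsigned long sz,
				     unsigned long hstate_sz)
{
	if (stop_at_none)
		return 0;       /* pure walk: must never fail */
	/* Allocating walk: only the hstate-level size is valid without HGM. */
	return sz == hstate_sz ? 0 : -EINVAL;
}

static void model_hgm_disabled_selfcheck(void)
{
	const unsigned long hstate_sz = 2UL << 20; /* assumed 2 MiB hstate */
	const unsigned long page_sz = 4096UL;      /* assumed 4 KiB base page */

	assert(model_hgm_disabled_result(true, page_sz, hstate_sz) == 0);
	assert(model_hgm_disabled_result(false, hstate_sz, hstate_sz) == 0);
	assert(model_hgm_disabled_result(false, page_sz, hstate_sz) == -EINVAL);
}
```

Only the last case, an allocating walk asking for a size below the
hstate size, is treated as a caller error.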

Both of these bits will be cleaned up with the next version of this series. :)

Thanks!

- James