linux-kernel - Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+2MQi_b98v+vS9J+FnovPe5avNv4WSeD2KoQycdQAG4c3Hxpg@mail.gmail.com>
Date:   Wed, 23 Dec 2020 11:38:54 +0800
From:   Liang Li <liliang324@...il.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Dan Williams <dan.j.williams@...el.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Michal Hocko <mhocko@...e.com>,
        Liang Li <liliangleo@...iglobal.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org, qemu-devel@...gnu.org
Subject: Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

> > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
> > +                        struct hstate *h, unsigned int nid,
> > +                        struct scatterlist *sgl, unsigned int *offset)
> > +{
> > +       struct list_head *list = &h->hugepage_freelists[nid];
> > +       unsigned int page_len = PAGE_SIZE << h->order;
> > +       struct page *page, *next;
> > +       long budget;
> > +       int ret = 0, scan_cnt = 0;
> > +
> > +       /*
> > +        * Perform early check, if free area is empty there is
> > +        * nothing to process so we can skip this free_list.
> > +        */
> > +       if (list_empty(list))
> > +               return ret;
> > +
> > +       spin_lock_irq(&hugetlb_lock);
> > +
> > +       if (huge_page_order(h) > MAX_ORDER)
> > +               budget = HUGEPAGE_REPORTING_CAPACITY;
> > +       else
> > +               budget = HUGEPAGE_REPORTING_CAPACITY * 32;
>
> Wouldn't huge_page_order always be more than MAX_ORDER? Seems like we
> don't even really need budget since this should probably be pulling
> out no more than one hugepage at a time.

I want to disting a 2M page and 1GB page here. The order of 1GB page is greater
than MAX_ORDER while 2M page's order is less than MAX_ORDER.

>
> > +       /* loop through free list adding unreported pages to sg list */
> > +       list_for_each_entry_safe(page, next, list, lru) {
> > +               /* We are going to skip over the reported pages. */
> > +               if (PageReported(page)) {
> > +                       if (++scan_cnt >= MAX_SCAN_NUM) {
> > +                               ret = scan_cnt;
> > +                               break;
> > +                       }
> > +                       continue;
> > +               }
> > +
>
> It would probably have been better to place this set before your new
> set. I don't see your new set necessarily being the best use for page
> reporting.

I haven't really latched on to what you mean, could you explain it again?

>
> > +               /*
> > +                * If we fully consumed our budget then update our
> > +                * state to indicate that we are requesting additional
> > +                * processing and exit this list.
> > +                */
> > +               if (budget < 0) {
> > +                       atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED);
> > +                       next = page;
> > +                       break;
> > +               }
> > +
>
> If budget is only ever going to be 1 then we probably could just look
> at making this the default case for any time we find a non-reported
> page.

and here again.

> > +               /* Attempt to pull page from list and place in scatterlist */
> > +               if (*offset) {
> > +                       isolate_free_huge_page(page, h, nid);
> > +                       /* Add page to scatter list */
> > +                       --(*offset);
> > +                       sg_set_page(&sgl[*offset], page, page_len, 0);
> > +
> > +                       continue;
> > +               }
> > +
>
> There is no point in the continue case if we only have a budget of 1.
> We should probably just tighten up the loop so that all it does is
> search until it finds the 1 page it can pull, pull it, and then return
> it. The scatterlist doesn't serve much purpose and could be reduced to
> just a single entry.

I will think about it more.

> > +static int
> > +hugepage_reporting_process_hstate(struct page_reporting_dev_info *prdev,
> > +                           struct scatterlist *sgl, struct hstate *h)
> > +{
> > +       unsigned int leftover, offset = HUGEPAGE_REPORTING_CAPACITY;
> > +       int ret = 0, nid;
> > +
> > +       for (nid = 0; nid < MAX_NUMNODES; nid++) {
> > +               ret = hugepage_reporting_cycle(prdev, h, nid, sgl, &offset);
> > +
> > +               if (ret < 0)
> > +                       return ret;
> > +       }
> > +
> > +       /* report the leftover pages before going idle */
> > +       leftover = HUGEPAGE_REPORTING_CAPACITY - offset;
> > +       if (leftover) {
> > +               sgl = &sgl[offset];
> > +               ret = prdev->report(prdev, sgl, leftover);
> > +
> > +               /* flush any remaining pages out from the last report */
> > +               spin_lock_irq(&hugetlb_lock);
> > +               hugepage_reporting_drain(prdev, h, sgl, leftover, !ret);
> > +               spin_unlock_irq(&hugetlb_lock);
> > +       }
> > +
> > +       return ret;
> > +}
> > +
>
> If HUGEPAGE_REPORTING_CAPACITY is 1 it would make more sense to
> rewrite this code to just optimize for a find and process a page
> approach rather than trying to batch pages.

Yes, I will make a change. Thanks for your comments!

Liang