[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMZfGtWMSjYS_Xqb8qXfvzsQCZG7Vn2hUxpxiOqLrPXgy80Suw@mail.gmail.com>
Date: Wed, 21 Apr 2021 11:42:31 +0800
From: Muchun Song <songmuchun@...edance.com>
To: Mike Kravetz <mike.kravetz@...cle.com>
Cc: Jonathan Corbet <corbet@....net>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
X86 ML <x86@...nel.org>, hpa@...or.com,
dave.hansen@...ux.intel.com, luto@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Andrew Morton <akpm@...ux-foundation.org>, paulmck@...nel.org,
pawan.kumar.gupta@...ux.intel.com,
Randy Dunlap <rdunlap@...radead.org>, oneukum@...e.com,
anshuman.khandual@....com, jroedel@...e.de,
Mina Almasry <almasrymina@...gle.com>,
David Rientjes <rientjes@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...e.com>,
"Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
David Hildenbrand <david@...hat.com>,
HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@....com>,
Joao Martins <joao.m.martins@...cle.com>,
Xiongchun duan <duanxiongchun@...edance.com>,
fam.zheng@...edance.com, linux-doc@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH v20 6/9] mm: hugetlb: alloc the vmemmap
pages associated with each HugeTLB page
On Wed, Apr 21, 2021 at 1:48 AM Mike Kravetz <mike.kravetz@...cle.com> wrote:
>
> On 4/20/21 1:46 AM, Muchun Song wrote:
> > On Tue, Apr 20, 2021 at 7:20 AM Mike Kravetz <mike.kravetz@...cle.com> wrote:
> >>
> >> On 4/15/21 1:40 AM, Muchun Song wrote:
> >>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> >>> index 0abed7e766b8..6e970a7d3480 100644
> >>> --- a/include/linux/hugetlb.h
> >>> +++ b/include/linux/hugetlb.h
> >>> @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> >>> * code knows it has only reference. All other examinations and
> >>> * modifications require hugetlb_lock.
> >>> * HPG_freed - Set when page is on the free lists.
> >>> + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
> >>> * Synchronization: hugetlb_lock held for examination and modification.
> >>
> >> I like the per-page flag. In previous versions of the series, you just
> >> checked the free_vmemmap_pages_per_hpage() to determine if vmemmmap
> >> should be allocated. Is there any change in functionality that makes is
> >> necessary to set the flag in each page, or is it mostly for flexibility
> >> going forward?
> >
> > Actually, only the routine of dissolving the page cares whether
> > the page is on the buddy free list when update_and_free_page
> > returns. But we cannot change the return type of the
> > update_and_free_page (e.g. change return type from 'void' to 'int').
> > Why? If the hugepage is freed through a kworker, we cannot
> > know the return value when update_and_free_page returns.
> > So adding a return value seems odd.
> >
> > In the dissolving routine, We can allocate vmemmap pages first,
> > if it is successful, then we can make sure that
> > update_and_free_page can successfully free page. So I need
> > some stuff to mark the page which does not need to allocate
> > vmemmap pages.
> >
> > On the surface, we seem to have a straightforward method
> > to do this.
> >
> > Add a new parameter 'alloc_vmemmap' to update_and_free_page() to
> > indicate that the caller is already allocated the vmemmap pages.
> > update_and_free_page() do not need to allocate. Just like below.
> >
> > void update_and_free_page(struct hstate *h, struct page *page, bool atomic,
> > bool alloc_vmemmap)
> > {
> > if (alloc_vmemmap)
> > // allocate vmemmap pages
> > }
> >
> > But if the page is freed through a kworker. How to pass
> > 'alloc_vmemmap' to the kworker? We can embed this
> > information into the per-page flag. So if we introduce
> > HPG_vmemmap_optimized, the parameter of
> > alloc_vmemmap is also necessary.
> >
> > So it seems that introducing HPG_vmemmap_optimized is
> > a good choice.
>
> Thanks for the explanation!
>
> Agree that the flag is a good choice. How about adding a comment like
> this above the alloc_huge_page_vmemmap call in dissolve_free_huge_page?
>
> /*
> * Normally update_and_free_page will allocate required vmemmmap before
> * freeing the page. update_and_free_page will fail to free the page
> * if it can not allocate required vmemmap. We need to adjust
> * max_huge_pages if the page is not freed. Attempt to allocate
> * vmemmmap here so that we can take appropriate action on failure.
> */
Thanks. I will add this comment.
>
> ...
> >>> +static void add_hugetlb_page(struct hstate *h, struct page *page,
> >>> + bool adjust_surplus)
> >>> +{
> >>
> >> We need to be a bit careful with hugepage specific flags that may be
> >> set. The routine remove_hugetlb_page which is called for 'page' before
> >> this routine will not clear any of the hugepage specific flags. If the
> >> calling path goes through free_huge_page, most but not all flags are
> >> cleared.
> >>
> >> We had a discussion about clearing the page->private field in Oscar's
> >> series. In the case of 'new' pages we can assume page->private is
> >> cleared, but perhaps we should not make that assumption here. Since we
> >> hope to rarely call this routine, it might be safer to do something
> >> like:
> >>
> >> set_page_private(page, 0);
> >> SetHPageVmemmapOptimized(page);
> >
> > Agree. Thanks for your reminder. I will fix this.
> >
> >>
> >>> + int nid = page_to_nid(page);
> >>> +
> >>> + lockdep_assert_held(&hugetlb_lock);
> >>> +
> >>> + INIT_LIST_HEAD(&page->lru);
> >>> + h->nr_huge_pages++;
> >>> + h->nr_huge_pages_node[nid]++;
> >>> +
> >>> + if (adjust_surplus) {
> >>> + h->surplus_huge_pages++;
> >>> + h->surplus_huge_pages_node[nid]++;
> >>> + }
> >>> +
> >>> + set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> >>> +
> >>> + /*
> >>> + * The refcount can possibly be increased by memory-failure or
> >>> + * soft_offline handlers.
> >>> + */
> >>> + if (likely(put_page_testzero(page))) {
> >>
> >> In the existing code there is no such test. Is the need for the test
> >> because of something introduced in the new code?
> >
> > No.
> >
> >> Or, should this test be in the existing code?
> >
> > Yes. gather_surplus_pages should be fixed. I can fix it
> > in a separate patch.
> >
> > The possible bad scenario:
> >
> > CPU0: CPU1:
> > set_compound_page_dtor(HUGETLB_PAGE_DTOR);
> > memory_failure_hugetlb
> > get_hwpoison_page
> > __get_hwpoison_page
> > get_page_unless_zero
> > put_page_testzero()
> >
> > put_page(page)
> >
> >
> > More details and discussion can refer to:
> >
> > https://lore.kernel.org/linux-doc/CAMZfGtVRSBkKe=tKAKLY8dp_hywotq3xL+EJZNjXuSKt3HK3bQ@mail.gmail.com/
> >
>
> Thanks you! I did not remember that discussion.
>
> It would be helpful to add a separate patch for gather_surplus_pages.
> Otherwise, we have the VM_BUG_ON there and not in add_hugetlb_page.
>
Agree. Will do.
> --
> Mike Kravetz
Powered by blists - more mailing lists