linux-kernel - Re: [PATCH v2 1/9] mm/hugetlb: remove checking hstate_is_gigantic() in return_unused_surplus

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YrYMS5sATPzEgUxb@monkey>
Date:   Fri, 24 Jun 2022 12:11:07 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     HORIGUCHI NAOYA(堀口　直也) 
        <naoya.horiguchi@....com>
Cc:     Miaohe Lin <linmiaohe@...wei.com>,
        Muchun Song <songmuchun@...edance.com>,
        Naoya Horiguchi <nao.horiguchi@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Liu Shixin <liushixin2@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Oscar Salvador <osalvador@...e.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH v2 1/9] mm/hugetlb: remove checking hstate_is_gigantic()
 in return_unused_surplus_pages()

On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> > On 2022/6/24 16:03, Muchun Song wrote:
> > > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> > >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > >>> From: Naoya Horiguchi <naoya.horiguchi@....com>
> > >>>
> > >>> I found a weird state of 1GB hugepage pool, caused by the following
> > >>> procedure:
> > >>>
> > >>>   - run a process reserving all free 1GB hugepages,
> > >>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> > >>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> > >>>   - kill the reserving process.
> > >>>
> > >>> , then all the hugepages are free *and* surplus at the same time.
> > >>>
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > >>>   3
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> > >>>   3
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> > >>>   0
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> > >>>   3
> > >>>
> > >>> This state is resolved by reserving and allocating the pages then
> > >>> freeing them again, so this seems not to result in serious problem.
> > >>> But it's a little surprizing (shrinking pool suddenly fails).
> > >>>
> > >>> This behavior is caused by hstate_is_gigantic() check in
> > >>> return_unused_surplus_pages(). This was introduced so long ago in 2008
> > >>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > >>> it seems to me that this check is no longer unnecessary. Let's remove it.

Thank you for finding this!!!

> > >>> +++ b/mm/hugetlb.c
> > >>> @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
> > >>>  	/* Uncommit the reservation */
> > >>>  	h->resv_huge_pages -= unused_resv_pages;
> > >>>  
> > >>> -	/* Cannot return gigantic pages currently */
> > >>> -	if (hstate_is_gigantic(h))
> > >>> -		goto out;
> > >>> -
> > >>
> > >> IIUC it might be better to do the below check:
> > >> 	/*
> > >> 	 * Cannot return gigantic pages currently if runtime gigantic page
> > >> 	 * allocation is not supported.
> > >> 	 */
> > >> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > >> 		goto out;
> > >>
> > > 
> > > The change looks good to me. However, the comments above is unnecessary
> > > since gigantic_page_runtime_supported() is straightforward.
> > 
> > Agree. The comments can be removed.
> 
> Thank you, both. Adding !gigantic_page_runtime_supported without comment
> makes sense to me.

The change above makes sense to me.  However, ...

If we make the change above, will we have the same strange situation described
in the commit message when !gigantic_page_runtime_supported() is true?

IIUC, !gigantic_page_runtime_supported implies that gigantic hugetlb
pages can not be allocated or freed at run time.  They can only be
allocated at boot time.  So, there should NEVER be surplus gigantic
pages if !gigantic_page_runtime_supported().  To avoid this situation,
perhaps we should change set_max_huge_pages as follows (not tested)?

-- 
Mike Kravetz


diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5eabb8009d8a..c78d6c20e6b0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3416,7 +3416,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * the user tries to allocate gigantic pages but let the user free the
 	 * boottime allocated gigantic pages.
 	 */
-	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
+	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
+					!gigantic_page_runtime_supported())) {
 		if (count > persistent_huge_pages(h)) {
 			spin_unlock_irq(&hugetlb_lock);
 			mutex_unlock(&h->resize_lock);
@@ -3464,6 +3465,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			goto out;
 	}
 
+	/*
+	 * We can not decrease gigantic pool size if runtime modification
+	 * is not supported.
+	 */
+	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
+		if (count < persistent_huge_pages(h)) {
+			spin_unlock_irq(&hugetlb_lock);
+			mutex_unlock(&h->resize_lock);
+			NODEMASK_FREE(node_alloc_noretry);
+			return -EINVAL;
+		}
+	}
+
 	/*
 	 * Decrease the pool size
 	 * First return free pages to the buddy allocator (being careful