Message-ID: <04f2a081-b267-42de-881f-14cb16ab0e3c@redhat.com>
Date: Wed, 19 Feb 2025 09:31:22 +0100
From: David Hildenbrand <david@...hat.com>
To: yangge1116@....com, akpm@...ux-foundation.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
21cnbao@...il.com, baolin.wang@...ux.alibaba.com, muchun.song@...ux.dev,
osalvador@...e.de, liuzixing@...on.cn
Subject: Re: [PATCH V4] mm/hugetlb: wait for hugetlb folios to be freed
On 19.02.25 04:46, yangge1116@....com wrote:
> From: Ge Yang <yangge1116@....com>
>
> Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
> of huge pages if in non-task context"), which supports deferring the
> freeing of hugetlb pages, the allocation of contiguous memory through
> cma_alloc() may fail intermittently.
>
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugetlb folios, these in-use hugetlb folios need to be migrated
> to another location. When there are no available hugetlb folios in the
> free hugetlb pool during the migration of in-use hugetlb folios, new folios
> are allocated from the buddy system. A temporary state is set on each
> newly allocated folio. Upon completion of the hugetlb folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, they
> are released directly back to the buddy system. However, because the
> freeing of hugetlb pages may be deferred, the old folios may not yet be
> in the buddy system when PageBuddy() is checked. The check then fails,
> ultimately causing cma_alloc() to fail.
>
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>     ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
>         ->unmap_and_move_huge_page()
>             ->folio_putback_hugetlb() // Free old folios
>     ->test_pages_isolated()
>         ->__test_page_isolated_in_pageblock()
>             ->PageBuddy(page) // Check if the page is in buddy
>
> To resolve this issue, we have implemented a function named
> wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_freed_hugetlb_folios() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
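
The deferral and the wait described above can be modelled in plain userspace C. This is only a rough sketch, not kernel code: `struct fake_folio`, `defer_free()`, `drain_pending()`, `wait_for_pending_frees()` and `all_in_buddy()` are hypothetical names invented for illustration, a simple singly linked list stands in for hpage_freelist, and draining the list inline stands in for flush_work() waiting on free_hpage_work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb folio; not a kernel type. */
struct fake_folio {
	bool in_buddy;			/* models what PageBuddy() would report */
	struct fake_folio *next;	/* link on the pending-free list */
};

/* Models hpage_freelist: folios whose freeing has been deferred. */
static struct fake_folio *pending_head;

/* Models the deferred path: queue the folio instead of freeing it now. */
static void defer_free(struct fake_folio *f)
{
	f->next = pending_head;
	pending_head = f;
}

/* Models the work item: hand every queued folio back to the "buddy". */
static void drain_pending(void)
{
	while (pending_head) {
		struct fake_folio *f = pending_head;

		pending_head = f->next;
		f->next = NULL;
		assert(!f->in_buddy);	/* a queued folio is not free yet */
		f->in_buddy = true;
	}
}

/*
 * Models wait_for_freed_hugetlb_folios(): return early when nothing is
 * pending, otherwise wait for the deferred frees. The kernel waits with
 * flush_work(); draining inline is an assumption made for this sketch.
 */
static void wait_for_pending_frees(void)
{
	if (!pending_head)
		return;

	drain_pending();
}

/* Models the isolation test: passes only if every folio is in "buddy". */
static bool all_in_buddy(const struct fake_folio *folios, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!folios[i].in_buddy)
			return false;
	return true;
}
```

With folios queued through defer_free(), all_in_buddy() reports false until wait_for_pending_frees() has run, which mirrors why the patch waits before test_pages_isolated() walks the range.
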
>
> Fixes: c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
> Signed-off-by: Ge Yang <yangge1116@....com>
> Cc: <stable@...r.kernel.org>
> ---
>
> V4:
> - add a check to determine if hpage_freelist is empty, as suggested by David
>
> V3:
> - adjust code and message, as suggested by Muchun and David
>
> V2:
> - flush all folios at once, as suggested by David
>
> include/linux/hugetlb.h | 5 +++++
> mm/hugetlb.c | 8 ++++++++
> mm/page_isolation.c | 10 ++++++++++
> 3 files changed, 23 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6c6546b..0c54b3a 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
>
> int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
> int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
> +void wait_for_freed_hugetlb_folios(void);
> struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> unsigned long addr, bool cow_from_owner);
> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
> return 0;
> }
>
> +static inline void wait_for_freed_hugetlb_folios(void)
> +{
> +}
> +
> static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> unsigned long addr,
> bool cow_from_owner)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 30bc34d..8801dbc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
> return ret;
> }
>
> +void wait_for_freed_hugetlb_folios(void)
> +{
> +	if (llist_empty(&hpage_freelist))
> +		return;
> +
> +	flush_work(&free_hpage_work);
> +}
> +
> typedef enum {
> /*
> * For either 0/1: we checked the per-vma resv map, and one resv
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 8ed53ee0..b2fc526 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	int ret;
>
>  	/*
> +	 * Due to the deferred freeing of hugetlb folios, the hugepage folios may
> +	 * not be released to the buddy system immediately. This can cause
> +	 * PageBuddy() to fail in __test_page_isolated_in_pageblock(). To ensure
> +	 * that the hugetlb folios are released back to the buddy system, invoke
> +	 * wait_for_freed_hugetlb_folios() to wait for the release to
> +	 * complete.
> +	 */
> +	wait_for_freed_hugetlb_folios();
> +
> +	/*
>  	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
>  	 * pages are not aligned to pageblock_nr_pages.
>  	 * Then we just check migratetype first.
Acked-by: David Hildenbrand <david@...hat.com>
--
Cheers,
David / dhildenb