[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9ed946df-9c6c-9a4d-4be9-2f32809974f7@redhat.com>
Date: Wed, 10 Feb 2021 09:23:59 +0100
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>,
Mike Kravetz <mike.kravetz@...cle.com>
Cc: Muchun Song <songmuchun@...edance.com>,
Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle
free hugetlb pages
On 08.02.21 11:38, Oscar Salvador wrote:
> Free hugetlb pages are trickier to handle as to in order to guarantee
> no userspace appplication disruption, we need to replace the
> current free hugepage with a new one.
>
> In order to do that, a new function called alloc_and_dissolve_huge_page
> in introduced.
> This function will first try to get a new fresh hugetlb page, and if it
> succeeds, it will dissolve the old one.
>
Thanks for looking into this! Can we move this patch to #1 in the
series? It is the easier case.
I also wonder if we should at least try on the memory unplug path to
keep nr_pages by at least trying to allocate at new one if required, and
printing a warning if that fails (after all, we're messing with
something configured by the admin - "nr_pages"). Note that gigantic
pages are special (below).
> Signed-off-by: Oscar Salvador <osalvador@...e.de>
> ---
> include/linux/hugetlb.h | 6 ++++++
> mm/compaction.c | 11 +++++++++++
> mm/hugetlb.c | 35 +++++++++++++++++++++++++++++++++++
> 3 files changed, 52 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..f81afcb86e89 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -505,6 +505,7 @@ struct huge_bootmem_page {
> struct hstate *hstate;
> };
>
> +bool alloc_and_dissolve_huge_page(struct page *page);
> struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr, int avoid_reserve);
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> @@ -773,6 +774,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
> #else /* CONFIG_HUGETLB_PAGE */
> struct hstate {};
>
> +static inline bool alloc_and_dissolve_huge_page(struct page *page)
> +{
> + return false;
> +}
> +
> static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr,
> int avoid_reserve)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 89cd2e60da29..7969ddc10856 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> low_pfn += compound_nr(page) - 1;
> goto isolate_success_no_list;
> }
> + } else {
} else if (alloc_and_dissolve_huge_page(page))) {
...
> + /*
> + * Free hugetlb page. Allocate a new one and
> + * dissolve this is if succeed.
> + */
> + if (alloc_and_dissolve_huge_page(page)) {
> + unsigned long order = buddy_order_unsafe(page);
> +
> + low_pfn += (1UL << order) - 1;
> + continue;
> + }
Note that there is a very ugly corner case we will have to handle
gracefully (I think also in patch #1):
Assume you allocated a gigantic page (and assume that we are not using
CMA for gigantic pages for simplicity). Assume you want to allocate
another one. alloc_pool_huge_page()->...->alloc_contig_pages() will
stumble over the first allocated page. It will try to
alloc_and_dissolve_huge_page() the existing gigantic page. To do that,
it will alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on.
Bad.
We really don't want to mess with gigantic pages (migrate, dissolve)
while allocating a gigantic page. I think the easiest (and cleanest) way
forward is to not mess (isolate, migrate, dissolve) with gigantic pages
at all.
Gigantic pages are not movable, so they won't be placed on random CMA /
ZONE_MOVABLE.
Some hstate_is_gigantic(h) calls (maybe inside
alloc_and_dissolve_huge_page() ? ) along with a nice comment might be
good enough to avoid having to pass down some kind of alloc_contig
context. I even think that should be handled inside
(the main issue is that in contrast to CMA, plain alloc_contig_pages()
has no memory about which parts were allocated and will simply try
re-allocating what it previously allocated and never freed - which is
usually fine, unless we're dealing with such special cases)
Apart from that, not messing with gigantic pages feels like the right
approach (allocating/migrating gigantic pages is just horribly slow and
most probably not worth it anyway).
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists