Message-ID: <Zfq1NWzgpR-msYlg@localhost.localdomain>
Date: Wed, 20 Mar 2024 11:06:45 +0100
From: Oscar Salvador <osalvador@...e.de>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: akpm@...ux-foundation.org, muchun.song@...ux.dev, david@...hat.com,
	linmiaohe@...wei.com, naoya.horiguchi@....com, mhocko@...nel.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/3] mm: hugetlb: make the hugetlb migration strategy
 consistent
On Wed, Mar 06, 2024 at 06:13:27PM +0800, Baolin Wang wrote:
> As discussed in the previous thread [1], there is an inconsistency when handling
> hugetlb migration. When handling the migration of freed hugetlb, it prevents
> fallback to other NUMA nodes in alloc_and_dissolve_hugetlb_folio(). However,
> when dealing with in-use hugetlb, it allows fallback to other NUMA nodes in
> alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb pool
> and might result in unexpected failures when node-bound workloads don't get
> what is assumed to be available.
> 
> To make the hugetlb migration strategy clearer, we should list all the scenarios
> of hugetlb migration and analyze whether allocation fallback is permitted:
> 1) Memory offline: will call dissolve_free_huge_pages() to dissolve the free hugetlb,
> and call do_migrate_range() to migrate the in-use hugetlb. Both can break the
> per-node hugetlb pool, but as this is an explicit offlining operation, there is no
> better choice, so hugetlb allocation fallback should be allowed.
> 2) Memory failure: same as memory offline. Allowing fallback to a different node
> might be the only option to handle it; otherwise the impact of poisoned memory can
> be amplified.
> 3) Longterm pinning: will call migrate_longterm_unpinnable_pages() to migrate in-use
> and not-longterm-pinnable hugetlb, which can break the per-node pool. But we should
> fail the longterm pinning if we cannot allocate on the current node, to avoid
> breaking the per-node pool.
> 4) Syscalls (mbind, migrate_pages, move_pages): these are explicit user operations
> to move pages to other nodes, so fallback to other nodes should not be prohibited.
> 5) alloc_contig_range: used by CMA allocation and virtio-mem fake-offline to allocate
> a given range of pages. Freed hugetlb migration is currently not allowed to fall back;
> for consistency, in-use hugetlb migration should not be allowed to fall back either.
> 6) alloc_contig_pages: used by kfence, pgtable_debug etc. The strategy should be
> consistent with that of alloc_contig_range().
> 
> Based on the analysis of the various scenarios above, introduce a new helper to
> determine whether fallback is permitted according to the migration reason.
> 
> [1] https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@linux.alibaba.com/
> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
Reviewed-by: Oscar Salvador <osalvador@...e.de>
> +static inline bool htlb_allow_alloc_fallback(int reason)
> +{
> +	bool allowed_fallback = false;
> +
> +	/*
> +	 * Note: the memory offline, memory failure and migration syscalls will
> +	 * be allowed to fallback to other nodes due to lack of a better chioce,
                                                                         ^
									 choice
-- 
Oscar Salvador
SUSE Labs