Message-ID: <20200629075510.GA32461@dhcp22.suse.cz>
Date:   Mon, 29 Jun 2020 09:55:10 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Joonsoo Kim <js1304@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, kernel-team@....com,
        Vlastimil Babka <vbabka@...e.cz>,
        Christoph Hellwig <hch@...radead.org>,
        Roman Gushchin <guro@...com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA
 aware

On Mon 29-06-20 15:27:25, Joonsoo Kim wrote:
[...]
> Solution that Introduces a new
> argument doesn't cause this problem while avoiding CMA regions.

My primary argument is that there is no real reason to treat hugetlb
dequeuing any differently. If we simply exclude __GFP_MOVABLE for
_any_ other allocation, then this certainly has some drawbacks on the
usable memory for the migration target, and it can lead to allocation
failures (especially on movable_node setups where the amount of movable
memory might be really high) and therefore longterm gup failures. And
yes, those failures might be premature. But my point is that the
behavior would be _consistent_, so a user wouldn't see random failures
for some types of pages while seeing success for others.

Let's have a look at this patch. It simply works around that
restriction for a very limited type of pages - only hugetlb pages
which have reserves in non-CMA movable pools. I would claim that many
setups will simply not have many (if any) spare hugetlb pages in the
pool except for temporary periods when a workload is (re)starting,
because that would effectively be wasted memory.

The patch is adding a special-case flag to convey what the
memalloc_nocma_{save,restore} API already tells the code, so the
information is already there. Sorry I didn't bring this up earlier, but
I had completely forgotten about its existence. With that one in place
I do agree that the dequeuing needs a fixup, but that should be
something like the following instead.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57ece74e3aae..c1595b1d36f3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
+	gfp_t gfp;
+
 	if (hugepage_movable_supported(h))
-		return GFP_HIGHUSER_MOVABLE;
+		gfp = GFP_HIGHUSER_MOVABLE;
 	else
-		return GFP_HIGHUSER;
+		gfp = GFP_HIGHUSER;
+
+	return current_gfp_context(gfp);
 }
 
 static struct page *dequeue_huge_page_vma(struct hstate *h,
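
For reference, a minimal sketch of how a caller side could combine the
memalloc_nocma_{save,restore} scope with the htlb_alloc_mask() change
above. This is illustrative only and not part of the patch: the helper
name is hypothetical, it is written as if it lived next to the hugetlb
code, and it assumes current_gfp_context() honors the nocma scope by
clearing __GFP_MOVABLE (which is what the fixup relies on):

#include <linux/sched/mm.h>	/* memalloc_nocma_save/restore */

/*
 * Hypothetical longterm pin path that wants its hugetlb migration
 * target allocated outside of CMA.
 */
static struct page *hugetlb_migration_target_nocma(struct hstate *h)
{
	unsigned int flags;
	struct page *page = NULL;
	gfp_t gfp_mask;

	/* Mark the current task as "no CMA" for this scope. */
	flags = memalloc_nocma_save();

	/*
	 * With PF_MEMALLOC_NOCMA set, current_gfp_context() drops
	 * __GFP_MOVABLE, so htlb_alloc_mask() above now yields
	 * GFP_HIGHUSER instead of GFP_HIGHUSER_MOVABLE.
	 */
	gfp_mask = htlb_alloc_mask(h);

	/* ... dequeue or allocate the migration target with gfp_mask ... */

	memalloc_nocma_restore(flags);
	return page;
}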

If we ever fix this general issue for other allocations and allow a
better CMA exclusion, then it would be implemented consistently for
everybody.

Does this make more sense to you, or are we still not on the same page
wrt the actual problem?
-- 
Michal Hocko
SUSE Labs
