Date:   Wed, 25 Sep 2019 09:08:17 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Andrea Arcangeli <aarcange@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages

Let me revive this thread as there was no follow-up.

On Mon 09-09-19 21:30:20, Michal Hocko wrote:
[...]
> I believe it would be best to start by explaining why we do not see
> the same problem with order-0 requests. We do not enter the slow path,
> and thus memory reclaim, if there is any other node that passes the
> watermark check as well, right? So essentially we are relying on kswapd
> to keep nodes balanced so that allocation requests can be satisfied from
> a local node. We do have kcompactd to do background compaction. Why do we
> want to rely on direct compaction instead? What is the fundamental
> difference?

I am especially interested in this part. The more I think about this,
the more I am convinced that the underlying problem really is the
premature fallback in the fast path. Does the almost-patch below help your
workload? It effectively restricts the fast path for higher-order
allocations that are allowed to direct reclaim to the local/requested
node. The justification is that the watermark check might be too strict
for those requests, as it is primarily order-0 oriented. The low watermark
target simply has no meaning for higher-order requests AFAIU. The min-low
gap gives kswapd a chance to balance and be more local-node friendly,
while we do not have anything like that for compaction.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff5484fdbdf9..09036cf55fca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4685,7 +4685,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 {
 	struct page *page;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
-	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
+	gfp_t fastpath_mask, alloc_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
 
 	/*
@@ -4698,7 +4698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	}
 
 	gfp_mask &= gfp_allowed_mask;
-	alloc_mask = gfp_mask;
+	fastpath_mask = alloc_mask = gfp_mask;
 	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
 		return NULL;
 
@@ -4710,8 +4710,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	 */
 	alloc_flags |= alloc_flags_nofragment(ac.preferred_zoneref->zone, gfp_mask);
 
-	/* First allocation attempt */
-	page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
+	/*
+	 * First allocation attempt. If we have a high order allocation then do not fall
+	 * back to a remote node just based on the watermark check on the requested node
+	 * because compaction might easily free up a requested order and then it would be
+	 * better to simply go to the slow path.
+	 * TODO: kcompactd should help here but nobody has woken it up unless we hit the
+	 * slow path so we might need some tuning there as well.
+	 */
+	if (order && (gfp_mask & __GFP_DIRECT_RECLAIM))
+		fastpath_mask |= __GFP_THISNODE;
+	page = get_page_from_freelist(fastpath_mask, order, alloc_flags, &ac);
 	if (likely(page))
 		goto out;
 
-- 
Michal Hocko
SUSE Labs