Message-ID: <alpine.DEB.2.21.1909051400380.217933@chino.kir.corp.google.com>
Date:   Thu, 5 Sep 2019 14:06:28 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...e.com>, Mel Gorman <mgorman@...e.de>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote
 hugepages

On Wed, 4 Sep 2019, Andrea Arcangeli wrote:

> > This is an admittedly hacky solution that shouldn't cause anybody to 
> > regress based on NUMA and the semantics of MADV_HUGEPAGE for the past 
> > 4 1/2 years for users whose workload does fit within a socket.
> 
> How can you live with the below if you can't live with 5.3-rc6? Here
> you allocate remote THP if the local THP allocation fails.
> 
> >  			page = __alloc_pages_node(hpage_node,
> >  						gfp | __GFP_THISNODE, order);
> > +
> > +			/*
> > +			 * If hugepage allocations are configured to always
> > +			 * synchronous compact or the vma has been madvised
> > +			 * to prefer hugepage backing, retry allowing remote
> > +			 * memory as well.
> > +			 */
> > +			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
> > +				page = __alloc_pages_node(hpage_node,
> > +						gfp | __GFP_NORETRY, order);
> > +
> 
> You're still going to get THP allocate remote _before_ you have a
> chance to allocate 4k local this way. __GFP_NORETRY won't make any
> difference when there's THP immediately available in the remote nodes.
> 

This is incorrect: the fallback allocation here happens only if the initial
allocation with __GFP_THISNODE fails.  In that case, we were able to
compact memory to make a local hugepage available without incurring
excessive swap based on the RFC patch that appears as patch 3 in this
series.  I very much believe your use case would benefit from this as well
(or at least it would not cause others to regress).  We *want* remote THP
if they are immediately available, but only after we have tried to allocate
locally in the initial allocation and allowed memory compaction to fail
first.
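
To make the ordering concrete, here is a toy userspace model of the two-step
attempt described above. It is only a sketch and not kernel code: struct
toy_node, toy_alloc_on_node() and toy_alloc_hugepage() are hypothetical names,
and the second attempt is modeled as taking only immediately available pages,
which is a simplification of what __GFP_NORETRY does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a NUMA node; not a kernel structure. */
struct toy_node {
	int free_hugepages;	/* hugepages available without any work */
	bool compactable;	/* compaction could produce one more hugepage */
};

#define TOY_NO_PAGE (-1)

/* Try to take one hugepage from a single node. */
static bool toy_alloc_on_node(struct toy_node *node, bool may_compact)
{
	if (node->free_hugepages > 0) {
		node->free_hugepages--;
		return true;
	}
	if (may_compact && node->compactable) {
		/* Compaction produced one hugepage, which is consumed here. */
		node->compactable = false;
		return true;
	}
	return false;
}

/*
 * Returns the index of the node that satisfied the allocation, or
 * TOY_NO_PAGE when the caller would fall back to native pages.
 */
static int toy_alloc_hugepage(struct toy_node *nodes, size_t nr_nodes,
			      size_t local, bool direct_reclaim)
{
	assert(local < nr_nodes);

	/* Step 1: local node only (models gfp | __GFP_THISNODE). */
	if (toy_alloc_on_node(&nodes[local], direct_reclaim))
		return (int)local;

	/*
	 * Step 2: only when direct reclaim is allowed, retry with remote
	 * nodes permitted, taking only what is immediately available
	 * (a simplified stand-in for gfp | __GFP_NORETRY).
	 */
	if (direct_reclaim) {
		for (size_t i = 0; i < nr_nodes; i++) {
			if (toy_alloc_on_node(&nodes[i], false))
				return (int)i;
		}
	}

	return TOY_NO_PAGE;
}
```

In this model a remote hugepage is handed out only after the local attempt,
including local compaction, has failed, and never when direct reclaim is
disallowed.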

There can likely be discussion around the fourth patch of this series to
get exactly the right policy.  We can construct it as necessary so that
hugetlbfs sees no change in behavior; that's simple.  We could also check
per-zone watermarks in mm/huge_memory.c to determine if the local node is
low on memory and, if so, allow remote allocation.  In that case it's
certainly better to allocate remotely when we'd otherwise be reclaiming
locally, even for fallback native pages.
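
The watermark idea could look roughly like the following. This is a
self-contained sketch under stated assumptions, not a proposed patch: struct
toy_zone and toy_allow_remote_hugepage() are hypothetical, and a real check
would use the kernel's own zone and watermark helpers rather than these
fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-zone state on the local node. */
struct toy_zone {
	unsigned long free_pages;
	unsigned long low_watermark;
};

/*
 * Allow remote hugepage allocation only when every zone of the local
 * node is at or below its low watermark, i.e. allocating locally would
 * mean reclaiming.
 */
static bool toy_allow_remote_hugepage(const struct toy_zone *zones,
				      size_t nr_zones)
{
	assert(nr_zones > 0);

	for (size_t i = 0; i < nr_zones; i++) {
		/* One zone with headroom is enough to stay local. */
		if (zones[i].free_pages > zones[i].low_watermark)
			return false;
	}
	return true;
}
```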

> I said one good thing about this patch series, that it fixes the swap
> storms. But upstream 5.3 fixes the swap storms too and what you sent
> is not nearly equivalent to the mempolicy that Michal was willing
> to provide you and that we thought you needed to get bigger guarantees
> of getting only local 2m or local 4k pages.
> 

I haven't seen such a patch series; is there a link?
