linux-kernel - Re: MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.1812061544360.162675@chino.kir.corp.google.com>
Date:   Thu, 6 Dec 2018 15:49:04 -0800 (PST)
From:   David Rientjes <rientjes@...gle.com>
To:     Michal Hocko <mhocko@...nel.org>
cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        mgorman@...hsingularity.net, Vlastimil Babka <vbabka@...e.cz>,
        ying.huang@...el.com, s.priebe@...fihost.ag,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        alex.williamson@...hat.com, lkp@...org, kirill@...temov.name,
        Andrew Morton <akpm@...ux-foundation.org>,
        zi.yan@...rutgers.edu
Subject: Re: MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891:
 vm-scalability.throughput -61.3% regression)

On Thu, 6 Dec 2018, Michal Hocko wrote:

> MADV_HUGEPAGE changes the picture because the caller expressed a need
> for THP and is willing to go extra mile to get it. That involves
> allocation latency and as of now also a potential remote access. We do
> not have complete agreement on the later but the prevailing argument is
> that any strong NUMA locality is just reinventing node-reclaim story
> again or makes THP success rate down the toilet (to quote Mel). I agree
> that we do not want to fallback to a remote node overeagerly. I believe
> that something like the below would be sensible
> 	1) THP on a local node with compaction not giving up too early
> 	2) THP on a remote node in NOWAIT mode - so no direct
> 	   compaction/reclaim (trigger kswapd/kcompactd only for
> 	   defrag=defer+madvise)
> 	3) fallback to the base page allocation
> 

I disagree that MADV_HUGEPAGE should take on any new semantic that 
overrides the preference of node local memory for a hugepage, which is the 
nearly four year behavior.  The order of MADV_HUGEPAGE preferences listed 
above would cause current users to regress who rely on local small page 
fallback rather than remote hugepages because the access latency is much 
better.  I think the preference of remote hugepages over local small pages 
needs to be expressed differently to prevent regression.