linux-kernel - Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.1906211415050.77141@chino.kir.corp.google.com>
Date:   Fri, 21 Jun 2019 14:16:41 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
cc:     Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        Zi Yan <zi.yan@...rutgers.edu>,
        Stefan Priebe - Profihost AG <s.priebe@...fihost.ag>,
        "Kirill A. Shutemov" <kirill@...temov.name>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage
 allocations"

On Thu, 6 Jun 2019, David Rientjes wrote:

> The idea that I had was snipped from this, however, and it would be nice 
> to get some feedback on it: I've suggested that direct reclaim for the 
> purposes of hugepage allocation on the local node is never worthwhile 
> unless and until memory compaction can both capture that page to use (not 
> rely on the freeing scanner to find it) and that migration of a number of 
> pages would eventually result in the ability to free a pageblock.
> 
> I'm hoping that we can all agree to that because otherwise it leads us 
> down a bad road if reclaim is doing pointless work (freeing scanner can't 
> find it or it gets allocated again before it can find it) or compaction 
> can't make progress as a result of it (even though we can migrate, it 
> still won't free a pageblock).
> 
> In the interim, I think we should suppress direct reclaim entirely for 
> thp allocations, regardless of enabled=always or MADV_HUGEPAGE because it 
> cannot be proven that the reclaim work is beneficial and I believe it 
> results in the swap storms that are being reported.
> 
> Any disagreements so far?
> 
> Furthermore, if we can agree to that, memory compaction when allocating a 
> transparent hugepage fails for different reasons, one of which is because 
> we fail watermark checks because we lack migration targets.  This is 
> normally what leads to direct reclaim.  Compaction is *supposed* to return 
> COMPACT_SKIPPED for this but that's overloaded as well: it happens when we 
> fail extfrag_threshold checks and wheng gfp flags doesn't allow it.  The 
> former matters for thp.
> 
> So my proposed change would be:
>  - give the page allocator a consistent indicator that compaction failed
>    because we are low on memory (make COMPACT_SKIPPED really mean this),
>  - if we get this in the page allocator and we are allocating thp, fail,
>    reclaim is unlikely to help here and is much more likely to be
>    disruptive
>      - we could retry compaction if we haven't scanned all memory and
>        were contended,
>  - if the hugepage allocation fails, have thp check watermarks for order-0 
>    pages without any padding,
>  - if watermarks succeed, fail the thp allocation: we can't allocate
>    because of fragmentation and it's better to return node local memory,
>  - if watermarks fail, a follow up allocation of the pte will likely also
>    fail, so thp retries the allocation with a cleared  __GFP_THISNODE.
> 
> This doesn't sound very invasive and I'll code it up if it will be tested.
> 

Following up on this since there has been no activity in a week, I am 
happy to prototype this.  Andrea, would you be able to test a patch once 
it is ready for you to try?