[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.1906211415050.77141@chino.kir.corp.google.com>
Date: Fri, 21 Jun 2019 14:16:41 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Andrea Arcangeli <aarcange@...hat.com>
cc: Michal Hocko <mhocko@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
Zi Yan <zi.yan@...rutgers.edu>,
Stefan Priebe - Profihost AG <s.priebe@...fihost.ag>,
"Kirill A. Shutemov" <kirill@...temov.name>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage
allocations"
On Thu, 6 Jun 2019, David Rientjes wrote:
> The idea that I had was snipped from this, however, and it would be nice
> to get some feedback on it: I've suggested that direct reclaim for the
> purposes of hugepage allocation on the local node is never worthwhile
> unless and until memory compaction can both capture that page to use (not
> rely on the freeing scanner to find it) and that migration of a number of
> pages would eventually result in the ability to free a pageblock.
>
> I'm hoping that we can all agree to that because otherwise it leads us
> down a bad road if reclaim is doing pointless work (freeing scanner can't
> find it or it gets allocated again before it can find it) or compaction
> can't make progress as a result of it (even though we can migrate, it
> still won't free a pageblock).
>
> In the interim, I think we should suppress direct reclaim entirely for
> thp allocations, regardless of enabled=always or MADV_HUGEPAGE because it
> cannot be proven that the reclaim work is beneficial and I believe it
> results in the swap storms that are being reported.
>
> Any disagreements so far?
>
> Furthermore, if we can agree to that, memory compaction when allocating a
> transparent hugepage fails for different reasons, one of which is because
> we fail watermark checks because we lack migration targets. This is
> normally what leads to direct reclaim. Compaction is *supposed* to return
> COMPACT_SKIPPED for this but that's overloaded as well: it happens when we
> fail extfrag_threshold checks and wheng gfp flags doesn't allow it. The
> former matters for thp.
>
> So my proposed change would be:
> - give the page allocator a consistent indicator that compaction failed
> because we are low on memory (make COMPACT_SKIPPED really mean this),
> - if we get this in the page allocator and we are allocating thp, fail,
> reclaim is unlikely to help here and is much more likely to be
> disruptive
> - we could retry compaction if we haven't scanned all memory and
> were contended,
> - if the hugepage allocation fails, have thp check watermarks for order-0
> pages without any padding,
> - if watermarks succeed, fail the thp allocation: we can't allocate
> because of fragmentation and it's better to return node local memory,
> - if watermarks fail, a follow up allocation of the pte will likely also
> fail, so thp retries the allocation with a cleared __GFP_THISNODE.
>
> This doesn't sound very invasive and I'll code it up if it will be tested.
>
Following up on this since there has been no activity in a week, I am
happy to prototype this. Andrea, would you be able to test a patch once
it is ready for you to try?
Powered by blists - more mailing lists