Message-ID: <alpine.DEB.2.21.1812051126250.240991@chino.kir.corp.google.com>
Date: Wed, 5 Dec 2018 11:41:31 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Mel Gorman <mgorman@...hsingularity.net>
cc: Michal Hocko <mhocko@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>, ying.huang@...el.com,
s.priebe@...fihost.ag,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
alex.williamson@...hat.com, lkp@...org, kirill@...temov.name,
Andrew Morton <akpm@...ux-foundation.org>,
zi.yan@...rutgers.edu, Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation
regressions
On Wed, 5 Dec 2018, Mel Gorman wrote:
> > This is a single MADV_HUGEPAGE usecase, there is nothing special about it.
> > It would be the same as if you did mmap(), madvise(MADV_HUGEPAGE), and
> > faulted the memory with a fragmented local node and then measured the
> > remote access latency to the remote hugepage that occurs without setting
> > __GFP_THISNODE. You can also measure the remote allocation latency by
> > fragmenting the entire system and then faulting.
> >
>
> I'll make the same point as before: the form the fragmentation takes
> matters, as well as the types of pages that are resident and whether
> they are active or not. It affects the level of work the system does
> as well as the overall success rate of operations (be it reclaim, THP
> allocation, compaction, whatever). This is why a reproduction case that
> is representative of the problem you're facing on the real workload
> would have been helpful, because then any alternative proposal could
> have taken your workload into account during testing.
>
We know from Andrea's report that compaction is failing, and failing
repeatedly; otherwise we would not need excessive swapping to make it
work. That can mean one of two things: (1) a general low-on-memory
situation that repeatedly leaves us under the watermarks that deem
compaction suitable (isolate_freepages() would be too painful), or (2)
compaction has the memory that it needs but fails to make a hugepage
available because not all pages in a pageblock can be migrated.
If (1), perhaps in the presence of an antagonist that quickly allocates
the memory before compaction can pass its watermark checks, further
reclaim is not beneficial: the allocation becomes too expensive and there
is no guarantee that compaction will find the reclaimed memory in
isolate_freepages().
I chose to duplicate (2) by synthetically introducing fragmentation
locally (allocate high-order slab, free every other one) to test the
patch that does not set __GFP_THISNODE. The result is a remote
transparent hugepage, but we do not even need to reach the point of local
compaction for that fallback to happen. And this is where I measure the
13.9% access latency regression for the lifetime of the binary as a
result of this patch.
If local compaction works the first time, great! But that is not what is
happening in Andrea's report, and as a result of not setting
__GFP_THISNODE we are *guaranteed* worse access latency and may see even
worse allocation latency if the remote memory is fragmented as well.
So while I'm only testing the functional behavior of the patch itself, I
cannot speak to the nature of the local fragmentation on Andrea's systems.