Message-ID: <c8e16ca6-b78d-6066-4d5a-bb6be337c93e@oracle.com>
Date: Wed, 31 Jan 2018 17:09:48 -0800
From: Nitin Gupta <nitin.m.gupta@...cle.com>
To: Mel Gorman <mgorman@...e.de>
Cc: Zi Yan <zi.yan@...rutgers.edu>, Michal Hocko <mhocko@...nel.org>,
Nitin Gupta <nitingupta910@...il.com>,
steven.sistare@...cle.com,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>, Nadav Amit <namit@...are.com>,
Minchan Kim <minchan@...nel.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Vegard Nossum <vegard.nossum@...cle.com>,
"Levin, Alexander" <alexander.levin@...izon.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Hillf Danton <hillf.zj@...baba-inc.com>,
Shaohua Li <shli@...com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
Dave Jiang <dave.jiang@...el.com>,
Jérôme Glisse <jglisse@...hat.com>,
Matthew Wilcox <willy@...ux.intel.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>, Tobin C Harding <me@...in.cc>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP
On 01/25/2018 01:13 PM, Mel Gorman wrote:
> On Thu, Jan 25, 2018 at 11:41:03AM -0800, Nitin Gupta wrote:
>>>> It's not really about memory scarcity but a more efficient use of it.
>>>> Applications may want hugepage benefits without requiring any changes to
>>>> app code which is what THP is supposed to provide, while still avoiding
>>>> memory bloat.
>>>>
>>> I read these links and find that there are mainly two complaints:
>>> 1. THP causes latency spikes, because direct compaction slows down THP allocation,
>>> 2. THP bloats memory footprint when jemalloc uses MADV_DONTNEED to return memory ranges smaller than
>>> THP size and fails because of THP.
>>>
>>> The first complaint is not related to this patch.
>>
>> I'm trying to address many different THP issues and memory bloat is
>> first among them.
>
> Expecting userspace to get this right is probably going to go sideways.
> It'll be screwed up and be sub-optimal or have odd semantics for existing
> madvise flags. The fact is that an application may not even know if it's
> going to be sparsely using memory in advance if it's a computation load
> modelling from unknown input data.
>
> I suggest you read the old Talluri paper "Surpassing the TLB Performance
> of Superpages with Less Operating System Support" and pay attention to
> Section 4. There it discusses a page reservation scheme whereby on fault
> a naturally aligned set of base pages are reserved and only one correctly
> placed base page is inserted into the faulting address. It was tied into
> a hypothetical piece of hardware that doesn't exist to give best-effort
> support for superpages so it does not directly help you but the initial
> idea is sound. There are holes in the paper from today's perspective but
> it was written in the 90's.
>
> From there, read "Transparent operating system support for superpages"
> by Navarro, particularly chapter 4 paying attention to the parts where
> it talks about opportunism and promotion threshold.
>
> Superficially, it goes like this
>
> 1. On fault, reserve a THP in the allocator and use one base page that
> is correctly-aligned for the faulting addresses. By correctly-aligned,
> I mean that you use a base page whose offset would be naturally contiguous
> if it ever was part of a huge page.
> 2. On subsequent faults, attempt to use a base page that is naturally
> aligned to be a THP
> 3. When a "threshold" of base pages are inserted, allocate the remaining
> pages and promote it to a THP
> 4. If there is memory pressure, spill "reserved" pages into the main
> allocation pool and lose the opportunity to promote (which will need
> khugepaged to recover)
>
> By definition, a promotion threshold of 1 would be the existing scheme
> of allocating a THP on the first fault and some users will want that. It
> also should be the default to avoid unexpected overhead. For workloads
> where memory is being sparsely addressed and the increased overhead of
> THP is unwelcome then the threshold should be tuned higher with a maximum
> possible value of HPAGE_PMD_NR.
>
> It's non-trivial to do this because at minimum a page fault has to check
> if there is a potential promotion candidate by checking the PTEs around
> the faulting address searching for a correctly-aligned base page that is
> already inserted. If there is, then check if the correctly aligned base
> page for the current faulting address is free and if so use it. It'll
> also then need to check the remaining PTEs to see if the promotion
> threshold has been reached and if so, promote it to a THP (or else teach
> khugepaged to do an in-place promotion if possible). In other words,
> implementing the promotion threshold is both hard and not free.
>
> However, if it did exist then the only tunable would be the "promotion
> threshold" and applications would not need any special awareness of their
> address space.
>
I went through both references you mentioned and I really like the
idea of reservation-based hugepage allocation. Navarro also extends
the idea to allow multiple hugepage sizes to be used (as supported by
the underlying hardware), which was next in order of what I wanted to
do in THP.
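
For reference, the four steps quoted above can be modelled in a small
userspace C sketch. This is only an illustration under stated
assumptions, not kernel code: struct thp_reservation and the
reservation_*() helpers are hypothetical names, HPAGE_NR stands in for
HPAGE_PMD_NR with the x86-64 value of 512, and the page tables are
reduced to a per-region array of flags. It shows the natural-alignment
index calculation, the promotion threshold (1 behaves like the existing
allocate-on-first-fault scheme), and the loss of the promotion
opportunity once the reservation is spilled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: 4 KiB base pages, 512 base pages per huge page. */
#define BASE_PAGE_SHIFT 12
#define HPAGE_NR 512u /* stand-in for HPAGE_PMD_NR */

/* Hypothetical state for one naturally aligned huge-page-sized region. */
struct thp_reservation {
	bool mapped[HPAGE_NR];  /* base pages already inserted */
	unsigned int nr_mapped; /* number of true entries in mapped[] */
	unsigned int threshold; /* promotion threshold, 1..HPAGE_NR */
	bool reserved;          /* false once spilled under pressure */
	bool promoted;          /* true once the region is a huge page */
};

/* Step 1: reserve the whole region, nothing inserted yet. */
void reservation_init(struct thp_reservation *r, unsigned int threshold)
{
	assert(threshold >= 1 && threshold <= HPAGE_NR);
	memset(r, 0, sizeof(*r));
	r->threshold = threshold;
	r->reserved = true;
}

/* Offset of the base page inside its naturally aligned region. */
unsigned int base_page_index(uintptr_t addr)
{
	return (unsigned int)((addr >> BASE_PAGE_SHIFT) & (HPAGE_NR - 1));
}

/*
 * Steps 1-3: insert the correctly placed base page for this fault and
 * promote once the threshold is reached while the reservation is still
 * intact. Returns true if the region is a huge page after the fault.
 */
bool reservation_fault(struct thp_reservation *r, uintptr_t addr)
{
	unsigned int idx = base_page_index(addr);

	if (r->promoted)
		return true;

	if (!r->mapped[idx]) {
		r->mapped[idx] = true;
		r->nr_mapped++;
	}

	if (r->reserved && r->nr_mapped >= r->threshold) {
		/* Populate the remaining base pages and promote. */
		for (unsigned int i = 0; i < HPAGE_NR; i++)
			r->mapped[i] = true;
		r->nr_mapped = HPAGE_NR;
		r->promoted = true;
	}
	return r->promoted;
}

/* Step 4: under memory pressure, give up the unused reserved pages. */
void reservation_spill(struct thp_reservation *r)
{
	if (!r->promoted)
		r->reserved = false;
}
```

With a threshold of 3, faults on three distinct base pages of the same
region promote it on the third fault, and repeated faults on one page
do not count twice. After reservation_spill(), further faults still
insert base pages but never promote; in the real design that recovery
would be left to khugepaged.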
So, please ignore this patch; I will work towards implementing the
ideas in these papers.
Thanks for the feedback.
Nitin