Message-ID: <d3e77b2c-2164-743d-4f88-527091790006@oracle.com>
Date: Fri, 15 Dec 2017 23:04:03 -0800
From: Nitin Gupta <nitin.m.gupta@...cle.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: linux-mm@...ck.org, steven.sistare@...cle.com,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...e.de>,
Nadav Amit <namit@...are.com>,
Minchan Kim <minchan@...nel.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Vegard Nossum <vegard.nossum@...cle.com>,
"Levin, Alexander (Sasha Levin)" <alexander.levin@...izon.com>,
Michal Hocko <mhocko@...e.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
SeongJae Park <sj38.park@...il.com>, Shaohua Li <shli@...com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
Rik van Riel <riel@...hat.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
Jan Kara <jack@...e.cz>, Dave Jiang <dave.jiang@...el.com>,
Jérôme Glisse <jglisse@...hat.com>,
Matthew Wilcox <willy@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>, Tobin C Harding <me@...in.cc>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: Reduce memory bloat with THP
On 12/15/17 2:00 AM, Kirill A. Shutemov wrote:
> On Thu, Dec 14, 2017 at 05:28:52PM -0800, Nitin Gupta wrote:
>> Currently, if the THP enabled policy is "always", or the mode
>> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
>> is allocated on a page fault if the pud or pmd is empty. This
>> yields the best VA translation performance, but increases memory
>> consumption if some small page ranges within the huge page are
>> never accessed.
>>
>> An alternate behavior for such page faults is to install a
>> hugepage only when a region is actually found to be (almost)
>> fully mapped and active. This is a compromise between
>> translation performance and memory consumption. Currently there
>> is no way for an application to choose this compromise for the
>> page fault conditions above.
>>
>> With this change, when an application issues MADV_DONTNEED on a
>> memory region, the region is marked as "space-efficient". For
>> such regions, a hugepage is not immediately allocated on first
>> write. Instead, it is left to the khugepaged thread to do
>> delayed hugepage promotion depending on whether the region is
>> actually mapped and active. When application issues
>> MADV_HUGEPAGE, the region is marked again as non-space-efficient
>> wherein hugepage is allocated on first touch.
>
> I think this would be NAK. At least in this form.
>
> What performance testing have you done? Any numbers?
>
I wrote a throw-away program which mmaps a 128G area and writes to a random
address in a loop. Interleaved with the writes, madvise(MADV_DONTNEED) calls
are issued at other random addresses. Writes are issued with 70%
probability and DONTNEED with 30%. With this test, I'm trying to emulate
the workload of a large in-memory hash table.
With the patch, I see that memory bloat is much less severe.
I've uploaded the test program with the memory usage plot here:
https://gist.github.com/nitingupta910/42ddf969e17556d74a14fbd84640ddb3
THP was set to 'always' mode in both cases but the result would be the
same if madvise mode was used instead.
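For readers who cannot open the gist, the workload described above can be sketched roughly as follows. This is not the program from the gist: the helper name run_workload, the stats struct, the per-page granularity, and the use of rand() are all assumptions made for illustration, and the caller chooses the region size (128G in the test described above).

```c
/* Illustrative sketch only; NOT the test program from the gist. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Counters reported by the (hypothetical) workload helper. */
struct workload_stats {
    long writes;    /* single-byte stores issued */
    long dontneeds; /* madvise(MADV_DONTNEED) calls issued */
};

/*
 * Map region_bytes of anonymous memory, then for each iteration either
 * store one byte at a random page (about 70% of iterations) or drop a
 * random page with MADV_DONTNEED (about 30%).
 * Returns 0 on success, -1 on failure.
 */
static int run_workload(size_t region_bytes, long iterations,
                        unsigned int seed, struct workload_stats *out)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && region_bytes >= (size_t)page);
    size_t npages = region_bytes / (size_t)page;

    unsigned char *base = mmap(NULL, region_bytes, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    out->writes = 0;
    out->dontneeds = 0;
    srand(seed);

    for (long i = 0; i < iterations; i++) {
        /* Pick a random page-aligned offset inside the mapping. */
        size_t off = ((size_t)rand() % npages) * (size_t)page;

        if (rand() % 100 < 70) {
            base[off] = 1; /* write path: faults the page in if needed */
            out->writes++;
        } else {
            if (madvise(base + off, (size_t)page, MADV_DONTNEED) != 0) {
                munmap(base, region_bytes);
                return -1;
            }
            out->dontneeds++;
        }
    }
    return munmap(base, region_bytes);
}
```

Memory usage would then be sampled externally (for example from the process RSS) while the loop runs, once with and once without the patch.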
> Making whole vma "space_efficient" just because somebody freed one page
> from it is just wrong. And there's no way back after this.
>
I'm using MADV_DONTNEED as a hint that, although the user wants to
transparently use hugepages, they also want to be more conservative
with respect to memory usage. If MADV_HUGEPAGE is issued for a VMA
range after any DONTNEEDs, the space_efficient bit is cleared again,
so we revert to allocating a hugepage on fault on an empty pud/pmd.
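From userspace, the sequence of hints described above would look something like the sketch below. The helper name toggle_hints and the result struct are hypothetical. The space_efficient marking exists only with this patch applied, so on an unpatched kernel the two madvise calls simply have their usual effects, and the flag itself is not visible to the program either way.

```c
/* Illustrative sketch only; helper and struct names are hypothetical. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

struct hint_result {
    int dontneed_rc;                   /* return value of MADV_DONTNEED */
    int hugepage_rc;                   /* return value of MADV_HUGEPAGE */
    unsigned char byte_after_dontneed; /* first byte read back after the drop */
};

/* Returns 0 if the mapping was created and torn down, -1 otherwise. */
static int toggle_hints(size_t len, struct hint_result *res)
{
    assert(len > 0);

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0xAB; /* first touch: page fault on an empty pmd */

    /*
     * With the patch, this is also the hint that marks the region
     * "space-efficient", so later faults install small pages and
     * khugepaged does any promotion.
     */
    res->dontneed_rc = madvise(p, len, MADV_DONTNEED);

    /* Dropped anonymous private memory reads back zero-filled. */
    res->byte_after_dontneed = *(volatile unsigned char *)p;

    /*
     * With the patch, this clears the space_efficient bit again, so a
     * hugepage is allocated on the next fault. The call can fail with
     * EINVAL on kernels built without THP support.
     */
    res->hugepage_rc = madvise(p, len, MADV_HUGEPAGE);

    return munmap(p, len);
}
```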
>>
>> Orabug: 26910556
>
> Wat?
>
It's an Oracle-internal identifier used to track this work.
Thanks,
Nitin