lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180125095832.GN28465@dhcp22.suse.cz>
Date:   Thu, 25 Jan 2018 10:58:32 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Nitin Gupta <nitin.m.gupta@...cle.com>
Cc:     Nitin Gupta <nitingupta910@...il.com>, steven.sistare@...cle.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...e.de>,
        Nadav Amit <namit@...are.com>,
        Minchan Kim <minchan@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vegard Nossum <vegard.nossum@...cle.com>,
        "Levin, Alexander (Sasha Levin)" <alexander.levin@...izon.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Hillf Danton <hillf.zj@...baba-inc.com>,
        Shaohua Li <shli@...com>,
        Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
        Dave Jiang <dave.jiang@...el.com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Matthew Wilcox <willy@...ux.intel.com>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Hugh Dickins <hughd@...gle.com>, Tobin C Harding <me@...in.cc>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP

On Fri 19-01-18 12:59:17, Nitin Gupta wrote:
> On 1/19/18 4:49 AM, Michal Hocko wrote:
> > On Thu 18-01-18 15:33:16, Nitin Gupta wrote:
> >> From: Nitin Gupta <nitin.m.gupta@...cle.com>
> >>
> >> Currently, if the THP enabled policy is "always", or the mode
> >> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
> >> is allocated on a page fault if the pud or pmd is empty.  This
> >> yields the best VA translation performance, but increases memory
> >> consumption if some small page ranges within the huge page are
> >> never accessed.
> > 
> > Yes, this is true but hardly unexpected for MADV_HUGEPAGE or THP always
> > users.
> >  
> 
> Yes, allocating hugepage on first touch is the current behavior for
> above two cases. However, I see issues with this current behavior.
> Firstly, THP=always mode is often too aggressive/wasteful to be useful
> for any realistic workloads. For THP=madvise, users may want to back
> active parts of memory region with hugepages while avoiding aggressive
> hugepage allocation on first touch. Or, they may really want the current
> behavior.

Then they should use THP=never and rely on the khugepaged to compact
madvise regions. This will avoid first touch problem and you can also
control how large portion of the THP has to be mapped already.

> With this patch, users would have the option to pick what behavior they
> want by passing hints to the kernel in the form of MADV_HUGEPAGE and
> MADV_DONTNEED madvise calls.

more on this below
 
[...]
> >> With this change, whenever an application issues MADV_DONTNEED on a
> >> memory region, the region is marked as "space-efficient". For such
> >> regions, a hugepage is not immediately allocated on first write.
> > 
> > Kirill didn't like it in the previous version and I do not like this
> > either. You are adding a very subtle side effect which might completely
> > unexpected. Consider userspace memory allocator which uses MADV_DONTNEED
> > to free up unused memory. Now you have put it out of THP usage
> > basically.
> >
> 
> Userpsace may want a region to be considered by khugepaged while opting
> out of hugepage allocation on first touch. Asking userspace memory
> allocators to have to track and reclaim unused parts of a THP allocated
> hugepage does not seems right, as the kernel can use simple userspace
> hints to avoid allocating extra memory in the first place.

Yes. This is in sync with what I wrote. Allocators shouldn't care and
that is why MADV_DONTNEED with side effect is simply wrong.

> I agree that this patch is adding a subtle side-effect which may take
> some applications by surprise. However, I often see the opposite too:
> for many workloads, disabling THP is the first advise as this aggressive
> allocation of hugepages on first touch is unexpected and is too
> wasteful. For e.g.:

Ohh, absolutely. And that is why we have changed the default in upstream
444eb2a449ef ("mm: thp: set THP defrag by default to madvise and add a
stall-free defrag option")
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ