lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180125222910.GA4454@redhat.com>
Date:   Thu, 25 Jan 2018 23:29:10 +0100
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     Nitin Gupta <nitin.m.gupta@...cle.com>
Cc:     Zi Yan <zi.yan@...rutgers.edu>, Michal Hocko <mhocko@...nel.org>,
        Nitin Gupta <nitingupta910@...il.com>,
        steven.sistare@...cle.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...e.de>,
        Nadav Amit <namit@...are.com>,
        Minchan Kim <minchan@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vegard Nossum <vegard.nossum@...cle.com>,
        "Levin, Alexander" <alexander.levin@...izon.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Hillf Danton <hillf.zj@...baba-inc.com>,
        Shaohua Li <shli@...com>,
        Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        David Rientjes <rientjes@...gle.com>,
        Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
        Dave Jiang <dave.jiang@...el.com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Matthew Wilcox <willy@...ux.intel.com>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Hugh Dickins <hughd@...gle.com>, Tobin C Harding <me@...in.cc>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP

On Thu, Jan 25, 2018 at 11:41:03AM -0800, Nitin Gupta wrote:
> I'm trying to address many different THP issues and memory bloat is
> first among them.

You quoted redis in an earlier email, the redis issue has nothing to
do with MADV_DONTNEED.

I can quickly explain the redis issue.

Redis uses fork() to create a readonly copy of the memory to do
snapshotting in the child, while parent still writes to the memory.

THP CoWs in the parent are higher latency than 4k CoWs, they also take
more memory, but that's secondary, in fact the maximum waste of memory
in this model will reach the same worst case (x2) with 4k CoWs
too, no difference.

The problem is the copy-user there, it adds latency and wastes CPU.

Redis can simply use userfaultfd WP mode once it'll be upstream and
then it will use 4k granularity as the granularity of the writeprotect
userfaults is up to userland to decide.

The main benefit is it can avoid the worst case degradation of using
x2 physical memory (disabling THP makes zero difference in that
regard, if storage is very slow x2 physical memory can still be used
if very unlucky), it can throttle the WP writes (anon COW cannot
throttle), it can avoid to fork altogether so it shares the same
pagetables. It can also put the "user-CoWed" pages (in the fault
handler) in front of the write queue, to be written first, using a
ring buffer for the CoWed 4k pages, to keep memory utilization even
lower despite THP stays on at all times for all pages that didn't get
a CoW yet. This will be an optimal snapshot method, much better than
fork() no matter if 4k or THP are backing the memory.

In short MADV_DONTNEED has nothing to do with redis, if mysql gets an
improvement surely you can post a benchmark instead of URLs.

If you want low memory usage at the cost of potentially slower
performance overall you should use transparent_hugepage=madvise .

The cases where THP is not a good tradeoff are genreally related to
lower performance in copy-user or the higher cost of compaction if the
app is only ever doing short lived allocations.

If you post a reproducible benchmark with real life app that gets an
improvement with whatever change you're doing, it'll be possible to
evaluate it.

Thanks,
Andrea

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ