Date:   Thu, 1 Feb 2018 10:46:28 +0000
From:   Mel Gorman <mgorman@...e.de>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Nitin Gupta <nitin.m.gupta@...cle.com>,
        Zi Yan <zi.yan@...rutgers.edu>,
        Michal Hocko <mhocko@...nel.org>,
        Nitin Gupta <nitingupta910@...il.com>,
        steven.sistare@...cle.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>, Nadav Amit <namit@...are.com>,
        Minchan Kim <minchan@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vegard Nossum <vegard.nossum@...cle.com>,
        "Levin, Alexander" <alexander.levin@...izon.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Hillf Danton <hillf.zj@...baba-inc.com>,
        Shaohua Li <shli@...com>,
        Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
        Dave Jiang <dave.jiang@...el.com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Matthew Wilcox <willy@...ux.intel.com>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Hugh Dickins <hughd@...gle.com>, Tobin C Harding <me@...in.cc>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP

On Thu, Feb 01, 2018 at 01:27:30PM +0300, Kirill A. Shutemov wrote:
> > It's non-trivial to do this because at minimum a page fault has to check
> > if there is a potential promotion candidate by checking the PTEs around
> > the faulting address searching for a correctly-aligned base page that is
> > already inserted. If there is, then check if the correctly aligned base
> > page for the current faulting address is free and if so use it. It'll
> > also then need to check the remaining PTEs to see if the promotion
> > threshold has been reached and, if so, promote it to a THP (or else teach
> > khugepaged to do an in-place promotion if possible). In other words,
> > implementing the promotion threshold is both hard and not free.
> 
> "not free" is an understatement.
> 
> Converting a PTE page table to PMD would require down_write(mmap_sem).
> Doing it from within the page fault path would also mean that we need to drop
> the down_read(mmap_sem) we hold, re-acquire it with down_write(), find the vma
> again and re-validate that nothing changed in the meantime...
> 
> That's an interesting exercise, but I'm skeptical it would result in anything
> practical.
> 

The details are painful but we're somewhat caught between a rock and a
hard place for workloads that sparsely reference memory and want to avoid
excessive memory usage. Given that the cost will be high, it may need to
dynamically detect what the promotion threshold is -- default high and
reduce it on a per-task basis if promotions are frequent.
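
To make the "default high, reduce per task" idea concrete, here is a rough
userspace sketch in C. It is not kernel code and not from the patch under
discussion: struct promo_state, the PROMO_* constants, and the halving policy
are all hypothetical, and the bool array merely stands in for checking which
PTEs in one aligned huge-page range are populated.

```c
#include <stdbool.h>
#include <stddef.h>

/* 4 KiB base pages per 2 MiB huge page (x86-64 geometry, assumed here). */
#define PTES_PER_HUGE_RANGE     512

/* Hypothetical tunables, chosen only for illustration. */
#define PROMO_THRESHOLD_DEFAULT PTES_PER_HUGE_RANGE /* start high */
#define PROMO_THRESHOLD_MIN     64
#define PROMO_FREQUENT          8   /* promotions per window deemed "frequent" */

/* Hypothetical per-task state; nothing like this exists in the kernel. */
struct promo_state {
	unsigned int threshold;         /* populated PTEs needed to promote */
	unsigned int recent_promotions; /* promotions seen in current window */
};

/* Count populated entries in one aligned range (models scanning the PTEs). */
static unsigned int count_present(const bool present[PTES_PER_HUGE_RANGE])
{
	unsigned int n = 0;

	for (size_t i = 0; i < PTES_PER_HUGE_RANGE; i++)
		if (present[i])
			n++;
	return n;
}

/* Promote only once enough of the range is already backed by base pages. */
static bool should_promote(const struct promo_state *st,
			   const bool present[PTES_PER_HUGE_RANGE])
{
	return count_present(present) >= st->threshold;
}

/*
 * Called at the end of an observation window: if promotions were frequent,
 * halve the threshold (never below the minimum), then start a new window.
 */
static void adapt_threshold(struct promo_state *st)
{
	if (st->recent_promotions >= PROMO_FREQUENT) {
		unsigned int next = st->threshold / 2;

		st->threshold = next < PROMO_THRESHOLD_MIN ?
				PROMO_THRESHOLD_MIN : next;
	}
	st->recent_promotions = 0;
}
```

Even in this toy form the scan touches up to 512 entries per decision, which
is the "not free" part; the locking issues raised above are not modelled at all.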

Either way, expecting applications to get it right with hints is the road
to hell paved with good intentions. If they were able to get this right,
they would be using prctl(PR_SET_THP_DISABLE) already.
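
For reference, opting out is only a few lines. The sketch below uses the real
prctl(2) operations PR_SET_THP_DISABLE and PR_GET_THP_DISABLE (Linux 3.15 and
later); the helper names thp_disable_for_self() and thp_is_disabled() are mine,
and the fallback #defines are only there for older userspace headers.

```c
#include <sys/prctl.h>

/* Fallbacks for old headers; values match include/uapi/linux/prctl.h. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/*
 * Set the "THP disable" flag for the calling process.
 * Returns 0 on success, -1 on error with errno set.
 */
static int thp_disable_for_self(void)
{
	return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/* Returns 1 if THP is disabled for the caller, 0 if not, -1 on error. */
static int thp_is_disabled(void)
{
	return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}
```

The flag is inherited across fork() and preserved across execve(), so a small
wrapper can set it before exec'ing an unmodified application.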

-- 
Mel Gorman
SUSE Labs
