Message-ID: <alpine.DEB.2.21.1810151525460.247641@chino.kir.corp.google.com>
Date:   Mon, 15 Oct 2018 15:30:17 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
cc:     Michal Hocko <mhocko@...nel.org>, Mel Gorman <mgorman@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Andrea Argangeli <andrea@...nel.org>,
        Zi Yan <zi.yan@...rutgers.edu>,
        Stefan Priebe - Profihost AG <s.priebe@...fihost.ag>,
        "Kirill A. Shutemov" <kirill@...temov.name>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>,
        Stable tree <stable@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm: thp:  relax __GFP_THISNODE for MADV_HUGEPAGE
 mappings

On Wed, 10 Oct 2018, David Rientjes wrote:

> > I think "madvise vs mbind" is more an issue of "no-permission vs
> > permission" required. And if the processes ends up swapping out all
> > other process with their memory already allocated in the node, I think
> > some permission is correct to be required, in which case an mbind
> > looks a better fit. MPOL_PREFERRED also looks a first candidate for
> > investigation as it's already not black and white and allows spillover
> > and may already do the right thing in fact if set on top of
> > MADV_HUGEPAGE.
> > 
> 
> We would never want to thrash the local node for hugepages because there 
> is no guarantee that any swapping is useful.  On COMPACT_SKIPPED due to 
> low memory, we have very clear evidence that pageblocks are already 
> sufficiently fragmented by unmovable pages such that compaction itself, 
> even with abundant free memory, fails to free an entire pageblock due to 
> the allocator's preference to fragment pageblocks of fallback migratetypes 
> over returning remote free memory.
> 
> As I've stated, we do not want to reclaim pointlessly when compaction is 
> unable to access the freed memory or there is no guarantee it can free an 
> entire pageblock.  Doing so allows thrashing of the local node, or remote 
> nodes if __GFP_THISNODE is removed, and the hugepage still cannot be 
> allocated.  If this proposed mbind() that requires permissions is geared 
> to me as the user, I'm afraid the details of what leads to the thrashing 
> are not well understood because I certainly would never use this.
> 

At the risk of beating a dead horse that has already been beaten, what are 
the plans for this patch when the merge window opens?  It would be rather 
unfortunate for us to start incurring a 14% increase in access latency and 
40% increase in fault latency.  Would it be possible to test with my 
patch[*] that does not try reclaim to address the thrashing issue?  If 
that is satisfactory, I don't have a strong preference if it is done with 
a hardcoded pageblock_order and __GFP_NORETRY check or a new 
__GFP_COMPACT_ONLY flag.
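
For anyone following along, the kind of check meant by "hardcoded pageblock_order and __GFP_NORETRY" can be modelled in user space roughly as below. This is only an illustrative sketch, not the code from the linked patch: the enum, the MODEL_* constants, the function name should_skip_reclaim() and the pageblock order of 9 are assumptions made up for the example, and the exact conditions in a real implementation may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model only. These names mirror kernel concepts but are
 * hypothetical stand-ins, not the kernel's definitions.
 */
enum model_compact_result {
	MODEL_COMPACT_SKIPPED,	/* compaction not attempted, e.g. low memory */
	MODEL_COMPACT_FAILED,	/* compaction ran but freed no pageblock */
	MODEL_COMPACT_SUCCESS,
};

#define MODEL_GFP_NORETRY	0x1u
/* Assumption: 2MB pageblocks with 4KB base pages, i.e. order 9. */
#define MODEL_PAGEBLOCK_ORDER	9u

/*
 * Return true when the allocation should fail without trying reclaim:
 * the request covers at least a whole pageblock, the caller asked not
 * to retry, and compaction was skipped, so reclaiming is unlikely to
 * produce a free pageblock and only risks thrashing the node.
 */
static bool should_skip_reclaim(unsigned int order, unsigned int gfp_flags,
				enum model_compact_result result)
{
	assert(order < 32);	/* sanity bound for this model */

	if (order < MODEL_PAGEBLOCK_ORDER)
		return false;
	if (!(gfp_flags & MODEL_GFP_NORETRY))
		return false;
	return result == MODEL_COMPACT_SKIPPED;
}
```

In this model an order-9 request with MODEL_GFP_NORETRY set fails straight away once compaction has been skipped, while smaller orders and callers without the flag still go on to reclaim. A dedicated flag such as __GFP_COMPACT_ONLY would express the same intent explicitly rather than inferring it from the order and __GFP_NORETRY.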

I think the second issue of faulting remote thp by removing __GFP_THISNODE 
needs supporting evidence that shows some platforms benefit from this (and 
not with numa=fake on the command line :).

 [*] https://marc.info/?l=linux-kernel&m=153903127717471
