linux-kernel - Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181029100834.GG32673@dhcp22.suse.cz>
Date:   Mon, 29 Oct 2018 11:08:34 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        David Rientjes <rientjes@...gle.com>,
        Andrea Argangeli <andrea@...nel.org>,
        Zi Yan <zi.yan@...rutgers.edu>,
        Stefan Priebe - Profihost AG <s.priebe@...fihost.ag>,
        "Kirill A. Shutemov" <kirill@...temov.name>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Stable tree <stable@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm: thp:  relax __GFP_THISNODE for MADV_HUGEPAGE
 mappings

On Mon 29-10-18 20:42:53, Balbir Singh wrote:
> On Mon, Oct 29, 2018 at 10:00:35AM +0100, Michal Hocko wrote:
[...]
> > These hugetlb allocations might be disruptive and that is an expected
> > behavior because this is an explicit requirement from an admin to
> > pre-allocate large pages for the future use. __GFP_RETRY_MAYFAIL just
> > underlines that requirement.
> 
> Yes, but in the absence of a particular node, for example via sysctl
> (as the compaction does), I don't think it is a hard requirement to get
> a page from a particular node.

Again this seems like a deliberate decision. You want your distributions
as even as possible otherwise the NUMA placement will be much less
deterministic. At least that was the case for a long time. If you
have different per-node preferences, just use NUMA aware pre-allocation.

> I agree we need __GFP_RETRY_FAIL, in any
> case the real root cause for me is should_reclaim_continue() which keeps
> the task looping without making forward progress.

This seems like a separate issue which should better be debugged. Please
open a new thread describing the problem and the state of the node.

-- 
Michal Hocko
SUSE Labs