linux-kernel - Re: [RFC PATCH 3/3] hugetlbfs: don't retry when pool page allocations start to fail

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190725081350.GD2708@suse.de>
Date:   Thu, 25 Jul 2019 09:13:50 +0100
From:   Mel Gorman <mgorman@...e.de>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Hillf Danton <hdanton@...a.com>,
        Michal Hocko <mhocko@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH 3/3] hugetlbfs: don't retry when pool page
 allocations start to fail

On Wed, Jul 24, 2019 at 10:50:14AM -0700, Mike Kravetz wrote:
> When allocating hugetlbfs pool pages via /proc/sys/vm/nr_hugepages,
> the pages will be interleaved between all nodes of the system.  If
> nodes are not equal, it is quite possible for one node to fill up
> before the others.  When this happens, the code still attempts to
> allocate pages from the full node.  This results in calls to direct
> reclaim and compaction which slow things down considerably.
> 
> When allocating pool pages, note the state of the previous allocation
> for each node.  If previous allocation failed, do not use the
> aggressive retry algorithm on successive attempts.  The allocation
> will still succeed if there is memory available, but it will not try
> as hard to free up memory.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>

set_max_huge_pages can fail the NODEMASK_ALLOC() alloc which you handle
*but* in the event of an allocation failure this bug can silently recur.
An informational message might be justified in that case in case the
stall should recur with no hint as to why. Technically passing NULL into
NODEMASK_FREE is also safe as kfree (if used for that kernel config) can
handle freeing of a NULL pointer. However, that is cosmetic more than
anything. Whether you decide to change either or not;

Acked-by: Mel Gorman <mgorman@...e.de>

-- 
Mel Gorman
SUSE Labs