lists.openwall.net - Open Source and information security mailing list archives
Date:   Thu, 3 Oct 2019 12:52:33 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Vlastimil Babka <vbabka@...e.cz>
cc:     Mike Kravetz <mike.kravetz@...cle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively
 reclaim

On Thu, 3 Oct 2019, Vlastimil Babka wrote:

> I think the key differences between Mike's tests and Michal's is this part
> from Mike's mail linked above:
> 
> "I 'tested' by simply creating some background activity and then seeing
> how many hugetlb pages could be allocated. Of course, many tries over
> time in a loop."
> 
> - "some background activity" might be different than Michal's pre-filling
>   of the memory with (clean) page cache
> - "many tries over time in a loop" could mean that kswapd has time to 
>   reclaim and eventually the new condition for pageblock order will pass
>   every few retries, because there's enough memory for compaction and it
>   won't return COMPACT_SKIPPED
> 

I'll rely on Mike, the hugetlb maintainer, to assess the trade-off between 
the potential for encountering very expensive reclaim, as Andrea did, and 
the possibility of being able to allocate additional hugetlb pages at 
runtime if we do that expensive reclaim.

For parity with previous kernels it seems reasonable to ask that this 
behavior remain unchanged, since allocating large numbers of hugetlb pages 
has different latency expectations than a page fault.  This patch is 
available if he'd prefer to go that route.

On the other hand, userspace could achieve similar results if it were to 
use vm.drop_caches and explicitly triggered compaction through either 
procfs or sysfs before writing to vm.nr_hugepages, and that would be much 
faster because it would be done in one go.  Users who allocate through the 
kernel command line would obviously be unaffected.
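The userspace sequence described above can be sketched as a short script. This is an illustrative example, not part of the patch under discussion; the exact pool size (512 here) is a placeholder, and writing these files requires root:

```shell
#!/bin/sh
# Drop clean page cache so compaction has free pages to work with
# (3 = drop both pagecache and slab objects).
echo 3 > /proc/sys/vm/drop_caches

# Explicitly trigger compaction on all zones via procfs.
echo 1 > /proc/sys/vm/compact_memory

# Per-node compaction is also available through sysfs, e.g. node 0:
#   echo 1 > /sys/devices/system/node/node0/compact

# Now grow the persistent hugetlb pool; 512 is a placeholder value.
echo 512 > /proc/sys/vm/nr_hugepages

# Check how many pages were actually allocated.
grep HugePages_Total /proc/meminfo
```

Doing the reclaim and compaction up front in one go avoids paying that cost repeatedly inside the page allocator on each hugepage allocation attempt.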

Commit b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when 
compaction may not succeed") was written with the latter in mind.  Mike 
subsequently requested that hugetlb not be impacted at least provisionally 
until it could be further assessed.

I'd suggest the latter: let the user initiate expensive reclaim and/or 
compaction when tuning vm.nr_hugepages, and leave no surprises for users 
using hugetlb overcommit.  But I wouldn't argue against either approach; 
he knows the users and expectations of hugetlb far better than I do.
