Message-ID: <alpine.DEB.2.21.1910031243050.88296@chino.kir.corp.google.com>
Date: Thu, 3 Oct 2019 12:52:33 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
cc: Mike Kravetz <mike.kravetz@...cle.com>,
Michal Hocko <mhocko@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively
reclaim
On Thu, 3 Oct 2019, Vlastimil Babka wrote:
> I think the key differences between Mike's tests and Michal's is this part
> from Mike's mail linked above:
>
> "I 'tested' by simply creating some background activity and then seeing
> how many hugetlb pages could be allocated. Of course, many tries over
> time in a loop."
>
> - "some background activity" might be different than Michal's pre-filling
> of the memory with (clean) page cache
> - "many tries over time in a loop" could mean that kswapd has time to
> reclaim and eventually the new condition for pageblock order will pass
> every few retries, because there's enough memory for compaction and it
> won't return COMPACT_SKIPPED
>
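To make that condition concrete, here is a simplified, illustrative sketch
of the kind of gate being discussed for costly-order allocations; it is not
the actual mm/page_alloc.c code from b39d0ee2632d, and the helper name and
parameters are made up for illustration:

#include <stdbool.h>

/*
 * Illustrative only, not the real mm/page_alloc.c logic: for an
 * allocation of at least pageblock order, skip direct reclaim when
 * async compaction was skipped for lack of free base pages; kswapd
 * may free enough memory that a later retry passes this check.
 */
enum compact_result { COMPACT_SKIPPED, COMPACT_CONTINUE };

static bool should_reclaim_costly(unsigned int order,
				  unsigned int pageblock_order,
				  enum compact_result result)
{
	if (order < pageblock_order)
		return true;	/* small orders: reclaim as before */
	if (result == COMPACT_SKIPPED)
		return false;	/* too little free memory to compact */
	return true;		/* compaction has a chance, reclaim */
}
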
I'll rely on Mike, the hugetlb maintainer, to assess the trade-off between
the risk of encountering very expensive reclaim as Andrea did and the
possibility of being able to allocate additional hugetlb pages at runtime
if we did that expensive reclaim.
For parity with previous kernels it seems reasonable to ask that this
remain unchanged, since allocating large numbers of hugetlb pages has
different latency expectations than a page fault. This patch is
available if he'd prefer to go that route.
On the other hand, userspace could achieve similar results if it were to
use vm.drop_caches and explicitly triggered compaction through either
procfs or sysfs before writing to vm.nr_hugepages, and that would be much
faster because it would be done in one go. Users who allocate through the
kernel command line would obviously be unaffected.
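For reference, that sequence can be driven entirely from userspace with
the standard procfs knobs. A minimal sketch (must run as root; the pool
size of 512 is an arbitrary example, and error handling is trimmed for
brevity):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0) {
		perror(path);
		if (fd >= 0)
			close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

int main(void)
{
	/* Drop clean page cache and reclaimable slab to free base pages. */
	write_str("/proc/sys/vm/drop_caches", "3");
	/* Explicitly compact memory before growing the hugetlb pool. */
	write_str("/proc/sys/vm/compact_memory", "1");
	/* Ask for the desired number of hugetlb pages (example: 512). */
	write_str("/proc/sys/vm/nr_hugepages", "512");
	return 0;
}

If only some nodes need it, compaction can also be triggered per node
through sysfs (/sys/devices/system/node/nodeN/compact) rather than
system-wide.
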
Commit b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when
compaction may not succeed") was written with the latter in mind. Mike
subsequently requested that hugetlb not be impacted, at least
provisionally, until it could be assessed further.
I'd suggest the latter: let the user initiate expensive reclaim and/or
compaction when tuning vm.nr_hugepages and leave no surprises for users
of hugetlb overcommit. But I wouldn't argue against either approach; he
knows the users and expectations of hugetlb far better than I do.