[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2afd12e0-60cd-0f9e-99a8-8ded09644504@redhat.com>
Date: Fri, 12 Feb 2021 15:15:08 +0100
From: David Hildenbrand <david@...hat.com>
To: Mike Kravetz <mike.kravetz@...cle.com>,
Muchun Song <songmuchun@...edance.com>, corbet@....net,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
hpa@...or.com, dave.hansen@...ux.intel.com, luto@...nel.org,
peterz@...radead.org, viro@...iv.linux.org.uk,
akpm@...ux-foundation.org, paulmck@...nel.org,
mchehab+huawei@...nel.org, pawan.kumar.gupta@...ux.intel.com,
rdunlap@...radead.org, oneukum@...e.com, anshuman.khandual@....com,
jroedel@...e.de, almasrymina@...gle.com, rientjes@...gle.com,
willy@...radead.org, osalvador@...e.de, mhocko@...e.com,
song.bao.hua@...ilicon.com, naoya.horiguchi@....com,
joao.m.martins@...cle.com
Cc: duanxiongchun@...edance.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated
with each HugeTLB page
On 11.02.21 19:05, Mike Kravetz wrote:
> On 2/8/21 12:50 AM, Muchun Song wrote:
>> When we free a HugeTLB page to the buddy allocator, we should allocate the
>> vmemmap pages associated with it. But we may cannot allocate vmemmap pages
>> when the system is under memory pressure, in this case, we just refuse to
>> free the HugeTLB page instead of looping forever trying to allocate the
>> pages.
>>
>> Signed-off-by: Muchun Song <songmuchun@...edance.com>
>> ---
>> include/linux/mm.h | 2 ++
>> mm/hugetlb.c | 19 ++++++++++++-
>> mm/hugetlb_vmemmap.c | 30 +++++++++++++++++++++
>> mm/hugetlb_vmemmap.h | 6 +++++
>> mm/sparse-vmemmap.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 5 files changed, 130 insertions(+), 2 deletions(-)
>
> Muchun has done a great job simplifying this patch series and addressing
> issues as they are brought up. This patch addresses the issue which seems
> to be the biggest stumbling block to this series. The need to allocate
> vmemmap pages to dissolve a hugetlb page to the buddy allocator. The way
> it is addressed in this patch is to simply fail to dissolve the hugetlb
> page if the vmmemmap pages can not be allocated. IMO, this is an 'acceptable'
> strategy. If we find ourselves in this situation then we are likely to be
> hitting other corner cases in the system. I wish there was a perfect way
> to address this issue, but we have been unable to come up with one.
>
> There was a decent discussion about this is a previous version of the
> series starting here:
> https://lore.kernel.org/linux-mm/20210126092942.GA10602@linux/
> In this thread various other options were suggested and discussed.
>
> I would like to come to some agreement on an acceptable way to handle this
> specific issue. IMO, it makes little sense to continue refining other
> parts of this series if we can not figure out how to move forward on this
> issue.
>
> It would be great if David H, David R and Michal could share their opinions
> on this. No need to review details the code yet (unless you want), but
> let's start a discussion on how to move past this issue if we can.
So a summary from my side:
We might fail freeing a huge page at any point in time iff we are low on
kernel (!CMA, !ZONE_MOVABLE) memory. While we could play games with
allocating the vmemmap from a huge page itself in some cases (e.g.,
!CMA, !ZONE_MOVABLE), simply retrying is way easier and we don't turn
the huge page forever unusable.
Corner cases might be having many huge pages in ZONE_MOVABLE, freeing
them all at once and eating up a lot of kernel memory. But then, the
same setup would already be problematic nowadays where we simply always
consume that kernel memory for the vmemmap.
I think this problem only really becomes visible in corner cases. And
someone actively has to enable new behavior.
1. Failing to free a huge page triggered by the user (decrease nr_pages):
Bad luck. Try again later.
2. Failing to free a surplus huge page when freed by the application:
Bad luck. But who will try again later?
3. Failing to dissolve a free huge page on ZONE_MOVABLE via offline_pages()
This is a bit unfortunate if we have plenty of ZONE_MOVABLE memory but
are low on kernel memory. For example, migration of huge pages would
still work, however, dissolving the free page does not work. I'd say
this is a corner cases. When the system is that much under memory
pressure, offlining/unplug can be expected to fail.
4. Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
alloc_contig_range() - once we have that handling in place. Mainly
affects CMA and virtio-mem.
Similar to 3. However, we didn't even take care of huge pages *at all*
for now (neither migrate nor dissolve). So actually don't make the
current state any worse. virito-mem will handle migration errors
gracefully. CMA might be able to fallback on other free areas within the
CMA region.
I'd say, document the changed behavior properly so people are aware that
there might be issues in corner cases with huge pages on CMA / ZONE_MOVABLE.
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists