linux-kernel - Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality

Message-ID: <10d86c18-f0cf-395f-4209-17ac71b9fc03@oracle.com>
Date:   Tue, 24 Aug 2021 15:08:46 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        David Hildenbrand <david@...hat.com>,
        Michal Hocko <mhocko@...e.com>,
        Oscar Salvador <osalvador@...e.de>, Zi Yan <ziy@...dia.com>,
        Muchun Song <songmuchun@...edance.com>,
        Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hillf Danton <hdanton@...a.com>
Subject: Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality

Adding Vlastimil and Hillf,

On 8/16/21 6:46 PM, Andrew Morton wrote:
> On Mon, 16 Aug 2021 17:46:58 -0700 Mike Kravetz <mike.kravetz@...cle.com> wrote:
> 
>>> It really is a ton of new code.  I think we're owed much more detail
>>> about the problem than the above.  To be confident that all this
>>> material is truly justified?
>>
>> The desired functionality for this specific use case is to simply
>> convert a 1G hugetlb page to 512 2MB hugetlb pages.  As mentioned
>>
>> "Converting larger to smaller hugetlb pages can be accomplished today by
>>  first freeing the larger page to the buddy allocator and then allocating
>>  the smaller pages.  However, there are two issues with this approach:
>>  1) This process can take quite some time, especially if allocation of
>>     the smaller pages is not immediate and requires migration/compaction.
>>  2) There is no guarantee that the total size of smaller pages allocated
>>     will match the size of the larger page which was freed.  This is
>>     because the area freed by the larger page could quickly be
>>     fragmented."
>>
>> These two issues have been experienced in practice.
> 
> Well the first issue is quantifiable.  What is "some time"?  If it's
> people trying to get a 5% speedup on a rare operation because hey,
> bugging the kernel developers doesn't cost me anything then perhaps we
> have better things to be doing.

Well, I set up a test environment on a larger system to get some
numbers.  My 'load' on the system was filling the page cache with
clean pages.  The thought is that these pages could easily be reclaimed.

When trying to get numbers, I hit a hugetlb page allocation stall where
__alloc_pages(__GFP_RETRY_MAYFAIL, order 9) would stall forever (or at
least an hour).  It was very much like the symptoms addressed here:
https://lore.kernel.org/linux-mm/20190806014744.15446-1-mike.kravetz@oracle.com/

This was on 5.14.0-rc6-next-20210820.

I'll do some more digging, as this appears to be some dark corner case of
reclaim and/or compaction.  The 'good news' is that I can reproduce
this.

> And the second problem would benefit from some words to help us
> understand how much real-world hurt this causes, and how frequently.
> And let's understand what the userspace workarounds look like, etc.

The stall above was from doing a simple 'free 1GB page' followed by
'allocate 512 2MB pages' from userspace.

Getting out another version of this series will be delayed, as I think
we need to address or understand this issue first.
-- 
Mike Kravetz