linux-kernel - Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <9475b139-1b33-76c7-ef5c-d43d2ea1dba5@redhat.com>
Date:   Tue, 26 Jan 2021 16:56:05 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     Muchun Song <songmuchun@...edance.com>, corbet@....net,
        mike.kravetz@...cle.com, tglx@...utronix.de, mingo@...hat.com,
        bp@...en8.de, x86@...nel.org, hpa@...or.com,
        dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org,
        viro@...iv.linux.org.uk, akpm@...ux-foundation.org,
        paulmck@...nel.org, mchehab+huawei@...nel.org,
        pawan.kumar.gupta@...ux.intel.com, rdunlap@...radead.org,
        oneukum@...e.com, anshuman.khandual@....com, jroedel@...e.de,
        almasrymina@...gle.com, rientjes@...gle.com, willy@...radead.org,
        mhocko@...e.com, song.bao.hua@...ilicon.com,
        naoya.horiguchi@....com, duanxiongchun@...edance.com,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages
 associated with each HugeTLB page

On 26.01.21 16:34, Oscar Salvador wrote:
> On Tue, Jan 26, 2021 at 04:10:53PM +0100, David Hildenbrand wrote:
>> The real issue seems to be discarding the vmemmap on any memory that has
>> movability constraints - CMA and ZONE_MOVABLE; otherwise, as discussed, we
>> can reuse parts of the thingy we're freeing for the vmemmap. Not that it
>> would be ideal: that once-a-huge-page thing will never ever be a huge page
>> again - but if it helps with OOM in corner cases, sure.
> 
> Yes, that is one way, but I am not sure how hard it would be to implement.
> Plus, as you pointed out, once that memory is used for the vmemmap
> array, we cannot use it as a huge page again.
> Wouldn't we eventually fragment memory?
> 
>> Possible simplification: don't perform the optimization for now with free
>> huge pages residing on ZONE_MOVABLE or CMA. Certainly not perfect: what
>> happens when migrating a huge page from ZONE_NORMAL to (ZONE_MOVABLE|CMA)?
> 
> But if we do not allow those pages to be in ZONE_MOVABLE or CMA, there is no
> point in migrating them, right?

Well, memory unplug "could" still work and migrate them, and
alloc_contig_range() "could in the future" still want to migrate them
(virtio-mem, gigantic pages, powernv memtrace). In particular, the latter
two don't work with ZONE_MOVABLE/CMA. But, I mean, it would be fair
enough to say "there are no guarantees for
alloc_contig_range()/offline_pages() with ZONE_NORMAL, so we can break
these use cases when a magic switch is flipped and make these pages
non-migratable".
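As a rough illustration, the simplification quoted above (skip the optimization for free huge pages on ZONE_MOVABLE or CMA) combined with the "magic switch" trade-off can be modeled as a tiny policy in C. All type and function names here (sketch_hpage, sketch_may_free_vmemmap() and so on) are hypothetical and do not correspond to kernel code; the model simply assumes that an optimized page is treated as non-migratable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types for this sketch only; not kernel structures. */
enum sketch_zone { SKETCH_ZONE_NORMAL, SKETCH_ZONE_MOVABLE };

struct sketch_hpage {
	enum sketch_zone zone;
	bool cma;           /* backed by a CMA area */
	bool vmemmap_freed; /* vmemmap optimization applied */
};

/*
 * Proposed simplification: never free the vmemmap of huge pages that sit
 * on ZONE_MOVABLE or CMA, so those pages keep their migration guarantees.
 */
static bool sketch_may_free_vmemmap(const struct sketch_hpage *p)
{
	return p->zone != SKETCH_ZONE_MOVABLE && !p->cma;
}

/* Apply the optimization only if the switch is on and the policy allows it. */
static bool sketch_optimize(struct sketch_hpage *p, bool magic_switch)
{
	if (!magic_switch || !sketch_may_free_vmemmap(p))
		return false;
	p->vmemmap_freed = true;
	return true;
}

/*
 * Trade-off: an optimized page is non-migratable, so
 * alloc_contig_range()/offline_pages() on ZONE_NORMAL may fail on it.
 */
static bool sketch_is_migratable(const struct sketch_hpage *p)
{
	/* The policy must never have optimized a movable or CMA page. */
	assert(!p->vmemmap_freed || sketch_may_free_vmemmap(p));
	return !p->vmemmap_freed;
}
```

Because an optimized page is never migrated in this model, the question of migrating an optimized page from ZONE_NORMAL to ZONE_MOVABLE or CMA does not arise; the price is that range allocations and offlining on ZONE_NORMAL can fail on such pages.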

I assume compaction doesn't care about huge pages either way; I'm not sure
about NUMA balancing etc.

However, note that there is a fundamental issue with any approach that 
allocates a significant amount of unmovable memory for user-space 
purposes (excluding CMA allocations for unmovable stuff, CMA is 
special): pairing it with ZONE_MOVABLE becomes very tricky as your user 
space might just end up eating all kernel memory, although the system 
still looks like there is plenty of free memory residing in 
ZONE_MOVABLE. I mentioned a reduced form of that issue in the context of
secretmem as well.
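The imbalance described above can be shown with a toy model in C. It assumes 4 KiB base pages, 2 MiB huge pages and a 64-byte struct page, so the vmemmap describing one huge page is 512 * 64 B = 32 KiB, i.e. 8 base pages. It further assumes that freeing an optimized huge page that lives on ZONE_MOVABLE first needs `restore` unmovable pages from ZONE_NORMAL for its vmemmap; `restore` is a free parameter here, not a value taken from the patch set, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64-like sizes, for illustration only. */
#define SKETCH_BASE_PAGE   4096UL
#define SKETCH_HUGE_PAGE   (2UL * 1024 * 1024)
#define SKETCH_STRUCT_PAGE 64UL

/* Base pages of vmemmap describing one huge page: 512 * 64 B = 32 KiB = 8. */
static unsigned long sketch_vmemmap_pages(void)
{
	unsigned long nr_struct_pages = SKETCH_HUGE_PAGE / SKETCH_BASE_PAGE;

	return nr_struct_pages * SKETCH_STRUCT_PAGE / SKETCH_BASE_PAGE;
}

/* Hypothetical free-page counters, in base pages. */
struct sketch_mem {
	unsigned long normal_free;  /* can serve unmovable (kernel) allocations */
	unsigned long movable_free; /* can serve movable allocations only */
};

/* Unmovable allocations are served from ZONE_NORMAL only. */
static bool sketch_alloc_unmovable(struct sketch_mem *m, unsigned long nr)
{
	if (m->normal_free < nr)
		return false;
	m->normal_free -= nr;
	return true;
}

/*
 * Free up to `count` optimized huge pages residing on ZONE_MOVABLE. Each
 * one first needs `restore` unmovable pages for its vmemmap; on success its
 * base pages return to ZONE_MOVABLE. Returns how many could be freed.
 */
static unsigned long sketch_free_huge_pages(struct sketch_mem *m,
					    unsigned long count,
					    unsigned long restore)
{
	unsigned long freed = 0;

	assert(restore <= sketch_vmemmap_pages());
	while (freed < count && sketch_alloc_unmovable(m, restore)) {
		m->movable_free += SKETCH_HUGE_PAGE / SKETCH_BASE_PAGE;
		freed++;
	}
	return freed;
}
```

With normal_free = 16, movable_free = 1000000 and restore = 8, only 2 of 10 huge pages can be freed, even though ZONE_MOVABLE still reports more than a million free base pages.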

We theoretically have that issue with dynamic allocation of gigantic
pages, but it's something a user triggers explicitly and rarely, and its
problems can be documented well enough. We'll have the same issue
with GUP+ZONE_MOVABLE that Pavel is fixing right now, but GUP is
already known to be broken in various ways and to require special
treatment. I'd like to limit the nasty corner cases.

Of course, we could have smart rules like "don't online memory to 
ZONE_MOVABLE automatically when the magic switch is active". That's just 
ugly, but could work.
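One possible reading of that rule, sketched in C with hypothetical names (this is not the kernel's onlining interface): only the automatic default changes while the switch is active. The sketch assumes an explicit request for ZONE_MOVABLE is still honored, since the rule only mentions automatic onlining.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; not the kernel's onlining interface. */
enum sketch_online_zone { SKETCH_ONLINE_NORMAL, SKETCH_ONLINE_MOVABLE };
enum sketch_request { SKETCH_REQ_AUTO, SKETCH_REQ_NORMAL, SKETCH_REQ_MOVABLE };

/*
 * Pick the zone for a newly onlined memory block.
 * default_movable: automatic onlining would normally choose ZONE_MOVABLE.
 * magic_switch:    the vmemmap optimization is active.
 */
static enum sketch_online_zone
sketch_pick_online_zone(enum sketch_request req, bool default_movable,
			bool magic_switch)
{
	switch (req) {
	case SKETCH_REQ_NORMAL:
		return SKETCH_ONLINE_NORMAL;
	case SKETCH_REQ_MOVABLE:
		/* Assumption: an explicit request is still honored. */
		return SKETCH_ONLINE_MOVABLE;
	default:
		assert(req == SKETCH_REQ_AUTO);
		break;
	}
	/* Automatic onlining stays off ZONE_MOVABLE while the switch is on. */
	if (magic_switch)
		return SKETCH_ONLINE_NORMAL;
	return default_movable ? SKETCH_ONLINE_MOVABLE : SKETCH_ONLINE_NORMAL;
}
```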

-- 
Thanks,

David / dhildenb