linux-kernel - Re: Commit 'hugetlbfs: extend the definition of hugepages parameter to support node allocation' breaks old numa less syntax of reserving hugepages on boot.

Message-ID: <dfdc836d4c9eb6b571a32b19cae74c0a426c5a9b.camel@redhat.com>
Date:   Mon, 29 Nov 2021 12:39:03 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>,
        Zhenguo Yao <yaozhenguo1@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: Commit 'hugetlbfs: extend the definition of hugepages parameter
 to support node allocation' breaks old numa less syntax of reserving
 hugepages on boot.

On Sun, 2021-11-28 at 20:31 -0800, Mike Kravetz wrote:
> On 11/28/21 03:18, Maxim Levitsky wrote:
> > dmesg prints this:
> > 
> > HugeTLB: allocating 64 of page size 1.00 GiB failed.  Only allocated 0 hugepages
> > 
> > Huge pages were allocated on kernel command line (1/2 of 128GB system):
> > 
> > 'default_hugepagesz=1G hugepagesz=1G hugepages=64'
> > 
> > This is a 3970X with no real support or need for NUMA, so only the fake NUMA node 0 is present.
> > 
> > Reverting the commit helps.
> > 
> > The new syntax also works (hugepages=0:64).
> > 
> > I can test any patches for this bug.
> 
> Argh!  I think preallocation of gigantic pages on all systems with only
> a single node is broken.  The issue is at the beginning of
> __alloc_bootmem_huge_page:
> 
> int __alloc_bootmem_huge_page(struct hstate *h, int nid)
> {
>         struct huge_bootmem_page *m = NULL; /* initialize for clang */
>         int nr_nodes, node;
> 
>         if (nid >= nr_online_nodes)
>                 return 0;
> 
> Without using the node specific syntax, nid == NUMA_NO_NODE == -1.  For the
> comparison, nid will be converted to an unsigned int to match nr_online_nodes
> so we will immediately return 0 instead of doing the allocations.
> 
> Zhenguo Yao,
> Can you verify and perhaps put together a patch?
> 
> > Also unrelated, is there any progress on allocating 1GB pages on demand so that I could
> > allocate them only when I run a VM?
> 
> That should be possible.  Such support was added back in 2014 with commit
> 944d9fec8d7a "hugetlb: add support for gigantic page allocation at runtime".
> 
> > I don't mind having these pages marked as usable for userspace only,
> > since as far as I remember it's the kernel usage that makes some pages unmovable.
> > 
> 
> Of course, finding 1GB of contiguous space for a gigantic page is often
> difficult at runtime.  So, allocations are likely to fail the longer the
> system is up and running and fragmentation increases.
> 
> > Last time (many years ago) I tried to create a zone with only userspace pages
> > (I don't remember what options I used) but it didn't work.
> 
> Not too long ago, support was added to use CMA for gigantic page allocation.
> See commit cf11e85fc08c "mm: hugetlb: optionally allocate gigantic hugepages
> using cma".  This sounds like something you might want to try.

This is exactly what I had in mind and it seems to work very well.
Thank you very much!

Best regards,
	Maxim Levitsky
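
The signed/unsigned comparison Mike describes above can be reproduced in user space. The following is only a minimal sketch, not kernel code: nr_online_nodes here is a local stand-in for the kernel variable, the function names are hypothetical, and the guarded variant shows one possible way to let NUMA_NO_NODE through, not necessarily the fix that was merged.

```c
#include <assert.h>
#include <stdio.h>

#define NUMA_NO_NODE (-1)

/* Local stand-in for the kernel's unsigned nr_online_nodes (assumption:
 * a single-node system, as in the report above). */
static unsigned int nr_online_nodes = 1;

/* Mirrors the quoted check. In "nid >= nr_online_nodes" C converts nid to
 * unsigned int implicitly; the cast is written out here to make that
 * conversion visible. Returns 1 when the allocation would be skipped. */
static int rejected_by_original_check(int nid)
{
        return (unsigned int)nid >= nr_online_nodes;
}

/* Hypothetical guard: treat NUMA_NO_NODE as "any node" before doing the
 * range check. Returns 1 when the allocation would be skipped. */
static int rejected_by_guarded_check(int nid)
{
        return nid != NUMA_NO_NODE && nid >= (int)nr_online_nodes;
}

/* Small self-check that also prints the outcome for a few node ids. */
static void run_demo(void)
{
        int nids[] = { NUMA_NO_NODE, 0, 1 };
        size_t i;

        for (i = 0; i < sizeof(nids) / sizeof(nids[0]); i++)
                printf("nid=%d original=%d guarded=%d\n", nids[i],
                       rejected_by_original_check(nids[i]),
                       rejected_by_guarded_check(nids[i]));

        /* -1 becomes UINT_MAX, which is >= 1, so the old syntax is rejected. */
        assert(rejected_by_original_check(NUMA_NO_NODE) == 1);
        assert(rejected_by_guarded_check(NUMA_NO_NODE) == 0);
}
```

Calling run_demo() from a main() shows that hugepages=64 (nid == NUMA_NO_NODE) trips the original check, while an explicit node 0, as in hugepages=0:64, does not. This matches the behaviour reported at the top of the thread.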