Message-ID: <28e28c2a-e72d-a181-e87a-39cecc8c3c76@google.com>
Date:   Fri, 24 Nov 2023 11:44:43 -0800 (PST)
From:   David Rientjes <rientjes@...gle.com>
To:     David Hildenbrand <david@...hat.com>
cc:     Gang Li <gang.li@...ux.dev>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Gang Li <ligang.bdlg@...edance.com>
Subject: Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation
 on boot

On Thu, 23 Nov 2023, David Hildenbrand wrote:

> On 23.11.23 14:30, Gang Li wrote:
> > From: Gang Li <ligang.bdlg@...edance.com>
> > 
> > Inspired by these patches [1][2], this series aims to speed up the
> > initialization of hugetlb during the boot process through
> > parallelization.
> > 
> > It is particularly effective in large systems. On a machine equipped
> > with 1TB of memory and two NUMA nodes, the time for hugetlb
> > initialization was reduced from 2 seconds to 1 second.
> 
> Sorry to say, but why is that a scenario worth adding complexity for /
> optimizing for? You don't cover that, so there is a clear lack in the
> motivation.
> 
> 2 vs. 1 second on a 1 TiB system is usually really just noise.
> 

The cost will continue to grow over time, so I presume that Gang is trying 
to get out in front of the issue even though the savings may not be large 
today.

Running single boot tests with the latest upstream kernel, allocating 
1,440 1GB hugetlb pages on a 1.5TB AMD host appears to take 1.47s.

But allocating 11,776 1GB hugetlb pages on a 12TB Intel host takes 65.2s 
with the current implementation.
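
For reference, a reservation like that is normally requested on the kernel 
command line and then checked from sysfs once the machine is up.  The 
snippet below is only a sketch using the 12TB host's page count, not the 
exact configuration used for these tests:

  default_hugepagesz=1G hugepagesz=1G hugepages=11776

  # after boot, confirm the pool size and look at the boot log
  cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  dmesg | grep -i hugetlb   # exact messages vary by kernel version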

So it's likely something worth optimizing.

Gang, I'm curious about this in the cover letter:

"""
This series currently focuses on optimizing 2MB hugetlb. Since
gigantic pages are few in number, their optimization effects
are not as pronounced. We may explore optimizations for
gigantic pages in the future.
"""

For >1TB hosts, why the emphasis on 2MB hugetlb? :)  I would have expected 
1GB pages.  Are you really allocating ~500k 2MB hugetlb pages (which would 
cover essentially all of a 1TB host)?

So if the patchset optimizes for the more likely scenario on these large 
hosts, which would be 1GB pages, that would be great.
