Message-Id: <20250221-hugepage-parameter-v1-0-fa49a77c87c8@cyberus-technology.de>
Date: Fri, 21 Feb 2025 14:49:02 +0100
From: Thomas Prescher via B4 Relay <devnull+thomas.prescher.cyberus-technology.de@...nel.org>
To: Jonathan Corbet <corbet@....net>, Muchun Song <muchun.song@...ux.dev>, 
 Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
 linux-mm@...ck.org, Thomas Prescher <thomas.prescher@...erus-technology.de>
Subject: [PATCH 0/2] Add a command line option that enables control of how
 many threads per NUMA node should be used to allocate huge pages.

Allocating huge pages can take a very long time on servers
with terabytes of memory, even when they are allocated at
boot time, where the allocation happens in parallel.

The kernel currently uses a hard-coded value of 2 threads per
NUMA node for these allocations. This value might have been good
enough in the past, but it is not sufficient to fully utilize
newer systems.

This series adds a command line option, hugetlb_alloc_threads,
to override this value.

We tested this on two generations of Xeon CPUs, and the results
show a big improvement in the overall allocation time (in
seconds, lower is better).

+--------------------+-------+-------+-------+-------+-------+
| threads per node   |   2   |   4   |   8   |   16  |    32 |
+--------------------+-------+-------+-------+-------+-------+
| skylake 4node      |   44s |   22s |   16s |   19s |   20s |
| cascade lake 4node |   39s |   20s |   11s |   10s |    9s |
+--------------------+-------+-------+-------+-------+-------+

On Skylake, we see an improvement of 2.75x when using 8 threads
per node (44s down to 16s). On Cascade Lake the result is even
better: 4.3x when using 32 threads per node (39s down to 9s).
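
The speedup figures can be re-derived from the table with a short
script. This is only a sketch for checking the arithmetic: the
names THREADS, TIMES and best_speedup are made up for
illustration and are not part of the patch. The timings are
copied from the table above.

```python
# Boot-time huge page allocation times in seconds, copied from the table.
# THREADS lists the "threads per node" column headers in order.
THREADS = [2, 4, 8, 16, 32]
TIMES = {
    "Skylake": [44, 22, 16, 19, 20],
    "Cascade Lake": [39, 20, 11, 10, 9],
}


def best_speedup(times):
    """Return (threads, speedup) for the fastest run.

    The speedup is measured against the first entry, which is the
    current hard-coded default of 2 threads per node.
    """
    baseline = times[0]
    best = min(range(len(times)), key=lambda i: times[i])
    return THREADS[best], baseline / times[best]


for name, times in TIMES.items():
    threads, speedup = best_speedup(times)
    print(f"{name}: {speedup:.2f}x at {threads} threads per node")
```

Note that on Skylake the time gets worse again beyond 8 threads
per node, so the best value differs between the two machines.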

This speedup is significant, and users of large machines
like these should have the option to make them boot
as fast as possible.

Signed-off-by: Thomas Prescher <thomas.prescher@...erus-technology.de>
---
Thomas Prescher (2):
      mm: hugetlb: add hugetlb_alloc_threads cmdline option
      mm: hugetlb: log time needed to allocate hugepages

 Documentation/admin-guide/kernel-parameters.txt |  7 +++
 Documentation/admin-guide/mm/hugetlbpage.rst    |  9 +++-
 mm/hugetlb.c                                    | 59 ++++++++++++++++++-------
 3 files changed, 58 insertions(+), 17 deletions(-)
---
base-commit: 334426094588f8179fe175a09ecc887ff0c75758
change-id: 20250221-hugepage-parameter-e8542fdfc0ae

Best regards,
-- 
Thomas Prescher <thomas.prescher@...erus-technology.de>


