Message-ID: <cb9f3604-8a0a-478a-8bf7-2d139ccbc89d@linux.ibm.com>
Date: Thu, 18 Dec 2025 21:49:29 +0530
From: Sourabh Jain <sourabhjain@...ux.ibm.com>
To: lkml <linux-kernel@...r.kernel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Borislav Petkov
 <bp@...en8.de>,
        David Hildenbrand <david@...nel.org>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Madhavan Srinivasan
 <maddy@...ux.ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Muchun Song <muchun.song@...ux.dev>,
        Oscar Salvador <osalvador@...e.de>,
        "Ritesh Harjani (IBM)" <ritesh.list@...il.com>,
        Vasily Gorbik <gor@...ux.ibm.com>
Subject: mm/hugetlb: kernel fail to boot if total hugepages size is almost
 equal to system RAM

Hello All,

I observed a kernel boot failure when the total size of the requested
hugepages is almost equal to the system RAM.

For example, a Power system with 255 GB RAM failed to boot with the
following kernel command-line arguments:

default_hugepagesz=2M hugepagesz=2M hugepages=128512
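
For scale, the request can be worked out as below. This is only a rough
sketch: it treats the 255 GB figure as GiB, which is an assumption, and
the variable names are illustrative.

```python
# Rough size check of the hugepage request against the stated system RAM.
# Assumption: the "255 GB" figure is treated as 255 GiB.
MIB = 1024 * 1024
GIB = 1024 * MIB

hugepage_bytes = 2 * MIB      # hugepagesz=2M
requested_pages = 128512      # hugepages=128512
stated_ram_bytes = 255 * GIB  # assumption, see above

requested_bytes = requested_pages * hugepage_bytes
requested_gib = requested_bytes / GIB
headroom_gib = (stated_ram_bytes - requested_bytes) / GIB

print(f"requested hugepage memory: {requested_gib:.2f} GiB")
print(f"left for everything else:  {headroom_gib:.2f} GiB")
```

Under that assumption the request leaves only a few GiB for everything
else, and the command line also asks for crashkernel=4G.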

The failure occurred with the following logs:

   Booting a command list

OF stdout device is: /vdevice/vty@...00000
Preparing to boot Linux version 6.19.0-rc1+ (root@...t) (gcc (GCC), GNU 
ld version 2.35.2-63.el9) #4 SMP Thu Dec 18 09:02:16 CST 2025
Detected machine type: 0000000000000101
command line: 
BOOT_IMAGE=(ieee1275//vdevice/v-scsi@...00065/disk@...0000000000000,msdos2)/vmlinuz-6.19.0-rc1+ 
root=/dev/mapper/r-root ro rd.lvm.lv=root/root rd.lvm.lv=root/swap 
biosdevname=0 loglevel=7 ignore_loglevel debug console=hvc0 
earlycon=hvc0 earlyprintk crashkernel=4G default_hugepagesz=2M 
hugepagesz=2M hugepages=128512
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
memory layout at init:
   memory_limit : 0000000000000000 (16 MB aligned)
   alloc_bottom : 0000000016050000
   alloc_top    : 0000000030000000
   alloc_top_hi : 0000000030000000
   rmo_top      : 0000000030000000
   ram_top      : 0000000030000000
instantiating rtas at 0x000000002ec50000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000016060000 -> 0x0000000016061844
Device tree struct  0x0000000016070000 -> 0x0000000016080000
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x000000000a700000 ...
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.000000] crashkernel reserved: 0x0000000018000000 - 
0x0000000118000000 (4096 MB)
[    0.000000] radix-mmu: Page sizes from device-tree:
[    0.000000] radix-mmu: Page size shift = 12 AP=0x0
[    0.000000] radix-mmu: Page size shift = 16 AP=0x5
[    0.000000] radix-mmu: Page size shift = 21 AP=0x1
[    0.000000] radix-mmu: Page size shift = 30 AP=0x2
[    0.000000] Activating Kernel Userspace Access Prevention
[    0.000000] Activating Kernel Userspace Execution Prevention
[    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000002800000 
with 2.00 MiB pages (exec)
[    0.000000] radix-mmu: Mapped 0x0000000002800000-0x0000003ffde00000 
with 2.00 MiB pages
[    0.000000] radix-mmu: Mapped 0x0000003ffde00000-0x0000003ffdff0000 
with 64.0 KiB pages
[    0.000000] radix-mmu: Mapped 0x0000003fffff0000-0x0000004000000000 
with 64.0 KiB pages
[    0.000000] radix-mmu: Mapped 0x0000003ffdff0000-0x0000003fffff0000 
with 64.0 KiB pages
[    0.000000] lpar: Using radix MMU under hypervisor
[    0.000000] Linux version 6.19.0-rc1+ (root) (gcc (GCC) GNU ld version 2.35.2-63.el9) #4 SMP Thu Dec 18 09:02:16 CST 2025
[    0.000000] OF: reserved mem: Reserved memory: No reserved-memory 
node in the DT
[    0.000000] Found initrd at 0xc00000000f800000:0xc000000016046afe
[    0.000000] Hardware name: hv:phyp pSeries
[    0.000000] printk: legacy bootconsole [udbg0] enabled
[    0.000000] Partition configured for 72 cpus.
[    0.000000] CPU maps initialized for 8 threads per core
[    0.000000]  (thread shift is 3)

<snip>

[    0.000000] Initmem setup node 28 as memoryless
[    0.000000] Initmem setup node 29 as memoryless
[    0.000000] Initmem setup node 30 as memoryless
[    0.000000] Initmem setup node 31 as memoryless
[    0.000000] percpu: Embedded 3 pages/cpu s126488 r0 d70120 u196608
[    0.000000] pcpu-alloc: s126488 r0 d70120 u196608 alloc=3*65536
[    0.000000] pcpu-alloc: [0] 00 [0] 01 [0] 02 [0] 03 [0] 04 [0] 05 [0] 
06 [0] 07
[    0.000000] pcpu-alloc: [0] 08 [0] 09 [0] 10 [0] 11 [0] 12 [0] 13 [0] 
14 [0] 15
[    0.000000] pcpu-alloc: [0] 16 [0] 17 [0] 18 [0] 19 [0] 20 [0] 21 [0] 
22 [0] 23
[    0.000000] pcpu-alloc: [0] 24 [0] 25 [0] 26 [0] 27 [0] 28 [0] 29 [0] 
30 [0] 31
[    0.000000] pcpu-alloc: [1] 32 [1] 33 [1] 34 [1] 35 [1] 36 [1] 37 [1] 
38 [1] 39
[    0.000000] pcpu-alloc: [1] 40 [1] 41 [1] 42 [1] 43 [1] 44 [1] 45 [1] 
46 [1] 47
[    0.000000] pcpu-alloc: [1] 48 [1] 49 [1] 50 [1] 51 [1] 52 [1] 53 [1] 
54 [1] 55
[    0.000000] pcpu-alloc: [1] 56 [1] 57 [1] 58 [1] 59 [1] 60 [1] 61 [1] 
62 [1] 63
[    0.000000] pcpu-alloc: [2] 64 [2] 65 [2] 66 [2] 67 [2] 68 [2] 69 [2] 
70 [2] 71
[    0.000000] Kernel command line: BOOT_IMAGE=(ieee1275//vdevice/v-scsi@...00065/disk@...0000000000000,msdos2)/vmlinuz-6.19.0-rc1+ root=/dev/mapper/root ro rd.lvm.lv=root/root rd.lvm.lv=root/swap biosdevname=0 loglevel=7 ignore_loglevel debug console=hvc0 earlycon=hvc0 earlyprintk crashkernel=4G default_hugepagesz=2M hugepagesz=2M hugepages=128512
[    0.000000] Unknown kernel command line parameters "earlyprintk 
biosdevname=0", will be passed to user space.
[    0.000000] random: crng init done
[    0.000000] printk: log buffer data + meta data: 1048576 + 3670016 = 
4718592 bytes

<snip>

[    0.070655] thermal_sys: Registered thermal governor 'step_wise'
[    0.070709] cpuidle: using governor menu
[    0.070781] RTAS daemon started
[    0.070984] pstore: Using crash dump compression: deflate
[    0.070988] pstore: Registered nvram as persistent store backend
[    0.071386] EEH: pSeries platform initialized
[    0.071459] plpks: POWER LPAR Platform KeyStore is not supported or 
enabled
[    0.081865] kprobes: kprobe jump-optimization is enabled. All kprobes 
are optimized if possible.
[    2.828787] HugeTLB: allocation took 2740ms with 
hugepage_allocation_threads=18
[    2.828821] HugeTLB: allocating 128512 of page size 2.00 MiB failed.  
Only allocated 128429 hugepages.
[    2.828852] HugeTLB: registered 2.00 MiB page size, pre-allocated 
128429 pages
[    2.828855] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
[    2.828858] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[    2.828862] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
[    2.831713] swapper/0: page allocation failure: order:5, 
mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=1-3
[    2.831732] CPU: 51 UID: 0 PID: 1 Comm: swapper/0 Not tainted 
6.19.0-rc1+ #4 VOLUNTARY
[    2.831736] Hardware name: hv:phyp pSeries
[    2.831738] Call Trace:
[    2.831738] [c000001c801b77c0] [c00000000111ae6c] 
dump_stack_lvl+0x8c/0xf0 (unreliable)
[    2.831747] [c000001c801b77f0] [c00000000059a024] warn_alloc+0x12c/0x1d8
[    2.831752] [c000001c801b7890] [c00000000059a918] 
__alloc_pages_slowpath.constprop.0+0x848/0xa98
[    2.831755] [c000001c801b79d0] [c00000000059ae3c] 
__alloc_frozen_pages_noprof+0x2d4/0x3a8
[    2.831758] [c000001c801b7a50] [c0000000005eac64] 
alloc_pages_mpol+0x10c/0x1f4
[    2.831761] [c000001c801b7ab0] [c0000000005eadac] 
alloc_pages_noprof+0x60/0xe8
[    2.831763] [c000001c801b7ad0] [c0000000004d9978] 
mempool_alloc_pages+0x24/0x38
[    2.831767] [c000001c801b7af0] [c0000000004da4a0] 
mempool_init_node+0x138/0x1fc
[    2.831769] [c000001c801b7b40] [c00000000208844c] 
bio_integrity_initfn+0x40/0x70
[    2.831773] [c000001c801b7ba0] [c000000000010c44] 
do_one_initcall+0x60/0x36c
[    2.831776] [c000001c801b7c80] [c000000002006b2c] 
do_initcalls+0x12c/0x22c
[    2.831779] [c000001c801b7d30] [c000000002006f1c] 
kernel_init_freeable+0x23c/0x390
[    2.831781] [c000001c801b7de0] [c000000000011078] kernel_init+0x34/0x26c
[    2.831783] [c000001c801b7e50] [c00000000000dd3c] 
ret_from_kernel_user_thread+0x14/0x1c
[    2.831786] ---- interrupt: 0 at 0x0
[    2.831790] Mem-Info:
[    2.831871] active_anon:0 inactive_anon:0 isolated_anon:0
[    2.831871]  active_file:0 inactive_file:0 isolated_file:0
[    2.831871]  unevictable:0 dirty:0 writeback:0
[    2.831871]  slab_reclaimable:82 slab_unreclaimable:2106
[    2.831871]  mapped:0 shmem:0 pagetables:146
[    2.831871]  sec_pagetables:0 bounce:0
[    2.831871]  kernel_misc_reclaimable:0
[    2.831871]  free:944 free_pcp:3099 free_cma:0
[    2.831903] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:8000kB pagetables:4224kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[    2.831925] Node 2 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:7968kB pagetables:4096kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[    2.831937] Node 3 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:2272kB pagetables:1024kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[    2.831962] Node 1 Normal free:19520kB boost:0kB min:29440kB low:144448kB high:259456kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB present:119537664kB managed:115056960kB mlocked:0kB bounce:0kB free_pcp:84992kB local_pcp:2048kB free_cma:0kB
[    2.831991] lowmem_reserve[]: 0 0 0
[    2.831997] Node 2 Normal free:39424kB boost:2048kB min:32512kB low:151360kB high:270208kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB present:119013376kB managed:118885632kB mlocked:0kB bounce:0kB free_pcp:95552kB local_pcp:2816kB free_cma:0kB
[    2.832008] lowmem_reserve[]: 0 0 0
[    2.832011] Node 3 Normal free:1472kB boost:0kB min:7616kB low:37376kB high:67136kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB present:29884416kB managed:29784448kB mlocked:0kB bounce:0kB free_pcp:17792kB local_pcp:0kB free_cma:0kB
[    2.832021] lowmem_reserve[]: 0 0 0
[    2.832025] Node 1 Normal: 3*64kB (UME) 3*128kB (ME) 4*256kB (UME) 
3*512kB (UME) 4*1024kB (ME) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 7232kB
[    2.832037] Node 2 Normal: 1*64kB (U) 0*128kB 1*256kB (M) 0*512kB 
2*1024kB (UM) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 2368kB
[    2.832052] Node 3 Normal: 1*64kB (E) 1*128kB (M) 3*256kB (UME) 
1*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 1472kB
[    2.832068] Node 1 hugepages_total=56043 hugepages_free=56043 
hugepages_surp=0 hugepages_size=2048kB
[    2.832078] Node 1 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=1048576kB
[    2.832086] Node 2 hugepages_total=57915 hugepages_free=57915 
hugepages_surp=0 hugepages_size=2048kB
[    2.832093] Node 2 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=1048576kB
[    2.832102] Node 3 hugepages_total=14471 hugepages_free=14471 
hugepages_surp=0 hugepages_size=2048kB
[    2.832111] Node 3 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=1048576kB
[    2.832119] 0 total pagecache pages
[    2.832122] 0 pages in swap cache
[    2.832127] Free swap  = 0kB
[    2.832130] Total swap = 0kB
[    2.832133] 4194304 pages RAM
[    2.832138] 0 pages HighMem/MovableOnly
[    2.832141] 73569 pages reserved
[    2.832143] 0 pages cma reserved
[    2.832146] 0 pages hwpoisoned
[    2.832153] Memory cgroup min protection 0kB -- low protection 0kB
[    2.832154] Kernel panic - not syncing: bio: can't create integrity 
buf pool
[    2.832160] CPU: 51 UID: 0 PID: 1 Comm: swapper/0 Not tainted 
6.19.0-rc1+ #4 VOLUNTARY
[    2.832164] Hardware name: hv:phyp pSeries
[    2.832167] Call Trace:
[    2.832169] [c000001c801b7a50] [c00000000111aeb8] 
dump_stack_lvl+0xd8/0xf0 (unreliable)
[    2.832180] [c000001c801b7a80] [c00000000015d79c] vpanic+0x2c8/0x4b4
[    2.832189] [c000001c801b7b20] [c00000000015d9c8] nmi_panic+0x0/0xa0
[    2.832197] [c000001c801b7b40] [c000000002088478] 
bio_integrity_initfn+0x6c/0x70
[    2.832205] [c000001c801b7ba0] [c000000000010c44] 
do_one_initcall+0x60/0x36c
[    2.832213] [c000001c801b7c80] [c000000002006b2c] 
do_initcalls+0x12c/0x22c
[    2.832221] [c000001c801b7d30] [c000000002006f1c] 
kernel_init_freeable+0x23c/0x390
[    2.832229] [c000001c801b7de0] [c000000000011078] kernel_init+0x34/0x26c
[    2.832237] [c000001c801b7e50] [c00000000000dd3c] 
ret_from_kernel_user_thread+0x14/0x1c
[    2.832247] ---- interrupt: 0 at 0x0
[    2.834181] pstore: backend (nvram) writing error (-1)
[    2.835809] Rebooting in 10 seconds..
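
One way to read the log above: the failing allocation is order:5, and the
per-node free lists show no free block of that size or larger. The small
script below checks this from the three "Node N Normal:" free-list lines.
It is only a sketch; the 64 KiB base page size is an assumption inferred
from the smallest free-list bucket, and the helper name is hypothetical.

```python
import re

BASE_PAGE_KB = 64  # assumption: 64 KiB base pages (smallest free-list bucket)
ORDER = 5          # from "page allocation failure: order:5"

# Free-list summaries copied from the "Node N Normal:" lines in the log.
free_lists = {
    "Node 1": "3*64kB (UME) 3*128kB (ME) 4*256kB (UME) 3*512kB (UME) "
              "4*1024kB (ME) 0*2048kB 0*4096kB 0*8192kB 0*16384kB",
    "Node 2": "1*64kB (U) 0*128kB 1*256kB (M) 0*512kB 2*1024kB (UM) "
              "0*2048kB 0*4096kB 0*8192kB 0*16384kB",
    "Node 3": "1*64kB (E) 1*128kB (M) 3*256kB (UME) 1*512kB (U) "
              "0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB",
}

def blocks_at_least(line, min_kb):
    """Count free blocks whose size is at least min_kb (hypothetical helper)."""
    pairs = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) for count, size in pairs if int(size) >= min_kb)

# Size of one order-5 block in kB: base page size times 2**ORDER.
needed_kb = BASE_PAGE_KB << ORDER

results = {node: blocks_at_least(line, needed_kb)
           for node, line in free_lists.items()}

for node, count in results.items():
    print(f"{node}: {count} free blocks of {needed_kb} kB or larger")
```

With these inputs every node reports zero suitable blocks, which appears
consistent with the order:5 GFP_KERNEL failure in mempool_init_node()
that leads to the panic in bio_integrity_initfn().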

I agree that reserving hugepages equal to the system RAM is not very
practical. However, would it be a good idea to make the hugepage
memory allocator aware of the total system memory and leave some
memory for the kernel to boot?
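
To make the question concrete, the policy could look roughly like the
sketch below. It is an illustration in Python, not kernel code: the
function name, its parameters, and the 8 GiB reserve are all hypothetical,
and how large the reserve should be is exactly the open question.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def cap_hugepages(requested_pages, hugepage_bytes, total_ram_bytes, reserve_bytes):
    """Return how many hugepages to pre-allocate while keeping reserve_bytes free.

    Hypothetical policy sketch: never hand out more hugepages than fit in
    (total RAM - reserve), even if more were requested on the command line.
    """
    usable_bytes = max(total_ram_bytes - reserve_bytes, 0)
    return min(requested_pages, usable_bytes // hugepage_bytes)

# Example using the figures from this report and a made-up 8 GiB reserve.
capped = cap_hugepages(
    requested_pages=128512,
    hugepage_bytes=2 * MIB,
    total_ram_bytes=256 * GIB,  # 4194304 pages of 64 KiB, assuming 64 KiB base pages
    reserve_bytes=8 * GIB,      # hypothetical reserve
)
print(f"hugepages to pre-allocate: {capped}")
```

A fixed reserve is the simplest choice for illustration; a real policy
would probably need to account for per-node distribution and for other
boot-time reservations such as crashkernel.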

Thanks,
Sourabh Jain
