Message-ID: <cb9f3604-8a0a-478a-8bf7-2d139ccbc89d@linux.ibm.com>
Date: Thu, 18 Dec 2025 21:49:29 +0530
From: Sourabh Jain <sourabhjain@...ux.ibm.com>
To: lkml <linux-kernel@...r.kernel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Borislav Petkov
<bp@...en8.de>,
David Hildenbrand <david@...nel.org>,
Heiko Carstens <hca@...ux.ibm.com>,
Madhavan Srinivasan
<maddy@...ux.ibm.com>,
Michael Ellerman <mpe@...erman.id.au>,
Muchun Song <muchun.song@...ux.dev>,
Oscar Salvador <osalvador@...e.de>,
"Ritesh Harjani (IBM)" <ritesh.list@...il.com>,
Vasily Gorbik <gor@...ux.ibm.com>
Subject: mm/hugetlb: kernel fail to boot if total hugepages size is almost
equal to system RAM
Hello All,
I observed a kernel boot failure when the total hugepages size is almost
equal to the system RAM.
For example, a Power system with 255 GB RAM failed to boot with the
following kernel command-line arguments:
default_hugepagesz=2M hugepagesz=2M hugepages=128512
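As a rough check of how close this request is to total RAM, the arithmetic can be sketched as follows. The page counts are copied from the boot log in this report. The 64 KiB base page size is an assumption, inferred from the percpu line (alloc=3*65536) and the smallest buddy block (64kB) in that log.

```python
# Rough arithmetic only; not kernel code.
BASE_PAGE_KIB = 64    # assumption: 64 KiB base pages, inferred from the log
HUGEPAGE_KIB = 2048   # hugepagesz=2M


def gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


ram_pages = 4194304           # "4194304 pages RAM"
requested_hugepages = 128512  # hugepages=128512
allocated_hugepages = 128429  # "Only allocated 128429 hugepages"

ram_kib = ram_pages * BASE_PAGE_KIB
requested_kib = requested_hugepages * HUGEPAGE_KIB
allocated_kib = allocated_hugepages * HUGEPAGE_KIB
shortfall_pages = requested_hugepages - allocated_hugepages

print(f"RAM reported by the log : {gib(ram_kib):.2f} GiB")
print(f"hugepages requested     : {gib(requested_kib):.2f} GiB")
print(f"hugepages allocated     : {gib(allocated_kib):.2f} GiB")
print(f"hugepages not allocated : {shortfall_pages}")
print(f"RAM not in hugepages    : {gib(ram_kib - allocated_kib):.2f} GiB")
```

The last figure still includes the 4 GiB crashkernel reservation and other reserved pages, so the memory actually usable by the kernel is much smaller.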
The failure occurred with the following logs:
Booting a command list
OF stdout device is: /vdevice/vty@...00000
Preparing to boot Linux version 6.19.0-rc1+ (root@...t) (gcc (GCC), GNU
ld version 2.35.2-63.el9) #4 SMP Thu Dec 18 09:02:16 CST 2025
Detected machine type: 0000000000000101
command line:
BOOT_IMAGE=(ieee1275//vdevice/v-scsi@...00065/disk@...0000000000000,msdos2)/vmlinuz-6.19.0-rc1+
root=/dev/mapper/r-root ro rd.lvm.lv=root/root rd.lvm.lv=root/swap
biosdevname=0 loglevel=7 ignore_loglevel debug console=hvc0
earlycon=hvc0 earlyprintk crashkernel=4G default_hugepagesz=2M
hugepagesz=2M hugepages=128512
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
memory layout at init:
memory_limit : 0000000000000000 (16 MB aligned)
alloc_bottom : 0000000016050000
alloc_top : 0000000030000000
alloc_top_hi : 0000000030000000
rmo_top : 0000000030000000
ram_top : 0000000030000000
instantiating rtas at 0x000000002ec50000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000016060000 -> 0x0000000016061844
Device tree struct 0x0000000016070000 -> 0x0000000016080000
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x000000000a700000 ...
[ 0.000000] printk: debug: ignoring loglevel setting.
[ 0.000000] crashkernel reserved: 0x0000000018000000 -
0x0000000118000000 (4096 MB)
[ 0.000000] radix-mmu: Page sizes from device-tree:
[ 0.000000] radix-mmu: Page size shift = 12 AP=0x0
[ 0.000000] radix-mmu: Page size shift = 16 AP=0x5
[ 0.000000] radix-mmu: Page size shift = 21 AP=0x1
[ 0.000000] radix-mmu: Page size shift = 30 AP=0x2
[ 0.000000] Activating Kernel Userspace Access Prevention
[ 0.000000] Activating Kernel Userspace Execution Prevention
[ 0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000002800000
with 2.00 MiB pages (exec)
[ 0.000000] radix-mmu: Mapped 0x0000000002800000-0x0000003ffde00000
with 2.00 MiB pages
[ 0.000000] radix-mmu: Mapped 0x0000003ffde00000-0x0000003ffdff0000
with 64.0 KiB pages
[ 0.000000] radix-mmu: Mapped 0x0000003fffff0000-0x0000004000000000
with 64.0 KiB pages
[ 0.000000] radix-mmu: Mapped 0x0000003ffdff0000-0x0000003fffff0000
with 64.0 KiB pages
[ 0.000000] lpar: Using radix MMU under hypervisor
[ 0.000000] Linux version 6.19.0-rc1+ (root) (gcc (GCC) GNU ld
version 2.35.2-63.el9) #4 SMP Thu Dec 18 09:02:16 CST 2025
[ 0.000000] OF: reserved mem: Reserved memory: No reserved-memory
node in the DT
[ 0.000000] Found initrd at 0xc00000000f800000:0xc000000016046afe
[ 0.000000] Hardware name: hv:phyp pSeries
[ 0.000000] printk: legacy bootconsole [udbg0] enabled
[ 0.000000] Partition configured for 72 cpus.
[ 0.000000] CPU maps initialized for 8 threads per core
[ 0.000000] (thread shift is 3)
<snip>
[ 0.000000] Initmem setup node 28 as memoryless
[ 0.000000] Initmem setup node 29 as memoryless
[ 0.000000] Initmem setup node 30 as memoryless
[ 0.000000] Initmem setup node 31 as memoryless
[ 0.000000] percpu: Embedded 3 pages/cpu s126488 r0 d70120 u196608
[ 0.000000] pcpu-alloc: s126488 r0 d70120 u196608 alloc=3*65536
[ 0.000000] pcpu-alloc: [0] 00 [0] 01 [0] 02 [0] 03 [0] 04 [0] 05 [0]
06 [0] 07
[ 0.000000] pcpu-alloc: [0] 08 [0] 09 [0] 10 [0] 11 [0] 12 [0] 13 [0]
14 [0] 15
[ 0.000000] pcpu-alloc: [0] 16 [0] 17 [0] 18 [0] 19 [0] 20 [0] 21 [0]
22 [0] 23
[ 0.000000] pcpu-alloc: [0] 24 [0] 25 [0] 26 [0] 27 [0] 28 [0] 29 [0]
30 [0] 31
[ 0.000000] pcpu-alloc: [1] 32 [1] 33 [1] 34 [1] 35 [1] 36 [1] 37 [1]
38 [1] 39
[ 0.000000] pcpu-alloc: [1] 40 [1] 41 [1] 42 [1] 43 [1] 44 [1] 45 [1]
46 [1] 47
[ 0.000000] pcpu-alloc: [1] 48 [1] 49 [1] 50 [1] 51 [1] 52 [1] 53 [1]
54 [1] 55
[ 0.000000] pcpu-alloc: [1] 56 [1] 57 [1] 58 [1] 59 [1] 60 [1] 61 [1]
62 [1] 63
[ 0.000000] pcpu-alloc: [2] 64 [2] 65 [2] 66 [2] 67 [2] 68 [2] 69 [2]
70 [2] 71
[ 0.000000] Kernel command line:
BOOT_IMAGE=(ieee1275//vdevice/v-scsi@...00065/disk@...0000000000000,msdos2)/vmlinuz-6.19.0-rc1+
root=/dev/mapper/root ro rd.lvm.lv=root/root rd.lvm.lv=root/swap
biosdevname=0 loglevel=7 ignore_loglevel debug console=hvc0
earlycon=hvc0 earlyprintk crashkernel=4G default_hugepagesz=2M
hugepagesz=2M hugepages=128512
[ 0.000000] Unknown kernel command line parameters "earlyprintk
biosdevname=0", will be passed to user space.
[ 0.000000] random: crng init done
[ 0.000000] printk: log buffer data + meta data: 1048576 + 3670016 =
4718592 bytes
<snip>
[ 0.070655] thermal_sys: Registered thermal governor 'step_wise'
[ 0.070709] cpuidle: using governor menu
[ 0.070781] RTAS daemon started
[ 0.070984] pstore: Using crash dump compression: deflate
[ 0.070988] pstore: Registered nvram as persistent store backend
[ 0.071386] EEH: pSeries platform initialized
[ 0.071459] plpks: POWER LPAR Platform KeyStore is not supported or
enabled
[ 0.081865] kprobes: kprobe jump-optimization is enabled. All kprobes
are optimized if possible.
[ 2.828787] HugeTLB: allocation took 2740ms with
hugepage_allocation_threads=18
[ 2.828821] HugeTLB: allocating 128512 of page size 2.00 MiB failed.
Only allocated 128429 hugepages.
[ 2.828852] HugeTLB: registered 2.00 MiB page size, pre-allocated
128429 pages
[ 2.828855] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
[ 2.828858] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[ 2.828862] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
[ 2.831713] swapper/0: page allocation failure: order:5,
mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=1-3
[ 2.831732] CPU: 51 UID: 0 PID: 1 Comm: swapper/0 Not tainted
6.19.0-rc1+ #4 VOLUNTARY
[ 2.831736] Hardware name: hv:phyp pSeries
[ 2.831738] Call Trace:
[ 2.831738] [c000001c801b77c0] [c00000000111ae6c]
dump_stack_lvl+0x8c/0xf0 (unreliable)
[ 2.831747] [c000001c801b77f0] [c00000000059a024] warn_alloc+0x12c/0x1d8
[ 2.831752] [c000001c801b7890] [c00000000059a918]
__alloc_pages_slowpath.constprop.0+0x848/0xa98
[ 2.831755] [c000001c801b79d0] [c00000000059ae3c]
__alloc_frozen_pages_noprof+0x2d4/0x3a8
[ 2.831758] [c000001c801b7a50] [c0000000005eac64]
alloc_pages_mpol+0x10c/0x1f4
[ 2.831761] [c000001c801b7ab0] [c0000000005eadac]
alloc_pages_noprof+0x60/0xe8
[ 2.831763] [c000001c801b7ad0] [c0000000004d9978]
mempool_alloc_pages+0x24/0x38
[ 2.831767] [c000001c801b7af0] [c0000000004da4a0]
mempool_init_node+0x138/0x1fc
[ 2.831769] [c000001c801b7b40] [c00000000208844c]
bio_integrity_initfn+0x40/0x70
[ 2.831773] [c000001c801b7ba0] [c000000000010c44]
do_one_initcall+0x60/0x36c
[ 2.831776] [c000001c801b7c80] [c000000002006b2c]
do_initcalls+0x12c/0x22c
[ 2.831779] [c000001c801b7d30] [c000000002006f1c]
kernel_init_freeable+0x23c/0x390
[ 2.831781] [c000001c801b7de0] [c000000000011078] kernel_init+0x34/0x26c
[ 2.831783] [c000001c801b7e50] [c00000000000dd3c]
ret_from_kernel_user_thread+0x14/0x1c
[ 2.831786] ---- interrupt: 0 at 0x0
[ 2.831790] Mem-Info:
[ 2.831871] active_anon:0 inactive_anon:0 isolated_anon:0
[ 2.831871] active_file:0 inactive_file:0 isolated_file:0
[ 2.831871] unevictable:0 dirty:0 writeback:0
[ 2.831871] slab_reclaimable:82 slab_unreclaimable:2106
[ 2.831871] mapped:0 shmem:0 pagetables:146
[ 2.831871] sec_pagetables:0 bounce:0
[ 2.831871] kernel_misc_reclaimable:0
[ 2.831871] free:944 free_pcp:3099 free_cma:0
[ 2.831903] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:8000kB
pagetables:4224kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[ 2.831925] Node 2 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:7968kB
pagetables:4096kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[ 2.831937] Node 3 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:2272kB
pagetables:1024kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[ 2.831962] Node 1 Normal free:19520kB boost:0kB min:29440kB
low:144448kB high:259456kB reserved_highatomic:0KB free_highatomic:0KB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB writepending:0kB zspages:0kB
present:119537664kB managed:115056960kB mlocked:0kB bounce:0kB
free_pcp:84992kB local_pcp:2048kB free_cma:0kB
[ 2.831991] lowmem_reserve[]: 0 0 0
[ 2.831997] Node 2 Normal free:39424kB boost:2048kB min:32512kB
low:151360kB high:270208kB reserved_highatomic:0KB free_highatomic:0KB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB writepending:0kB zspages:0kB
present:119013376kB managed:118885632kB mlocked:0kB bounce:0kB
free_pcp:95552kB local_pcp:2816kB free_cma:0kB
[ 2.832008] lowmem_reserve[]: 0 0 0
[ 2.832011] Node 3 Normal free:1472kB boost:0kB min:7616kB
low:37376kB high:67136kB reserved_highatomic:0KB free_highatomic:0KB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB writepending:0kB zspages:0kB present:29884416kB
managed:29784448kB mlocked:0kB bounce:0kB free_pcp:17792kB local_pcp:0kB
free_cma:0kB
[ 2.832021] lowmem_reserve[]: 0 0 0
[ 2.832025] Node 1 Normal: 3*64kB (UME) 3*128kB (ME) 4*256kB (UME)
3*512kB (UME) 4*1024kB (ME) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 7232kB
[ 2.832037] Node 2 Normal: 1*64kB (U) 0*128kB 1*256kB (M) 0*512kB
2*1024kB (UM) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 2368kB
[ 2.832052] Node 3 Normal: 1*64kB (E) 1*128kB (M) 3*256kB (UME)
1*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 1472kB
[ 2.832068] Node 1 hugepages_total=56043 hugepages_free=56043
hugepages_surp=0 hugepages_size=2048kB
[ 2.832078] Node 1 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[ 2.832086] Node 2 hugepages_total=57915 hugepages_free=57915
hugepages_surp=0 hugepages_size=2048kB
[ 2.832093] Node 2 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[ 2.832102] Node 3 hugepages_total=14471 hugepages_free=14471
hugepages_surp=0 hugepages_size=2048kB
[ 2.832111] Node 3 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[ 2.832119] 0 total pagecache pages
[ 2.832122] 0 pages in swap cache
[ 2.832127] Free swap = 0kB
[ 2.832130] Total swap = 0kB
[ 2.832133] 4194304 pages RAM
[ 2.832138] 0 pages HighMem/MovableOnly
[ 2.832141] 73569 pages reserved
[ 2.832143] 0 pages cma reserved
[ 2.832146] 0 pages hwpoisoned
[ 2.832153] Memory cgroup min protection 0kB -- low protection 0kB
[ 2.832154] Kernel panic - not syncing: bio: can't create integrity
buf pool
[ 2.832160] CPU: 51 UID: 0 PID: 1 Comm: swapper/0 Not tainted
6.19.0-rc1+ #4 VOLUNTARY
[ 2.832164] Hardware name: hv:phyp pSeries
[ 2.832167] Call Trace:
[ 2.832169] [c000001c801b7a50] [c00000000111aeb8]
dump_stack_lvl+0xd8/0xf0 (unreliable)
[ 2.832180] [c000001c801b7a80] [c00000000015d79c] vpanic+0x2c8/0x4b4
[ 2.832189] [c000001c801b7b20] [c00000000015d9c8] nmi_panic+0x0/0xa0
[ 2.832197] [c000001c801b7b40] [c000000002088478]
bio_integrity_initfn+0x6c/0x70
[ 2.832205] [c000001c801b7ba0] [c000000000010c44]
do_one_initcall+0x60/0x36c
[ 2.832213] [c000001c801b7c80] [c000000002006b2c]
do_initcalls+0x12c/0x22c
[ 2.832221] [c000001c801b7d30] [c000000002006f1c]
kernel_init_freeable+0x23c/0x390
[ 2.832229] [c000001c801b7de0] [c000000000011078] kernel_init+0x34/0x26c
[ 2.832237] [c000001c801b7e50] [c00000000000dd3c]
ret_from_kernel_user_thread+0x14/0x1c
[ 2.832247] ---- interrupt: 0 at 0x0
[ 2.834181] pstore: backend (nvram) writing error (-1)
[ 2.835809] Rebooting in 10 seconds..
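The failing request in the trace above is an order:5 allocation, which with 64 KiB base pages (an assumption inferred from the 64kB buddy blocks in the log) needs one contiguous 2048 KiB block. The per-node free lists in the log show no free block of that size or larger. A simplified check over those numbers is sketched below; has_free_block is a hypothetical helper, and the real page allocator also considers watermarks, migrate types, reclaim and compaction.

```python
# Simplified model of the buddy free lists printed in the log; not kernel code.
BASE_PAGE_KIB = 64                  # assumption: 64 KiB base pages
ORDER = 5                           # "page allocation failure: order:5"
needed_kib = BASE_PAGE_KIB << ORDER # contiguous block size for this order

# Free block counts per node, keyed by block size in KiB ("Node N Normal:" lines).
free_blocks = {
    1: {64: 3, 128: 3, 256: 4, 512: 3, 1024: 4, 2048: 0, 4096: 0, 8192: 0, 16384: 0},
    2: {64: 1, 128: 0, 256: 1, 512: 0, 1024: 2, 2048: 0, 4096: 0, 8192: 0, 16384: 0},
    3: {64: 1, 128: 1, 256: 3, 512: 1, 1024: 0, 2048: 0, 4096: 0, 8192: 0, 16384: 0},
}


def has_free_block(blocks: dict, min_kib: int) -> bool:
    """True if any free block of at least min_kib exists (hypothetical helper)."""
    return any(count > 0 for size, count in blocks.items() if size >= min_kib)


for node, blocks in free_blocks.items():
    total_kib = sum(size * count for size, count in blocks.items())
    print(f"node {node}: {total_kib} KiB on free lists, "
          f"block >= {needed_kib} KiB available: {has_free_block(blocks, needed_kib)}")
```

Under this model no node can satisfy the request directly, which is consistent with the allocation failure reported for mempool_init_node() called from bio_integrity_initfn().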
I agree that reserving hugepages nearly equal to the system RAM is not
very practical. However, would it be a good idea to make the hugepage
allocator aware of the total system memory, so that it leaves enough
memory for the kernel to boot?
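To make the question concrete, one possible policy is sketched below. This is only an illustration, not existing kernel behavior: cap_hugepages and the 1 GiB reserve are hypothetical, and the managed memory figure is the sum of the three "managed:" values in the log above.

```python
# Illustration of a possible capping policy; not existing kernel behavior.

def cap_hugepages(requested: int, managed_kib: int,
                  hugepage_kib: int, reserve_kib: int) -> int:
    """Limit a hugepage request so that reserve_kib stays outside the pool.

    Hypothetical helper: returns the number of hugepages to preallocate.
    """
    if managed_kib <= reserve_kib:
        return 0
    max_pages = (managed_kib - reserve_kib) // hugepage_kib
    return min(requested, max_pages)


# Sum of "managed:" for nodes 1, 2 and 3 in the log, in KiB.
managed_kib = 115056960 + 118885632 + 29784448
reserve_kib = 1024 * 1024   # assumption: keep 1 GiB for the kernel (arbitrary value)

capped = cap_hugepages(128512, managed_kib, 2048, reserve_kib)
print(f"requested 128512 hugepages, capped to {capped}")
```

How large the reserve should be, and whether it should be fixed, proportional to RAM, or per node, is exactly the open question.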
Thanks,
Sourabh Jain