Message-ID: <d338d7a4-ca69-400b-86b5-35e46f6da2df@kernel.org>
Date: Fri, 19 Dec 2025 09:10:49 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Sourabh Jain <sourabhjain@...ux.ibm.com>,
 lkml <linux-kernel@...r.kernel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Borislav Petkov
 <bp@...en8.de>, Heiko Carstens <hca@...ux.ibm.com>,
 Madhavan Srinivasan <maddy@...ux.ibm.com>,
 Michael Ellerman <mpe@...erman.id.au>, Muchun Song <muchun.song@...ux.dev>,
 Oscar Salvador <osalvador@...e.de>,
 "Ritesh Harjani (IBM)" <ritesh.list@...il.com>,
 Vasily Gorbik <gor@...ux.ibm.com>
Subject: Re: mm/hugetlb: kernel fail to boot if total hugepages size is almost
 equal to system RAM

On 12/18/25 17:19, Sourabh Jain wrote:
> Hello All,
> 
> I observed a kernel boot failure when the total hugepages size is almost
> equal to the system RAM.
> 
> For example, a Power system with 255 GB RAM failed to boot with the
> following kernel command-line arguments:
> 
> default_hugepagesz=2M hugepagesz=2M hugepages=128512
> 
> The failure occurred with the following logs:
> 
>     Booting a command list
> 
> OF stdout device is: /vdevice/vty@...00000
> Preparing to boot Linux version 6.19.0-rc1+ (root@...t) (gcc (GCC), GNU
> ld version 2.35.2-63.el9) #4 SMP Thu Dec 18 09:02:16 CST 2025
> Detected machine type: 0000000000000101
> command line:
> BOOT_IMAGE=(ieee1275//vdevice/v-scsi@...00065/disk@...0000000000000,msdos2)/vmlinuz-6.19.0-rc1+
> root=/dev/mapper/r-root ro rd.lvm.lv=root/root rd.lvm.lv=root/swap
> biosdevname=0 loglevel=7 ignore_loglevel debug console=hvc0
> earlycon=hvc0 earlyprintk crashkernel=4G default_hugepagesz=2M
> hugepagesz=2M hugepages=128512
> Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
> Calling ibm,client-architecture-support... done
> memory layout at init:
>     memory_limit : 0000000000000000 (16 MB aligned)
>     alloc_bottom : 0000000016050000
>     alloc_top    : 0000000030000000
>     alloc_top_hi : 0000000030000000
>     rmo_top      : 0000000030000000
>     ram_top      : 0000000030000000
> instantiating rtas at 0x000000002ec50000... done
> prom_hold_cpus: skipped
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000016060000 -> 0x0000000016061844
> Device tree struct  0x0000000016070000 -> 0x0000000016080000
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x000000000a700000 ...
> [    0.000000] printk: debug: ignoring loglevel setting.
> [    0.000000] crashkernel reserved: 0x0000000018000000 -
> 0x0000000118000000 (4096 MB)
> [    0.000000] radix-mmu: Page sizes from device-tree:
> [    0.000000] radix-mmu: Page size shift = 12 AP=0x0
> [    0.000000] radix-mmu: Page size shift = 16 AP=0x5
> [    0.000000] radix-mmu: Page size shift = 21 AP=0x1
> [    0.000000] radix-mmu: Page size shift = 30 AP=0x2
> [    0.000000] Activating Kernel Userspace Access Prevention
> [    0.000000] Activating Kernel Userspace Execution Prevention
> [    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000002800000
> with 2.00 MiB pages (exec)
> [    0.000000] radix-mmu: Mapped 0x0000000002800000-0x0000003ffde00000
> with 2.00 MiB pages
> [    0.000000] radix-mmu: Mapped 0x0000003ffde00000-0x0000003ffdff0000
> with 64.0 KiB pages
> [    0.000000] radix-mmu: Mapped 0x0000003fffff0000-0x0000004000000000
> with 64.0 KiB pages
> [    0.000000] radix-mmu: Mapped 0x0000003ffdff0000-0x0000003fffff0000
> with 64.0 KiB pages
> [    0.000000] lpar: Using radix MMU under hypervisor
> [    0.000000] Linux version 6.19.0-rc1+ (root) (gcc (GCC) GNU ld
> version 2.35.2-63.el9) #4 SMP Thu Dec 18 09:02:16 CST 2025
> [    0.000000] OF: reserved mem: Reserved memory: No reserved-memory
> node in the DT
> [    0.000000] Found initrd at 0xc00000000f800000:0xc000000016046afe
> [    0.000000] Hardware name: hv:phyp pSeries
> [    0.000000] printk: legacy bootconsole [udbg0] enabled
> [    0.000000] Partition configured for 72 cpus.
> [    0.000000] CPU maps initialized for 8 threads per core
> [    0.000000]  (thread shift is 3)
> 
> <snip>
> 
> [    0.000000] Initmem setup node 28 as memoryless
> [    0.000000] Initmem setup node 29 as memoryless
> [    0.000000] Initmem setup node 30 as memoryless
> [    0.000000] Initmem setup node 31 as memoryless
> [    0.000000] percpu: Embedded 3 pages/cpu s126488 r0 d70120 u196608
> [    0.000000] pcpu-alloc: s126488 r0 d70120 u196608 alloc=3*65536
> [    0.000000] pcpu-alloc: [0] 00 [0] 01 [0] 02 [0] 03 [0] 04 [0] 05 [0]
> 06 [0] 07
> [    0.000000] pcpu-alloc: [0] 08 [0] 09 [0] 10 [0] 11 [0] 12 [0] 13 [0]
> 14 [0] 15
> [    0.000000] pcpu-alloc: [0] 16 [0] 17 [0] 18 [0] 19 [0] 20 [0] 21 [0]
> 22 [0] 23
> [    0.000000] pcpu-alloc: [0] 24 [0] 25 [0] 26 [0] 27 [0] 28 [0] 29 [0]
> 30 [0] 31
> [    0.000000] pcpu-alloc: [1] 32 [1] 33 [1] 34 [1] 35 [1] 36 [1] 37 [1]
> 38 [1] 39
> [    0.000000] pcpu-alloc: [1] 40 [1] 41 [1] 42 [1] 43 [1] 44 [1] 45 [1]
> 46 [1] 47
> [    0.000000] pcpu-alloc: [1] 48 [1] 49 [1] 50 [1] 51 [1] 52 [1] 53 [1]
> 54 [1] 55
> [    0.000000] pcpu-alloc: [1] 56 [1] 57 [1] 58 [1] 59 [1] 60 [1] 61 [1]
> 62 [1] 63
> [    0.000000] pcpu-alloc: [2] 64 [2] 65 [2] 66 [2] 67 [2] 68 [2] 69 [2]
> 70 [2] 71
> [    0.000000] Kernel command line:
> BOOT_IMAGE=(ieee1275//vdevice/v-scsi@...00065/disk@...0000000000000,msdos2)/vmlinuz-6.19.0-rc1+
> root=/dev/mapper/root ro rd.lvm.lv=root/root rd.lvm.lv=root/swap
> biosdevname=0 loglevel=7 ignore_loglevel debug console=hvc0
> earlycon=hvc0 earlyprintk crashkernel=4G default_hugepagesz=2M
> hugepagesz=2M hugepages=128512
> [    0.000000] Unknown kernel command line parameters "earlyprintk
> biosdevname=0", will be passed to user space.
> [    0.000000] random: crng init done
> [    0.000000] printk: log buffer data + meta data: 1048576 + 3670016 =
> 4718592 bytes
> 
> <snip>
> 
> [    0.070655] thermal_sys: Registered thermal governor 'step_wise'
> [    0.070709] cpuidle: using governor menu
> [    0.070781] RTAS daemon started
> [    0.070984] pstore: Using crash dump compression: deflate
> [    0.070988] pstore: Registered nvram as persistent store backend
> [    0.071386] EEH: pSeries platform initialized
> [    0.071459] plpks: POWER LPAR Platform KeyStore is not supported or
> enabled
> [    0.081865] kprobes: kprobe jump-optimization is enabled. All kprobes
> are optimized if possible.
> [    2.828787] HugeTLB: allocation took 2740ms with
> hugepage_allocation_threads=18
> [    2.828821] HugeTLB: allocating 128512 of page size 2.00 MiB failed.
> Only allocated 128429 hugepages.
> [    2.828852] HugeTLB: registered 2.00 MiB page size, pre-allocated
> 128429 pages
> [    2.828855] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
> [    2.828858] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
> [    2.828862] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
> [    2.831713] swapper/0: page allocation failure: order:5,
> mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=1-3
> [    2.831732] CPU: 51 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> 6.19.0-rc1+ #4 VOLUNTARY
> [    2.831736] Hardware name: hv:phyp pSeries
> [    2.831738] Call Trace:
> [    2.831738] [c000001c801b77c0] [c00000000111ae6c]
> dump_stack_lvl+0x8c/0xf0 (unreliable)
> [    2.831747] [c000001c801b77f0] [c00000000059a024] warn_alloc+0x12c/0x1d8
> [    2.831752] [c000001c801b7890] [c00000000059a918]
> __alloc_pages_slowpath.constprop.0+0x848/0xa98
> [    2.831755] [c000001c801b79d0] [c00000000059ae3c]
> __alloc_frozen_pages_noprof+0x2d4/0x3a8
> [    2.831758] [c000001c801b7a50] [c0000000005eac64]
> alloc_pages_mpol+0x10c/0x1f4
> [    2.831761] [c000001c801b7ab0] [c0000000005eadac]
> alloc_pages_noprof+0x60/0xe8
> [    2.831763] [c000001c801b7ad0] [c0000000004d9978]
> mempool_alloc_pages+0x24/0x38
> [    2.831767] [c000001c801b7af0] [c0000000004da4a0]
> mempool_init_node+0x138/0x1fc
> [    2.831769] [c000001c801b7b40] [c00000000208844c]
> bio_integrity_initfn+0x40/0x70
> [    2.831773] [c000001c801b7ba0] [c000000000010c44]
> do_one_initcall+0x60/0x36c
> [    2.831776] [c000001c801b7c80] [c000000002006b2c]
> do_initcalls+0x12c/0x22c
> [    2.831779] [c000001c801b7d30] [c000000002006f1c]
> kernel_init_freeable+0x23c/0x390
> [    2.831781] [c000001c801b7de0] [c000000000011078] kernel_init+0x34/0x26c
> [    2.831783] [c000001c801b7e50] [c00000000000dd3c]
> ret_from_kernel_user_thread+0x14/0x1c
> [    2.831786] ---- interrupt: 0 at 0x0
> [    2.831790] Mem-Info:
> [    2.831871] active_anon:0 inactive_anon:0 isolated_anon:0
> [    2.831871]  active_file:0 inactive_file:0 isolated_file:0
> [    2.831871]  unevictable:0 dirty:0 writeback:0
> [    2.831871]  slab_reclaimable:82 slab_unreclaimable:2106
> [    2.831871]  mapped:0 shmem:0 pagetables:146
> [    2.831871]  sec_pagetables:0 bounce:0
> [    2.831871]  kernel_misc_reclaimable:0
> [    2.831871]  free:944 free_pcp:3099 free_cma:0
> [    2.831903] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:8000kB
> pagetables:4224kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
> [    2.831925] Node 2 active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:7968kB
> pagetables:4096kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
> [    2.831937] Node 3 active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> mapped:0kB dirty:0kB writeback:0kB shmem:0kB
> shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB kernel_stack:2272kB
> pagetables:1024kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
> [    2.831962] Node 1 Normal free:19520kB boost:0kB min:29440kB
> low:144448kB high:259456kB reserved_highatomic:0KB free_highatomic:0KB
> active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB
> present:119537664kB managed:115056960kB mlocked:0kB bounce:0kB
> free_pcp:84992kB local_pcp:2048kB free_cma:0kB
> [    2.831991] lowmem_reserve[]: 0 0 0
> [    2.831997] Node 2 Normal free:39424kB boost:2048kB min:32512kB
> low:151360kB high:270208kB reserved_highatomic:0KB free_highatomic:0KB
> active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB
> present:119013376kB managed:118885632kB mlocked:0kB bounce:0kB
> free_pcp:95552kB local_pcp:2816kB free_cma:0kB
> [    2.832008] lowmem_reserve[]: 0 0 0
> [    2.832011] Node 3 Normal free:1472kB boost:0kB min:7616kB
> low:37376kB high:67136kB reserved_highatomic:0KB free_highatomic:0KB
> active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB present:29884416kB
> managed:29784448kB mlocked:0kB bounce:0kB free_pcp:17792kB local_pcp:0kB
> free_cma:0kB
> [    2.832021] lowmem_reserve[]: 0 0 0
> [    2.832025] Node 1 Normal: 3*64kB (UME) 3*128kB (ME) 4*256kB (UME)
> 3*512kB (UME) 4*1024kB (ME) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 7232kB
> [    2.832037] Node 2 Normal: 1*64kB (U) 0*128kB 1*256kB (M) 0*512kB
> 2*1024kB (UM) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 2368kB
> [    2.832052] Node 3 Normal: 1*64kB (E) 1*128kB (M) 3*256kB (UME)
> 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 1472kB
> [    2.832068] Node 1 hugepages_total=56043 hugepages_free=56043
> hugepages_surp=0 hugepages_size=2048kB
> [    2.832078] Node 1 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=1048576kB
> [    2.832086] Node 2 hugepages_total=57915 hugepages_free=57915
> hugepages_surp=0 hugepages_size=2048kB
> [    2.832093] Node 2 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=1048576kB
> [    2.832102] Node 3 hugepages_total=14471 hugepages_free=14471
> hugepages_surp=0 hugepages_size=2048kB
> [    2.832111] Node 3 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=1048576kB
> [    2.832119] 0 total pagecache pages
> [    2.832122] 0 pages in swap cache
> [    2.832127] Free swap  = 0kB
> [    2.832130] Total swap = 0kB
> [    2.832133] 4194304 pages RAM
> [    2.832138] 0 pages HighMem/MovableOnly
> [    2.832141] 73569 pages reserved
> [    2.832143] 0 pages cma reserved
> [    2.832146] 0 pages hwpoisoned
> [    2.832153] Memory cgroup min protection 0kB -- low protection 0kB
> [    2.832154] Kernel panic - not syncing: bio: can't create integrity
> buf pool
> [    2.832160] CPU: 51 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> 6.19.0-rc1+ #4 VOLUNTARY
> [    2.832164] Hardware name: hv:phyp pSeries
> [    2.832167] Call Trace:
> [    2.832169] [c000001c801b7a50] [c00000000111aeb8]
> dump_stack_lvl+0xd8/0xf0 (unreliable)
> [    2.832180] [c000001c801b7a80] [c00000000015d79c] vpanic+0x2c8/0x4b4
> [    2.832189] [c000001c801b7b20] [c00000000015d9c8] nmi_panic+0x0/0xa0
> [    2.832197] [c000001c801b7b40] [c000000002088478]
> bio_integrity_initfn+0x6c/0x70
> [    2.832205] [c000001c801b7ba0] [c000000000010c44]
> do_one_initcall+0x60/0x36c
> [    2.832213] [c000001c801b7c80] [c000000002006b2c]
> do_initcalls+0x12c/0x22c
> [    2.832221] [c000001c801b7d30] [c000000002006f1c]
> kernel_init_freeable+0x23c/0x390
> [    2.832229] [c000001c801b7de0] [c000000000011078] kernel_init+0x34/0x26c
> [    2.832237] [c000001c801b7e50] [c00000000000dd3c]
> ret_from_kernel_user_thread+0x14/0x1c
> [    2.832247] ---- interrupt: 0 at 0x0
> [    2.834181] pstore: backend (nvram) writing error (-1)
> [    2.835809] Rebooting in 10 seconds..
> 
> I agree that reserving hugepages equal to the system RAM is not very
> practical. However, would it be a good idea to make the hugepage
> memory allocator aware of the total system memory and leave some
> memory for the kernel to boot?

IMHO it's the same as with any other system misconfiguration where you
end up with too little usable RAM, like setting mem=, cma= or
crashkernel= etc. in the wrong way.

How are we supposed to know how much memory the kernel + user space will
actually require without easily running out of memory (OOM)?
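
For a sense of scale, the figures in the quoted log can be put together
in a few lines. This is only a back-of-the-envelope sketch, not anything
the kernel computes: the variable names are made up for illustration, and
the 64 KiB base page size is an assumption inferred from the log (the
buddy free lists start at 64kB and pcpu-alloc reports alloc=3*65536).

```python
# Rough memory accounting from the quoted boot log (illustrative only).
# Assumption: base page size is 64 KiB, inferred from the log output.
BASE_PAGE_KIB = 64
HUGEPAGE_KIB = 2048  # hugepagesz=2M

ram_pages = 4_194_304           # "4194304 pages RAM"
reserved_pages = 73_569         # "73569 pages reserved"
hugepages_requested = 128_512   # hugepages=128512 on the command line
hugepages_allocated = 128_429   # "Only allocated 128429 hugepages."

ram_kib = ram_pages * BASE_PAGE_KIB
managed_kib = (ram_pages - reserved_pages) * BASE_PAGE_KIB
requested_kib = hugepages_requested * HUGEPAGE_KIB
allocated_kib = hugepages_allocated * HUGEPAGE_KIB

# What remains for everything that is not a hugepage.
left_kib = managed_kib - allocated_kib


def gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


print(f"RAM:                 {gib(ram_kib):8.2f} GiB")
print(f"managed by buddy:    {gib(managed_kib):8.2f} GiB")
print(f"hugepages requested: {gib(requested_kib):8.2f} GiB")
print(f"hugepages allocated: {gib(allocated_kib):8.2f} GiB")
print(f"left for the rest:   {left_kib / 1024:8.2f} MiB")
```

The managed total computed this way matches the sum of the per-node
"managed:" values in the log, and what is left after the hugepages comes
to roughly 688 MiB for kernel stacks, page tables, slab and everything
else on a 72-CPU partition. Whether that is "enough" depends entirely on
the workload, which is the point of the question above.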

-- 
Cheers

David
