Message-ID: <aDYXQDdT3dTodTlq@colin-ia-desktop>
Date: Tue, 27 May 2025 14:49:20 -0500
From: Colin Foster <colin.foster@...advantage.com>
To: linux-kernel@...r.kernel.org
Subject: Crash when user space allocates non-highmem memory

Hello,

I'm running up against a reliable failure mode on my hardware and want
to see if there are any suggestions for tracking down what might be
going on.

The kernel is currently 6.12.28, with essentially no out-of-tree patches
besides our DTS. The hardware is an OMAP 4460 processor.

The behavior I see is quite repeatable. I can allocate about 220MB of
RAM by running:

dd if=/dev/zero of=/tmp/myfile1 bs=1M count=220

After that point I can still allocate RAM in smaller chunks.
Eventually - somewhere around 225MB - I see the kernel lock up during
memory allocation. An example of a stack dump:

# dd if=/dev/zero of=/tmp/myfile13 bs=4K count=150
[  457.552825] ------------[ cut here ]------------
[  457.557556] WARNING: CPU: 0 PID: 577 at mm/gup.c:149 try_grab_folio+0x1c0/0x200
[  457.564941] Modules linked in: bq27xxx_battery_hdq bq27xxx_battery phy_omap_usb2 omap_mailbox ehci_omap omap_hdq wire cn pwm_twl pwm_twl_led tps62360_regulator tmp102 hwmon cpufreq_dt nfnetlink
[  457.582397] CPU: 0 UID: 0 PID: 577 Comm: lpc_manager Tainted: G        W          6.12.28 #30
[  457.582397] Tainted: [W]=WARN
[  457.582397] Hardware name: Generic OMAP4 (Flattened Device Tree)
[  457.582397] Call trace:
[  457.600006]  unwind_backtrace from show_stack+0x18/0x1c
[  457.600006]  show_stack from dump_stack_lvl+0x38/0x58
[  457.600006]  dump_stack_lvl from __warn+0x84/0x158
[  457.612945]  __warn from warn_slowpath_fmt+0x1a8/0x1bc
[  457.612945]  warn_slowpath_fmt from try_grab_folio+0x1c0/0x200
[  457.628845]  try_grab_folio from follow_page_pte+0x138/0x440
[  457.628845]  follow_page_pte from __get_user_pages+0x17c/0x824
[  457.628845]  __get_user_pages from __gup_longterm_locked+0xec/0xc68
[  457.646728]  __gup_longterm_locked from gup_fast_fallback+0xcc/0x1ac
[  457.653137]  gup_fast_fallback from get_user_pages_fast+0x50/0x60
[  457.653137]  get_user_pages_fast from get_futex_key+0x88/0x43c
[  457.653137]  get_futex_key from futex_wake+0x5c/0x1c4
[  457.653137]  futex_wake from do_futex+0xd4/0x188
[  457.674896]  do_futex from mm_release+0x10c/0x110
[  457.674896]  mm_release from do_exit+0x2cc/0xc20
[  457.684295]  do_exit from do_group_exit+0x0/0xc8
[  457.688964]  do_group_exit from __sys_trace_return+0x0/0x10
[  457.688964] Exception stack(0xf0aa9fa8 to 0xf0aa9ff0)
[  457.688964] 9fa0:                   b3efe000 b46fe614 00000000 00800000 00000000 00000000
[  457.699676] 9fc0: b3efe000 b46fe614 00801000 00000001 b46fe400 b46fe400 b3efe000 00000000
[  457.699676] 9fe0: 00000001 b46fdd80 b6b42197 b6b03826
[  457.721221] ---[ end trace 0000000000000000 ]---
[  457.732635] page: refcount:0 mapcount:1 mapping:00000000 index:0x0 pfn:0xbdb69
[  457.732635] flags: 0x0(zone=0)
[  457.743011] raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  457.751159] raw: 00000000
[  457.751159] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[  457.763336] ------------[ cut here ]------------
[  457.763336] kernel BUG at include/linux/mm.h:1444!
[  457.763336] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
[  457.763336] Modules linked in: bq27xxx_battery_hdq bq27xxx_battery phy_omap_usb2 omap_mailbox ehci_omap omap_hdq wire cn pwm_twl pwm_twl_led tps62360_regulator tmp102 hwmon cpufreq_dt nfnetlink
[  457.778472] CPU: 0 UID: 0 PID: 109 Comm: systemd-journal Tainted: G        W          6.12.28 #30
[  457.804687] Tainted: [W]=WARN
[  457.804687] Hardware name: Generic OMAP4 (Flattened Device Tree)
[  457.804687] PC is at do_wp_page+0x9a4/0xea4
[  457.817932] LR is at do_wp_page+0x9a4/0xea4
[  457.817932] pc : [<c02f2bd4>]    lr : [<c02f2bd4>]    psr: 600e0113
[  457.817932] sp : f0195e80  ip : 00000000  fp : f0195ee8
[  457.817932] r10: c2b91480  r9 : 00000002  r8 : c2b91480
[  457.817932] r7 : c2b91480  r6 : 00000000  r5 : efda6ac4  r4 : f0195ee8
[  457.838958] r3 : 00000000  r2 : 00000000  r1 : c13f6c50  r0 : 0000005c
[  457.852081] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  457.852081] Control: 10c5387d  Table: 82b40059  DAC: 00000051
[  457.852081] Register r0 information: non-paged memory
[  457.870117] Register r1 information: non-slab/vmalloc memory
[  457.870117] Register r2 information: NULL pointer
[  457.875823] Register r3 information: NULL pointer
[  457.880554] Register r4 information: 2-page vmalloc region starting at 0xf0194000 allocated at kernel_clone+0xb0/0x414
[  457.880554] Register r5 information: non-slab/vmalloc memory
[  457.880554] Register r6 information: NULL pointer
[  457.901763] Register r7 information: slab vm_area_struct start c2b91480 pointer offset 0 size 72
[  457.906524] Register r8 information: slab vm_area_struct start c2b91480 pointer offset 0 size 72
[  457.915374] Register r9 information: non-paged memory
[  457.929290] Register r10 information: slab vm_area_struct start c2b91480 pointer offset 0 size 72
[  457.929321] Register r11 information: 2-page vmalloc region starting at 0xf0194000 allocated at kernel_clone+0xb0/0x414
[  457.949096] Register r12 information: NULL pointer
[  457.949096] Process systemd-journal (pid: 109, stack limit = 0x40806a5f)
[  457.949096] Stack: (0xf0195e80 to 0xf0196000)
[  457.960662] 5e80: c2bb3600 c02eb5f4 00000001 00000000 00000000 00000000 00000000 c2bb3600
[  457.973266] 5ea0: 002d8788 00000255 b61e2000 f0195fb0 bdb6938f c2bb3600 00000002 c2b91480
[  457.973266] 5ec0: f0195ee8 c02f4d94 b6158000 b6393fff c1d0cc0c 00000000 b69c5fff 00000000
[  457.989715] 5ee0: 00000000 18000801 c2b91480 00100cca 0000008a b61e2000 b61e2000 00000a55
[  457.989715] 5f00: c2b42d80 c2b42d80 bdb6938f 00000000 efda6ac4 ffefd788 c2bb362c 00000000
[  457.989715] 5f20: 00000001 23c69c7f 00000000 0000081f f0195fb0 b61e2ff8 00000255 0000081f
[  458.006164] 5f40: 00000002 c2bb3600 b61e2000 c0d5dd00 000001c9 00000000 2b85e570 00000000
[  458.006164] 5f60: 0000006d 0000081f c130af80 b61e2ff8 f0195fb0 c0d5dba4 19dbb5a5 00000000
[  458.030822] 5f80: be88c2f0 c0113ce0 2b85e570 00000000 be88bff8 23c69c7f b6e399a4 600e0110
[  458.030822] 5fa0: ffffffff 10c5387d 10c5387d c0100f4c 00000000 00000001 00000000 b61e2ff8
[  458.030822] 5fc0: 004bf530 00000000 0000006d 00000000 00000001 19dbb5a5 00000000 be88c2f0
[  458.047271] 5fe0: 00000000 be88c1f0 b6e4b9ec b6e399a4 600e0110 ffffffff 00000000 00000000
[  458.047271] Call trace:
[  458.063720]  do_wp_page from handle_mm_fault+0x758/0x121c
[  458.063720]  handle_mm_fault from do_page_fault+0x15c/0x380
[  458.071716]  do_page_fault from do_DataAbort+0x40/0xc0
[  458.071716]  do_DataAbort from __dabt_usr+0x4c/0x60
[  458.071716] Exception stack(0xf0195fb0 to 0xf0195ff8)
[  458.092529] 5fa0:                                     00000000 00000001 00000000 b61e2ff8
[  458.092529] 5fc0: 004bf530 00000000 0000006d 00000000 00000001 19dbb5a5 00000000 be88c2f0
[  458.108978] 5fe0: 00000000 be88c1f0 b6e4b9ec b6e399a4 600e0110 ffffffff
[  458.108978] Code: eaffff6f e59f14ec e1a00005 ebffc6dd (e7f001f2)
[  458.121765] ---[ end trace 0000000000000000 ]---
[  458.121765] Kernel panic - not syncing: Fatal exception
[  458.121765] ---[ end Kernel panic - not syncing: Fatal exception ]---
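
For readers decoding the "page dumped because" line: the quoted condition
is true when the folio reference count, converted to unsigned int, lies in
the range -127..0 (zero, or a small negative value that wrapped around).
The following is only a minimal Python sketch of that arithmetic, assuming
a 32-bit unsigned int; the function name is made up for illustration and
is not a kernel symbol.

```python
# Sketch of the condition printed in the "page dumped because" line:
#   ((unsigned int) folio_ref_count(folio) + 127u <= 127u)
# Assumption: unsigned int is 32 bits wide, so arithmetic wraps at 2**32.
U32_MASK = 0xFFFFFFFF


def ref_zero_or_close_to_overflow(refcount):
    """Emulate the C expression for a signed refcount (hypothetical helper)."""
    as_unsigned = refcount & U32_MASK          # (unsigned int) cast
    wrapped_sum = (as_unsigned + 127) & U32_MASK  # + 127u with wraparound
    return wrapped_sum <= 127


if __name__ == "__main__":
    # Values around the edges of the range that trips the check.
    for refcount in (-128, -127, -1, 0, 1, 2):
        print(refcount, ref_zero_or_close_to_overflow(refcount))
```

The page dump above reports refcount:0, which satisfies the condition, so
the check fires on a page whose reference count is already zero.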


My observations:

This appears to be related to the transition from the HighMem zone to
the Normal zone.

[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000080000000-0x00000000afdfffff]
[    0.000000]   HighMem  [mem 0x00000000afe00000-0x00000000bfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x00000000afdfffff]
[    0.000000]   node   0: [mem 0x00000000b0000000-0x00000000bfffffff]
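
The sizes implied by these ranges, and the zone that the PFN from the page
dump (pfn:0xbdb69) falls in, can be worked out with a few lines of Python.
This is only a sketch: it assumes 4 KiB pages (PAGE_SHIFT = 12), which the
log does not state, and the helper names are made up for illustration.

```python
# Sketch: sizes of the zones and node ranges printed in the boot log,
# and which zone the PFN from the page dump belongs to.
# Assumption: 4 KiB pages (PAGE_SHIFT = 12); not stated in the log.
PAGE_SHIFT = 12
MIB = 1 << 20

# Inclusive [start, end] physical address ranges copied from the boot log.
zones = {
    "Normal": (0x80000000, 0xAFDFFFFF),
    "HighMem": (0xAFE00000, 0xBFFFFFFF),
}
node_ranges = [
    (0x80000000, 0xAFDFFFFF),
    (0xB0000000, 0xBFFFFFFF),
]


def size_mib(start, end):
    """Size of an inclusive [start, end] range in MiB."""
    return (end - start + 1) / MIB


def present_mib(zone):
    """MiB of a zone that is covered by the early memory node ranges."""
    zstart, zend = zones[zone]
    total = 0
    for start, end in node_ranges:
        lo, hi = max(start, zstart), min(end, zend)
        if lo <= hi:
            total += size_mib(lo, hi)
    return total


def zone_of(pfn):
    """Name of the zone containing the physical address of pfn, or None."""
    addr = pfn << PAGE_SHIFT
    for name, (start, end) in zones.items():
        if start <= addr <= end:
            return name
    return None


if __name__ == "__main__":
    for name, (start, end) in zones.items():
        print(f"{name}: span {size_mib(start, end):.0f} MiB, "
              f"present {present_mib(name):.0f} MiB")
    pfn = 0xBDB69
    print(f"pfn {pfn:#x} -> addr {pfn << PAGE_SHIFT:#x}, zone {zone_of(pfn)}")
```

Under that assumption, Normal spans 766 MiB, HighMem spans 258 MiB of which
256 MiB is backed by a node range, and pfn 0xbdb69 (address 0xbdb69000)
lies in HighMem.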

If I boot with "mem=900M" then the same behavior happens around 125MB
instead of 225MB.

I'm currently running with SMP disabled, to aid in debugging.

If I allocate too much memory (via "memhog 250M") the kernel locks up
and I don't get any traces / panics.

A couple of searches led me to this ancient thread, which describes
similar behavior, but of course was resolved 15 years ago.

https://bugs.launchpad.net/ubuntu/+source/linux-ti-omap4/+bug/633227

I figured it can't hurt to throw these observations out to the mailing
lists, in case there are any ideas or if it is actually a bug. I'm not
well-versed in the memory subsystem, so if there are suggestions on
things to try I'm all ears!


Many thanks,

Colin
