Message-ID: <aFQaY4Bxle8-GT6O@harry>
Date: Thu, 19 Jun 2025 23:10:43 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: Uladzislau Rezki <urezki@...il.com>, oe-lkp@...ts.linux.dev, lkp@...el.com,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>, Baoquan He <bhe@...hat.com>,
Adrian Huang <ahuang12@...ovo.com>,
Christop Hellwig <hch@...radead.org>,
Mateusz Guzik <mjguzik@...il.com>, linux-mm@...ck.org,
Suren Baghdasaryan <surenb@...gle.com>,
Kent Overstreet <kent.overstreet@...ux.dev>
Subject: Kernel crash due to alloc_tag_top_users() being called when
!mem_profiling_support?
On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>
> Hello,
>
> for this change, we reported
> "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> in
> https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>
> at that time, we made some tests with x86_64 config which runs well.
>
> now we noticed the commit is in mainline now.
(Re-sending because I forgot to Cc people and the list...)

Hi, I'm hitting the same error in my testing environment.

I think this is related to the memory allocation profiling & code tagging
subsystems rather than vmalloc, so let's add the related folks to Cc.

After quickly skimming the code, it seems the conditions to trigger this
are that 1) MEM_ALLOC_PROFILING is compiled in, but 2) it is not enabled
by default, and 3) an allocation somehow fails, which ends up calling
alloc_tag_top_users().

I see "Memory allocation profiling is not supported!" in dmesg,
which means the kernel did not allocate & initialize alloc_tag_cttype
properly, but alloc_tag_top_users() still tries to acquire its semaphore.
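
The register dump in the quoted report looks consistent with that. Below is a
rough userspace sketch of the arithmetic, not kernel code. It assumes that
codetag_trylock_module_list() passes &cttype->mod_lock to
down_read_trylock(), so that RDI = 0x70 is just the offset of that field from
a NULL alloc_tag_cttype, and that KASAN uses shadow offset 0xdffffc0000000000
(the RAX value in the dump) with a scale shift of 3. The helper names are made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 generic KASAN parameters; RAX in the dump holds this offset. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Map an address to the shadow byte KASAN checks before touching it. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Recompute the faulting address from the register values in the report. */
static uint64_t faulting_shadow_addr(void)
{
	const uint64_t sem = 0x70;         /* RDI/RBX: semaphore pointer argument */
	const uint64_t field = sem + 0x68; /* "48 8d 6b 68": lea 0x68(%rbx),%rbp */

	assert(field == 0xd8);             /* matches RBP in the dump */
	return kasan_shadow_addr(field);   /* shadow byte covers 0xd8-0xdf */
}
```

faulting_shadow_addr() evaluates to 0xdffffc000000001b, the non-canonical
address in the Oops line, and the 8 bytes covered by that shadow byte are the
[0xd8-0xdf] range KASAN prints. So the semaphore pointer itself is a tiny
offset from NULL rather than a valid object.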
I think the kernel should not call alloc_tag_top_users() at all (or it
should return an error) if mem_profiling_support == false?

Does the following work in your testing environment?
(I only did very light testing in QEMU, but it seems to fix the issue for me.)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d48b80f3f007..57d4d5673855 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
 	struct codetag_bytes n;
 	unsigned int i, nr = 0;
 
-	if (can_sleep)
+	if (!mem_profiling_support)
+		return 0;
+	else if (can_sleep)
 		codetag_lock_module_list(alloc_tag_cttype, true);
 	else if (!codetag_trylock_module_list(alloc_tag_cttype))
 		return 0;
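
To illustrate the idea outside the kernel, here is a minimal userspace sketch
of the same guard. Everything in it is a hypothetical stand-in: fake_cttype
models alloc_tag_cttype (left NULL when profiling is unsupported),
fake_profiling_support models mem_profiling_support, and a plain bool replaces
the rw_semaphore. It is only meant to show why the early return has to come
before any lock attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct codetag_type; only the lock matters. */
struct fake_cttype {
	bool locked;
	size_t nr_tags;
};

static bool fake_profiling_support;     /* models mem_profiling_support */
static struct fake_cttype *fake_cttype; /* models alloc_tag_cttype */
static struct fake_cttype fake_storage = { .locked = false, .nr_tags = 3 };

/* Models init: the type pointer is only set up when profiling is supported. */
static void fake_init(bool supported)
{
	fake_profiling_support = supported;
	fake_cttype = supported ? &fake_storage : NULL;
}

/* Dereferences its argument, so calling it with NULL would crash. */
static bool fake_trylock(struct fake_cttype *t)
{
	if (t->locked)
		return false;
	t->locked = true;
	return true;
}

/* Returns how many entries would be reported, at most count. */
static size_t fake_top_users(size_t count)
{
	size_t nr;

	if (!fake_profiling_support)
		return 0;               /* guard: never touch the NULL pointer */

	assert(fake_cttype != NULL);
	if (!fake_trylock(fake_cttype))
		return 0;

	nr = fake_cttype->nr_tags < count ? fake_cttype->nr_tags : count;
	fake_cttype->locked = false;    /* unlock */
	return nr;
}
```

After fake_init(false), fake_top_users() returns 0 without touching the lock,
which is the behaviour the patch above aims for.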
> the config still has expected diff with parent:
>
> --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> CONFIG_TEST_MISC_MINOR=m
> # CONFIG_TEST_LKM is not set
> CONFIG_TEST_BITOPS=m
> -CONFIG_TEST_VMALLOC=m
> +CONFIG_TEST_VMALLOC=y
> # CONFIG_TEST_BPF is not set
> CONFIG_FIND_BIT_BENCHMARK=m
> # CONFIG_TEST_FIRMWARE is not set
>
>
> then we noticed similar random issue with x86_64 randconfig this time.
>
> 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> :199 34% 67:200 dmesg.Mem-Info
> :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> :199 34% 67:200 dmesg.RIP:down_read_trylock
>
> we don't have enough knowledge to understand the relationship between code
> change and the random issues. just report what we obsverved in our tests FYI.
>
> below is full report.
>
>
>
> kernel test robot noticed "Kernel_panic-not_syncing:Fatal_exception" on:
>
> commit: 2d76e79315e403aab595d4c8830b7a46c19f0f3b ("lib/test_vmalloc.c: allow built-in execution")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master e04c78d86a9699d136910cfc0bdcf01087e3267e]
> [test failed on linux-next/master 050f8ad7b58d9079455af171ac279c4b9b828c11]
>
> in testcase: boot
>
> config: x86_64-randconfig-161-20250614
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>
>
> [ 36.902716][ T60] vmalloc_node_range for size 8192 failed: Address range restricted to 0xffffc90000000000 - 0xffffe8ffffffffff
> [ 36.903981][ T60] vmalloc_test/0: vmalloc error: size 4096, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> [ 36.905195][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
> [ 36.905201][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 36.905203][ T60] Call Trace:
> [ 36.905206][ T60] <TASK>
> [ 36.905209][ T60] dump_stack_lvl+0x87/0xd6
> [ 36.905223][ T60] warn_alloc+0x15e/0x291
> [ 36.905230][ T60] ? has_managed_dma+0x37/0x37
> [ 36.905237][ T60] ? __get_vm_area_node+0x33a/0x3c0
> [ 36.905244][ T60] ? __get_vm_area_node+0x33a/0x3c0
> [ 36.905250][ T60] __vmalloc_node_range_noprof+0x170/0x306
> [ 36.905255][ T60] ? __vmalloc_area_node+0x460/0x460
> [ 36.905260][ T60] ? test_func+0x2ae/0x469
> [ 36.905264][ T60] __vmalloc_node_noprof+0xb8/0xd9
> [ 36.905267][ T60] ? test_func+0x2ae/0x469
> [ 36.905272][ T60] align_shift_alloc_test+0xa8/0x165
> [ 36.905277][ T60] test_func+0x2ae/0x469
> [ 36.905281][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.905286][ T60] ? __kthread_parkme+0xcb/0x1a3
> [ 36.905293][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.905297][ T60] kthread+0x452/0x464
> [ 36.905301][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.905304][ T60] ? _raw_spin_unlock_irq+0x23/0x35
> [ 36.905308][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.905311][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
> [ 36.905314][ T60] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/kernel/process.c:153)
> [ 36.905318][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
> [ 36.905321][ T60] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/entry/entry_64.S:255)
> [ 36.905330][ T60] </TASK>
> [ 36.905332][ T60] Mem-Info:
> [ 36.919941][ T60] active_anon:0 inactive_anon:0 isolated_anon:0
> [ 36.919941][ T60] active_file:0 inactive_file:0 isolated_file:0
> [ 36.919941][ T60] unevictable:41612 dirty:0 writeback:0
> [ 36.919941][ T60] slab_reclaimable:7429 slab_unreclaimable:145259
> [ 36.919941][ T60] mapped:0 shmem:0 pagetables:145
> [ 36.919941][ T60] sec_pagetables:0 bounce:0
> [ 36.919941][ T60] kernel_misc_reclaimable:0
> [ 36.919941][ T60] free:3233392 free_pcp:1185 free_cma:0
> [ 36.923830][ T60] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:1952kB pagetables:580kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
> [ 36.926265][ T60] DMA free:15360kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 36.928855][ T60] lowmem_reserve[]: 0 2991 13741 13741
> [ 36.929411][ T60] DMA32 free:3060560kB boost:0kB min:3224kB low:6244kB high:9264kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:3063680kB mlocked:0kB bounce:0kB free_pcp:3120kB local_pcp:3120kB free_cma:0kB
> [ 36.932080][ T60] lowmem_reserve[]: 0 0 10749 10749
> [ 36.932604][ T60] Normal free:9857648kB boost:0kB min:11744kB low:22748kB high:33752kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB writepending:0kB present:13631488kB managed:11007884kB mlocked:0kB bounce:0kB free_pcp:1620kB local_pcp:740kB free_cma:0kB
> [ 36.935336][ T60] lowmem_reserve[]: 0 0 0 0
> [ 36.935802][ T60] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15360kB
> [ 36.936931][ T60] DMA32: 0*4kB 0*8kB 1*16kB (M) 2*32kB (M) 2*64kB (M) 1*128kB (M) 2*256kB (M) 2*512kB (M) 1*1024kB (M) 1*2048kB (M) 746*4096kB (M) = 3060560kB
> [ 36.938318][ T60] Normal: 6*4kB (ME) 2*8kB (ME) 7*16kB (UME) 5*32kB (M) 3*64kB (ME) 4*128kB (M) 6*256kB (UME) 2*512kB (M) 1*1024kB (M) 3*2048kB (UME) 2404*4096kB (M) = 9857528kB
> [ 36.939849][ T60] 41618 total pagecache pages
> [ 36.940324][ T60] 4194174 pages RAM
> [ 36.940721][ T60] 0 pages HighMem/MovableOnly
> [ 36.941188][ T60] 672443 pages reserved
> [ 36.941626][ T60] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#1] SMP KASAN
> [ 36.942185][ T60] KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> [ 36.942185][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
> [ 36.942185][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 36.942185][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> [ 36.942185][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> [ 36.942185][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> [ 36.942185][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> [ 36.942185][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> [ 36.942185][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> [ 36.942185][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> [ 36.942185][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> [ 36.942185][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> [ 36.942185][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 36.942185][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> [ 36.942185][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 36.942185][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 36.942185][ T60] Call Trace:
> [ 36.942185][ T60] <TASK>
> [ 36.942185][ T60] ? clear_nonspinnable+0x32/0x32
> [ 36.942185][ T60] ? vprintk_emit+0x165/0x194
> [ 36.942185][ T60] codetag_trylock_module_list+0xd/0x19
> [ 36.942185][ T60] alloc_tag_top_users+0x95/0x216
> [ 36.942185][ T60] ? _printk+0xad/0xdf
> [ 36.942185][ T60] ? reserve_module_tags+0x308/0x308
> [ 36.942185][ T60] __show_mem+0x167/0x54b
> [ 36.942185][ T60] ? _printk+0xad/0xdf
> [ 36.942185][ T60] ? printk_get_console_flush_type+0x272/0x272
> [ 36.942185][ T60] ? show_free_areas+0x115d/0x115d
> [ 36.942185][ T60] ? tracer_hardirqs_on+0x1b/0x28d
> [ 36.942185][ T60] ? dump_stack_lvl+0x91/0xd6
> [ 36.942185][ T60] ? warn_alloc+0x251/0x291
> [ 36.942185][ T60] warn_alloc+0x251/0x291
> [ 36.942185][ T60] ? has_managed_dma+0x37/0x37
> [ 36.942185][ T60] ? __get_vm_area_node+0x33a/0x3c0
> [ 36.942185][ T60] __vmalloc_node_range_noprof+0x170/0x306
> [ 36.942185][ T60] ? __vmalloc_area_node+0x460/0x460
> [ 36.942185][ T60] ? test_func+0x2ae/0x469
> [ 36.942185][ T60] __vmalloc_node_noprof+0xb8/0xd9
> [ 36.942185][ T60] ? test_func+0x2ae/0x469
> [ 36.942185][ T60] align_shift_alloc_test+0xa8/0x165
> [ 36.942185][ T60] test_func+0x2ae/0x469
> [ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.942185][ T60] ? __kthread_parkme+0xcb/0x1a3
> [ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.942185][ T60] kthread+0x452/0x464
> [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.942185][ T60] ? _raw_spin_unlock_irq+0x23/0x35
> [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.942185][ T60] ret_from_fork+0x20/0x54
> [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.942185][ T60] ret_from_fork_asm+0x11/0x20
> [ 36.942185][ T60] </TASK>
> [ 36.942185][ T60] Modules linked in:
> [ 37.000652][ T60] ---[ end trace 0000000000000000 ]---
> [ 37.001188][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> [ 37.001731][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> [ 37.003488][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> [ 37.004072][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> [ 37.004848][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> [ 37.005610][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> [ 37.006381][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> [ 37.007178][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> [ 37.007940][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> [ 37.008792][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 37.009411][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> [ 37.010175][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 37.010950][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 37.011716][ T60] Kernel panic - not syncing: Fatal exception
> [ 37.012397][ T60] Kernel Offset: 0x6200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
>
--
Cheers,
Harry / Hyeonggon