Message-ID: <aKh2vr9HtFRKcnR6@harry>
Date: Fri, 22 Aug 2025 22:55:10 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: syzbot <syzbot+8785aaf121cfb2141e0d@...kaller.appspotmail.com>,
        Liam.Howlett@...cle.com, akpm@...ux-foundation.org, jannh@...gle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, pfalcato@...e.de,
        syzkaller-bugs@...glegroups.com, vbabka@...e.cz,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>, jpoimboe@...nel.org,
        peterz@...radead.org
Subject: Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)

On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> +cc Sebastian for RCU ORC change...
> 
> +cc Harry for slab side.

+cc Josh and Peter for stack unwinding stuff.

> Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> 
> Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems the
> stack is within KASAN, but there's no KASAN report, so maybe it's KASAN itself
> that's having an issue?
> 
> Though I'm thinking maybe it's the ORC unwinder itself that's problematic here
> (albeit invoked via CONFIG_SLUB_RCU_DEBUG)... and yeah, kinda suspicious
> because:
> 
> - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
>   makes us do an unwind via ORC, which then takes an RCU read lock on
>   unwind_next_frame(), and both are doing this unwinding at the time of report.
> - ???
> - Somehow things get locked up?
> 
> I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> in a stall, but it's suspicious.

Could this be because of misleading ORC data, or a logical error in the ORC
unwinder that makes it fall into an infinite loop (unwind_done() never
returning true in arch_stack_walk())?

...because the reported line number doesn't really make sense as a cause
of a stall.

-- 
Cheers,
Harry / Hyeonggon

> On Thu, Aug 21, 2025 at 09:15:37PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
> 
> lockdep (CONFIG_PROVE_LOCKING) is on, so I'm guessing there's no deadlock here.
> 
> CONFIG_DEBUG_VM_MAPLE_TREE is enabled, which will cause _major_ slowdown on VMA
> operations as the tree is constantly being fully validated.
> 
> This may explain the stalls...
> 
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> > compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
> 
> No C repro yet...
> 
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8785aaf121cfb2141e0d@...kaller.appspotmail.com
> >
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> > rcu: 	(detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
> 
> So 105s, or 1m45s, that's pretty long...
> 
> > task:dhcpcd          state:R  running task     stack:28896 pid:6030  tgid:6030  ppid:5513   task_flags:0x400040 flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> >  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> >  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
> 
> Hmm, while the line number is not pertinent, I notice unwind_next_frame() has:
> 
> guard(rcu)()
> 
> in it, added by commit 14daa3bca217 ("x86: Use RCU in all users of
> __module_address()."), from Jan 2025...
> 
> This is defined (took me a while to track down!!) in include/linux/rcupdate.h:
> 
> DEFINE_LOCK_GUARD_0(rcu,
> 	do {
> 		rcu_read_lock();
> 		/*
> 		 * sparse doesn't call the cleanup function,
> 		 * so just release immediately and don't track
> 		 * the context. We don't need to anyway, since
> 		 * the whole point of the guard is to not need
> 		 * the explicit unlock.
> 		 */
> 		__release(RCU);
> 	} while (0),
> 	rcu_read_unlock())
> 
> Meaning it's equivalent to a scoped rcu_read_lock() / rcu_read_unlock().
> 
> But since there's no C repro, this is likely a race of some kind that might be
> very hard to hit.
> 
> > Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> > RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> > RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> > RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> > R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> > R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
> >  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> >  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> >  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> >  kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
> >  slab_free_hook mm/slub.c:2378 [inline]
> 
> Invokes the CONFIG_SLUB_RCU_DEBUG stack trace saving stuff
> 
> >  slab_free mm/slub.c:4680 [inline]
> >  kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
> 
> Note that VMAs are SLAB_TYPESAFE_BY_RCU so maybe that's somehow playing a role
> here?
> 
> In free_slab():
> 
> 	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
> 		call_rcu(&slab->rcu_head, rcu_free_slab);
> 
> >  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> >  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> >  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> >  __vm_munmap+0x19a/0x390 mm/vma.c:3155
> >  __do_sys_munmap mm/mmap.c:1080 [inline]
> >  __se_sys_munmap mm/mmap.c:1077 [inline]
> >  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> Seems a normal trace for an unmap; note (inlining elides frames here) that it's:
> 
> vms_complete_munmap_vmas() -> remove_vma() -> vm_area_free() -> kmem_cache_free()
> 
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> > RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> > R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> > R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
> >  </TASK>
> > task:syz-executor    state:R  running task     stack:27632 pid:6031  tgid:6031  ppid:5870   task_flags:0x400000 flags:0x00004000
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
> >  preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
> >  __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
> >  _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
> >  spin_unlock include/linux/spinlock.h:391 [inline]
> >  filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
> >  do_fault_around mm/memory.c:5531 [inline]
> >  do_read_fault mm/memory.c:5564 [inline]
> >  do_fault mm/memory.c:5707 [inline]
> >  do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
> >  handle_pte_fault mm/memory.c:6052 [inline]
> >  __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
> >  handle_mm_fault+0x589/0xd10 mm/memory.c:6364
> >  do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
> >  handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> >  exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
> >  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> 
> Faulting path being context switched on unlock of PTE spinlock...
> 
> > RIP: 0033:0x7f54cd7177c7
> > RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> > RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> > RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> > R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
> >  </TASK>
> > task:kworker/0:3     state:R  running task     stack:25368 pid:1208  tgid:1208  ppid:2      task_flags:0x4208060 flags:0x00004000
> > Workqueue: events_power_efficient gc_worker
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> >  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> >  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> > Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> > RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> > RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> > RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> > RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> > R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> > R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
> >  gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
> >  process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
> >  process_scheduled_works kernel/workqueue.c:3319 [inline]
> >  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
> >  kthread+0x3c5/0x780 kernel/kthread.c:463
> >  ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >  </TASK>
> > task:dhcpcd          state:R  running task     stack:26072 pid:6029  tgid:6029  ppid:5513   task_flags:0x400040 flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> >  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> >  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> > RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> > RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> > Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> > RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> > RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> > RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> > RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> > R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> > R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
> >  orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
> >  unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
> 
> This is also RCU-read locked.
> 
> >  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> >  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> >  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> >  kasan_save_track+0x14/0x30 mm/kasan/common.c:68
> >  poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
> >  __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
> >  kmalloc_noprof include/linux/slab.h:905 [inline]
> >  slab_free_hook mm/slub.c:2369 [inline]
> >  slab_free mm/slub.c:4680 [inline]
> >  kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
> >  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> >  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> >  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> >  __vm_munmap+0x19a/0x390 mm/vma.c:3155
> 
> Simultaneous unmap?
> 
> >  __do_sys_munmap mm/mmap.c:1080 [inline]
> >  __se_sys_munmap mm/mmap.c:1077 [inline]
> >  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> > RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> > R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> > R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
> >  </TASK>
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@...glegroups.com.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want syzbot to run the reproducer, reply with:
> > #syz test: git://repo/address.git branch-or-commit-hash
> > If you attach or paste a git patch, syzbot will apply it before testing.
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup
> 
> Cheers, Lorenzo
