Message-ID: <aKh2vr9HtFRKcnR6@harry>
Date: Fri, 22 Aug 2025 22:55:10 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: syzbot <syzbot+8785aaf121cfb2141e0d@...kaller.appspotmail.com>,
Liam.Howlett@...cle.com, akpm@...ux-foundation.org, jannh@...gle.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, pfalcato@...e.de,
syzkaller-bugs@...glegroups.com, vbabka@...e.cz,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, jpoimboe@...nel.org,
peterz@...radead.org
Subject: Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)

On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> +cc Sebastian for RCU ORC change...
>
> +cc Harry for slab side.
+cc Josh and Peter for stack unwinding stuff.
> Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
>
> Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems the
> stack is within KASAN, but there's no KASAN report, so maybe it's KASAN itself
> that's having an issue?
>
> Though I'm thinking maybe it's the ORC unwinder itself that could be problematic
> here (albeit invoked via CONFIG_SLUB_RCU_DEBUG)... and yeah, kinda suspicious
> because:
>
> - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> makes us do an unwind via ORC, which then takes an RCU read lock on
> unwind_next_frame(), and both are doing this unwinding at the time of report.
> - ???
> - Somehow things get locked up?
>
> I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> in a stall, but it's suspicious.

Could this be caused by misleading ORC data, or by a logical error in the ORC
unwinder, that makes it fall into an infinite loop (unwind_done() never
returning true in arch_stack_walk() - see the sketch below)?

...because the reported line number doesn't really make sense as a cause of
stalls.
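
For reference, the loop I mean looks roughly like this (paraphrasing
arch/x86/kernel/stacktrace.c from memory, so not a verbatim copy):

	void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
			     struct task_struct *task, struct pt_regs *regs)
	{
		struct unwind_state state;
		unsigned long addr;

		if (regs && !consume_entry(cookie, regs->ip))
			return;

		/*
		 * The walk only ends when unwind_done() reports a terminal or
		 * error state, the unwound address is invalid, or the consumer
		 * stops accepting entries.
		 */
		for (unwind_start(&state, task, regs, NULL);
		     !unwind_done(&state);
		     unwind_next_frame(&state)) {
			addr = unwind_get_return_address(&state);
			if (!addr || !consume_entry(cookie, addr))
				break;
		}
	}
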
--
Cheers,
Harry / Hyeonggon
> On Thu, Aug 21, 2025 at 09:15:37PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
>
> lockdep (CONFIG_PROVE_LOCKING) is on, so I'm guessing there's no deadlock here.
>
> CONFIG_DEBUG_VM_MAPLE_TREE is enabled, which will cause _major_ slowdown on VMA
> operations as the tree is constantly being fully validated.
>
> This may explain the stalls...
>
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> > compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
>
> No C repro yet...
>
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8785aaf121cfb2141e0d@...kaller.appspotmail.com
> >
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> > rcu: (detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
>
> So 105s, or 1m45s, that's pretty long...
>
> > task:dhcpcd state:R running task stack:28896 pid:6030 tgid:6030 ppid:5513 task_flags:0x400040 flags:0x00004002
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> > irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> > asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
>
> Hmm, while the line number is not pertinent, I notice unwind_next_frame() has:
>
> guard(rcu)()
>
> in it, from commit 14daa3bca217 ("x86: Use RCU in all users of
> __module_address()."), though that's from Jan 2025...
>
> This is defined (took me a while to track down!!) in include/linux/rcupdate.h:
>
> DEFINE_LOCK_GUARD_0(rcu,
> do {
> rcu_read_lock();
> /*
> * sparse doesn't call the cleanup function,
> * so just release immediately and don't track
> * the context. We don't need to anyway, since
> * the whole point of the guard is to not need
> * the explicit unlock.
> */
> __release(RCU);
> } while (0),
> rcu_read_unlock())
>
> Meaning it's equivalent to a scoped rcu_read_lock() / rcu_read_unlock().
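
(Purely as an illustration, with a made-up function name, something like

	static bool walk_one_frame(struct unwind_state *state)
	{
		guard(rcu)();	/* rcu_read_lock() here... */

		/* ... touch RCU-protected module / ORC data ... */

		return true;	/* ...rcu_read_unlock() on every return path */
	}

behaves the same as wrapping the whole function body in
rcu_read_lock()/rcu_read_unlock().)
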
>
> But since there's no C repro, this is likely a race of some kind that might be
> very hard to hit.
>
> > Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> > RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> > RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> > RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> > R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> > R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
> > arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> > stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> > kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> > kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
> > slab_free_hook mm/slub.c:2378 [inline]
>
> Invokes the CONFIG_SLUB_RCU_DEBUG stack trace saving stuff
>
> > slab_free mm/slub.c:4680 [inline]
> > kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
>
> Note that VMAs are SLAB_TYPESAFE_BY_RCU so maybe that's somehow playing a role
> here?
>
> In free_slab():
>
> if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
> call_rcu(&slab->rcu_head, rcu_free_slab);
>
> > vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> > do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> > do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> > __vm_munmap+0x19a/0x390 mm/vma.c:3155
> > __do_sys_munmap mm/mmap.c:1080 [inline]
> > __se_sys_munmap mm/mmap.c:1077 [inline]
> > __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Seems a normal trace for an unmap, note (inlining removes stuff here) it's:
>
> vms_complete_munmap_vmas() -> remove_vma() -> vm_area_free() -> kmem_cache_free()
>
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> > RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> > R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> > R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
> > </TASK>
> > task:syz-executor state:R running task stack:27632 pid:6031 tgid:6031 ppid:5870 task_flags:0x400000 flags:0x00004000
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
> > preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
> > __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
> > _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
> > spin_unlock include/linux/spinlock.h:391 [inline]
> > filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
> > do_fault_around mm/memory.c:5531 [inline]
> > do_read_fault mm/memory.c:5564 [inline]
> > do_fault mm/memory.c:5707 [inline]
> > do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
> > handle_pte_fault mm/memory.c:6052 [inline]
> > __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
> > handle_mm_fault+0x589/0xd10 mm/memory.c:6364
> > do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
> > handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> > exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
> > asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
>
> Faulting path being context switched on unlock of PTE spinlock...
>
> > RIP: 0033:0x7f54cd7177c7
> > RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> > RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> > RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> > R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
> > </TASK>
> > task:kworker/0:3 state:R running task stack:25368 pid:1208 tgid:1208 ppid:2 task_flags:0x4208060 flags:0x00004000
> > Workqueue: events_power_efficient gc_worker
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> > irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> > asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> > Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> > RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> > RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> > RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> > RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> > R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> > R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
> > gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
> > process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
> > process_scheduled_works kernel/workqueue.c:3319 [inline]
> > worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
> > kthread+0x3c5/0x780 kernel/kthread.c:463
> > ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > </TASK>
> > task:dhcpcd state:R running task stack:26072 pid:6029 tgid:6029 ppid:5513 task_flags:0x400040 flags:0x00004002
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> > irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> > RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> > RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> > Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> > RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> > RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> > RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> > RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> > R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> > R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
> > orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
> > unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
>
> This is also RCU-read locked.
>
> > arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> > stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> > kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> > kasan_save_track+0x14/0x30 mm/kasan/common.c:68
> > poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
> > __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
> > kmalloc_noprof include/linux/slab.h:905 [inline]
> > slab_free_hook mm/slub.c:2369 [inline]
> > slab_free mm/slub.c:4680 [inline]
> > kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
> > vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> > do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> > do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> > __vm_munmap+0x19a/0x390 mm/vma.c:3155
>
> Simultaneous unmap?
>
> > __do_sys_munmap mm/mmap.c:1080 [inline]
> > __se_sys_munmap mm/mmap.c:1077 [inline]
> > __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> > RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> > R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> > R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
> > </TASK>
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@...glegroups.com.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want syzbot to run the reproducer, reply with:
> > #syz test: git://repo/address.git branch-or-commit-hash
> > If you attach or paste a git patch, syzbot will apply it before testing.
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup
>
> Cheers, Lorenzo