Message-ID: <6964c137.050a0220.eaf7.0097.GAE@google.com>
Date: Mon, 12 Jan 2026 01:39:03 -0800
From: syzbot <syzbot+d8d4c31d40f868eaea30@...kaller.appspotmail.com>
To: linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Forwarded: Private message regarding: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
For archival purposes, forwarding an incoming command email to
linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com.
***
Subject: Private message regarding: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
Author: kapoorarnav43@...il.com
From: Arnav Kapoor <kapoorarnav43@...il.com>
Date: Mon, 12 Jan 2026 15:30:00 +0000
Subject: [PATCH] mm/kasan: add cond_resched() in shadow page table walk
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
Syzbot reported RCU stalls during vmalloc cleanup:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks
task:kworker/0:17 state:R running task
purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299
When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
vmalloc cleanup triggers expensive stack unwinding via save_stack() ->
unwind_next_frame(), which acquires RCU read locks. Processing large
vmalloc regions can free thousands of shadow pages without yielding,
causing the worker to monopolize the CPU for 10+ seconds and leading
to RCU stalls and potential OOM.
The issue occurs in this call chain:

  purge_vmap_node()
    -> kasan_release_vmalloc_node()
      -> kasan_release_vmalloc()             [for each vmap_area]
        -> __kasan_release_vmalloc()
          -> apply_to_existing_page_range()
            -> kasan_depopulate_vmalloc_pte()  [for each PTE]
              -> __free_page()
                -> __reset_page_owner()      [CONFIG_PAGE_OWNER]
                  -> save_stack()
                    -> unwind_next_frame()   [RCU read lock held]
Each shadow page free triggers stack unwinding under RCU lock. A single
large vmalloc region can have thousands of shadow pages, creating an
unbounded RCU critical section.
The previous attempt to fix this added a cond_resched() between
vmap_areas in kasan_release_vmalloc_node(), but that is insufficient
because a single vmap_area can still cover many shadow pages.
Fix this by adding cond_resched() in the page table walk callback
kasan_depopulate_vmalloc_pte() after every 32 pages. This ensures
regular scheduling points during large shadow region depopulation while
minimizing overhead for typical cases.
The batch size of 32 is chosen to:
- Amortize cond_resched() overhead (typically ~100ns) over multiple pages
- Limit worst-case non-preemptible time to ~3ms on typical hardware
(32 pages × ~100μs per stack unwind)
- Match common TLB and cache behavior
Note: We can't use need_resched() alone because under light CPU load,
need_resched() may remain false while RCU grace periods starve. The
batch count provides a guaranteed upper bound.
Reported-by: syzbot+d8d4c31d40f868eaea30@...kaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
Signed-off-by: Arnav Kapoor <kapoorarnav43@...il.com>
---
mm/kasan/shadow.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 000000000000..111111111111 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -468,9 +468,23 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
 					void *unused)
 {
 	pte_t pte;
 	int none;
+	static DEFINE_PER_CPU(unsigned int, depopulate_batch_count);
+	unsigned int *batch = this_cpu_ptr(&depopulate_batch_count);
 
 	arch_leave_lazy_mmu_mode();
+
+	/*
+	 * With CONFIG_PAGE_OWNER, each page free triggers expensive stack
+	 * unwinding under RCU lock. Yield periodically to prevent RCU stalls
+	 * when processing large vmalloc regions with thousands of shadow pages.
+	 */
+	if (++(*batch) >= 32) {
+		*batch = 0;
+		cond_resched();
+		arch_enter_lazy_mmu_mode();
+	}
+
 	spin_lock(&init_mm.page_table_lock);
 	pte = ptep_get(ptep);
 	none = pte_none(pte);
On Monday, 12 January 2026 at 14:10:07 UTC+5:30 syzbot wrote:
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering
an issue:
INFO: rcu detected stall in unwind_next_frame
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6892/1:b..l
P6893/3:b..l P6746/1:b..l
rcu: (detected by 1, t=10502 jiffies, g=16737, q=586 ncpus=2)
task:kworker/u8:18 state:R running task stack:24088 pid:6746 tgid:6746
ppid:2 task_flags:0x4208060 flags:0x00080000
Workqueue: kvfree_rcu_reclaim kfree_rcu_monitor
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x1139/0x6150 kernel/sched/core.c:6863
preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7190
irqentry_exit+0x1d8/0x8c0 kernel/entry/common.c:216
asm_sysvec_apic_timer_interrupt+0x1a/0x20
arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_acquire+0x62/0x330 kernel/locking/lockdep.c:5872
Code: b4 18 12 83 f8 07 0f 87 a2 02 00 00 89 c0 48 0f a3 05 e2 c1 ee 0e 0f
82 74 02 00 00 8b 35 7a f2 ee 0e 85 f6 0f 85 8d 00 00 00 <48> 8b 44 24 30
65 48 2b 05 f9 b3 18 12 0f 85 ad 02 00 00 48 83 c4
RSP: 0018:ffffc90003fbf5b8 EFLAGS: 00000206
RAX: 0000000000000046 RBX: ffffffff8e3c96a0 RCX: 00000000993b8195
RDX: 0000000000000000 RSI: ffffffff8daa8a1d RDI: ffffffff8bf2b400
RBP: 0000000000000002 R08: 00000000e61a05bb R09: 00000000be61a05b
R10: 0000000000000002 R11: ffff888029058b30 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
rcu_read_lock include/linux/rcupdate.h:867 [inline]
class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
unwind_next_frame+0xd1/0x20b0 arch/x86/kernel/unwind_orc.c:495
arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
kasan_save_stack+0x33/0x60 mm/kasan/common.c:57
kasan_save_track+0x14/0x30 mm/kasan/common.c:78
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x5f/0x80 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2540 [inline]
slab_free_freelist_hook mm/slub.c:2569 [inline]
slab_free_bulk mm/slub.c:6703 [inline]
kmem_cache_free_bulk mm/slub.c:7390 [inline]
kmem_cache_free_bulk+0x2bf/0x680 mm/slub.c:7369
kfree_bulk include/linux/slab.h:830 [inline]
kvfree_rcu_bulk+0x1b7/0x1e0 mm/slab_common.c:1523
kvfree_rcu_drain_ready mm/slab_common.c:1728 [inline]
kfree_rcu_monitor+0x1d0/0x2f0 mm/slab_common.c:1801
process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
process_scheduled_works kernel/workqueue.c:3340 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
task:sed state:R running task stack:25800 pid:6893 tgid:6893 ppid:6890
task_flags:0x400000 flags:0x00080000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x1139/0x6150 kernel/sched/core.c:6863
preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7047
preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
__raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
_raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:391 [inline]
filemap_map_pages+0x1194/0x1e00 mm/filemap.c:3931
do_fault_around mm/memory.c:5713 [inline]
do_read_fault mm/memory.c:5746 [inline]
do_fault+0x9cd/0x1ad0 mm/memory.c:5889
do_pte_missing mm/memory.c:4401 [inline]
handle_pte_fault mm/memory.c:6273 [inline]
__handle_mm_fault+0x1919/0x2bb0 mm/memory.c:6411
handle_mm_fault+0x3fe/0xad0 mm/memory.c:6580
do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
handle_page_fault arch/x86/mm/fault.c:1476 [inline]
exc_page_fault+0x64/0xc0 arch/x86/mm/fault.c:1532
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x7fc6d7657c50
RSP: 002b:00007ffe6008c528 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00007fc6d768e490 RCX: 00007ffe6008c560
RDX: 00007fc6d7689d63 RSI: 00007fc6d7689d36 RDI: 00007ffe6008c748
RBP: 0000000000000041 R08: 00007ffe6008c550 R09: 00007ffe6008c558
R10: 0000000000000004 R11: 0000000000000246 R12: 00007ffe6008c748
R13: 00007ffe6008c550 R14: 00007fc6d76ce000 R15: 00005602504c1d98
</TASK>
task:udevd state:R running task stack:28152 pid:6892 tgid:6892 ppid:5186
task_flags:0x400140 flags:0x00080000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x1139/0x6150 kernel/sched/core.c:6863
preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7047
preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
_raw_spin_unlock_irqrestore+0x61/0x80 kernel/locking/spinlock.c:194
sock_def_readable+0x15b/0x5d0 net/core/sock.c:3611
unix_dgram_sendmsg+0xcbd/0x1830 net/unix/af_unix.c:2286
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
sock_write_iter+0x566/0x610 net/socket.c:1195
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x7d3/0x11d0 fs/read_write.c:686
ksys_write+0x1f8/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6502bdf407
RSP: 002b:00007ffc4b535850 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f6502b53880 RCX: 00007f6502bdf407
RDX: 0000000000000000 RSI: 00007ffc4b5358f7 RDI: 000000000000000a
RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007f6502b536e8
R13: 0000000000000000 R14: 0000000000000000 R15: 000055685a0a1150
</TASK>
rcu: rcu_preempt kthread starved for 10572 jiffies! g16737 f0x0
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now
expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28120 pid:16 tgid:16 ppid:2
task_flags:0x208040 flags:0x00080000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x1139/0x6150 kernel/sched/core.c:6863
__schedule_loop kernel/sched/core.c:6945 [inline]
schedule+0xe7/0x3a0 kernel/sched/core.c:6960
schedule_timeout+0x123/0x290 kernel/time/sleep_timeout.c:99
rcu_gp_fqs_loop+0x1ea/0xaf0 kernel/rcu/tree.c:2083
rcu_gp_kthread+0x26d/0x380 kernel/rcu/tree.c:2285
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 10/25/2025
RIP: 0010:pv_native_safe_halt+0xf/0x20 arch/x86/kernel/paravirt.c:82
Code: a6 5f 02 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 13 19 12 00 fb f4 <e9> cc 35 03 00
66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
RSP: 0018:ffffffff8e007df8 EFLAGS: 000002c6
RAX: 0000000000186859 RBX: 0000000000000000 RCX: ffffffff8b7846d9
RDX: 0000000000000000 RSI: ffffffff8daceab2 RDI: ffffffff8bf2b400
RBP: fffffbfff1c12f68 R08: 0000000000000001 R09: ffffed101708673d
R10: ffff8880b84339eb R11: ffffffff8e098670 R12: 0000000000000000
R13: ffffffff8e097b40 R14: ffffffff9088bdd0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8881248f5000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055555eaef588 CR3: 00000000522b3000 CR4: 00000000003526f0
Call Trace:
<TASK>
arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline]
default_idle+0x13/0x20 arch/x86/kernel/process.c:767
default_idle_call+0x6c/0xb0 kernel/sched/idle.c:122
cpuidle_idle_call kernel/sched/idle.c:191 [inline]
do_idle+0x38d/0x510 kernel/sched/idle.c:332
cpu_startup_entry+0x4f/0x60 kernel/sched/idle.c:430
rest_init+0x16b/0x2b0 init/main.c:757
start_kernel+0x3ef/0x4d0 init/main.c:1206
x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:310
x86_64_start_kernel+0x130/0x190 arch/x86/kernel/head64.c:291
common_startup_64+0x13e/0x148
</TASK>
Tested on:
commit: 0f61b186 Linux 6.19-rc5
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1397199a580000
kernel config: https://syzkaller.appspot.com/x/.config?x=1859476832863c41
dashboard link: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for
Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1704399a580000