Message-ID: <0d88fc54-93a7-4075-996f-b2d343c0ba28@kernel.dk>
Date: Fri, 20 Sep 2024 02:51:09 -0600
From: Jens Axboe <axboe@...nel.dk>
To: syzbot <syzbot+5fca234bd7eb378ff78e@...kaller.appspotmail.com>,
 asml.silence@...il.com, io-uring@...r.kernel.org,
 linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [io-uring?] INFO: rcu detected stall in
 sys_io_uring_enter (2)

On 9/19/24 11:20 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    98f7e32f20d2 Linux 6.11
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=17271c07980000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c78874575ba70f27
> dashboard link: https://syzkaller.appspot.com/bug?extid=5fca234bd7eb378ff78e
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/20d79fec7eb2/disk-98f7e32f.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/57606ddb0989/vmlinux-98f7e32f.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/901e6ba22e57/bzImage-98f7e32f.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+5fca234bd7eb378ff78e@...kaller.appspotmail.com
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 	1-...!: (0 ticks this GP) idle=11bc/1/0x4000000000000000 softirq=116660/116660 fqs=17
> rcu: 	(detected by 0, t=10502 jiffies, g=200145, q=315 ncpus=2)
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 UID: 0 PID: 6917 Comm: syz.2.16175 Not tainted 6.11.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
> RIP: 0010:match_held_lock+0x0/0xb0 kernel/locking/lockdep.c:5204
> Code: 08 75 11 48 89 d8 48 83 c4 10 5b 41 5e 41 5f c3 cc cc cc cc e8 11 f9 ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <55> 53 bd 01 00 00 00 48 39 77 10 74 67 48 89 fb 81 7f 20 00 00 20
> RSP: 0018:ffffc90000a18d10 EFLAGS: 00000083
> RAX: 0000000000000002 RBX: ffff888057310b08 RCX: ffff888057310000
> RDX: ffff888057310000 RSI: ffff8880b892c898 RDI: ffff888057310b08
> RBP: 0000000000000001 R08: ffffffff8180cfbe R09: 0000000000000000
> R10: ffff88803641a340 R11: ffffed1006c8346b R12: 0000000000000046
> R13: ffff888057310000 R14: 00000000ffffffff R15: ffff8880b892c898
> FS:  00007f183bd6c6c0(0000) GS:ffff8880b8900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7205d61f98 CR3: 000000001bb10000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <NMI>
>  </NMI>
>  <IRQ>
>  __lock_is_held kernel/locking/lockdep.c:5500 [inline]
>  lock_is_held_type+0xa9/0x190 kernel/locking/lockdep.c:5831
>  lock_is_held include/linux/lockdep.h:249 [inline]
>  __run_hrtimer kernel/time/hrtimer.c:1655 [inline]
>  __hrtimer_run_queues+0x2d9/0xd50 kernel/time/hrtimer.c:1753
>  hrtimer_interrupt+0x396/0x990 kernel/time/hrtimer.c:1815
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1032 [inline]
>  __sysvec_apic_timer_interrupt+0x110/0x3f0 arch/x86/kernel/apic/apic.c:1049
>  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
>  sysvec_apic_timer_interrupt+0xa1/0xc0 arch/x86/kernel/apic/apic.c:1043
>  </IRQ>
>  <TASK>
>  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5763
> Code: 2b 00 74 08 4c 89 f7 e8 ea e1 87 00 f6 44 24 61 02 0f 85 85 01 00 00 41 f7 c7 00 02 00 00 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 25 00 00 00 00 00 43 c7 44 25 09 00 00 00 00 43 c7 44 25
> RSP: 0018:ffffc9000c5c79a0 EFLAGS: 00000206
> RAX: 0000000000000001 RBX: 1ffff920018b8f40 RCX: 2e46bf6ba4daf100
> RDX: dffffc0000000000 RSI: ffffffff8beae6c0 RDI: ffffffff8c3fbac0
> RBP: ffffc9000c5c7ae8 R08: ffffffff93fa6967 R09: 1ffffffff27f4d2c
> R10: dffffc0000000000 R11: fffffbfff27f4d2d R12: 1ffff920018b8f3c
> R13: dffffc0000000000 R14: ffffc9000c5c7a00 R15: 0000000000000246
>  __mutex_lock_common kernel/locking/mutex.c:608 [inline]
>  __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
>  io_cqring_do_overflow_flush io_uring/io_uring.c:644 [inline]
>  io_cqring_wait io_uring/io_uring.c:2486 [inline]
>  __do_sys_io_uring_enter io_uring/io_uring.c:3255 [inline]
>  __se_sys_io_uring_enter+0x1c2a/0x2670 io_uring/io_uring.c:3147
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f

I know there's no reproducer for this and hence it can't be tested, but
this is obviously some syzbot nonsense that just wildly overflows the
CQE list. The patch below should fix it; I'll add it.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f3570e81ecb4..c03d523ff468 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -635,6 +635,21 @@ static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying)
 		}
 		list_del(&ocqe->list);
 		kfree(ocqe);
+
+		/*
+		 * For silly syzbot cases that deliberately overflow by huge
+		 * amounts, check if we need to resched and, if so, drop and
+		 * reacquire the locks. Nothing real should ever hit this.
+		 * Ideally we'd have a non-posting unlock for this, but it's
+		 * hard to justify for a case that never happens in practice.
+		 */
+		if (need_resched()) {
+			io_cq_unlock_post(ctx);
+			mutex_unlock(&ctx->uring_lock);
+			cond_resched();
+			mutex_lock(&ctx->uring_lock);
+			io_cq_lock(ctx);
+		}
 	}
 
 	if (list_empty(&ctx->cq_overflow_list)) {
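
For anyone unfamiliar with the pattern, here's a minimal userspace
sketch of the same idea, under assumptions: all names (entry,
flush_overflow, BATCH) are made up for illustration, sched_yield()
stands in for cond_resched(), and a single pthread mutex stands in for
the uring_lock/CQ lock pair. It is not io_uring code, just the general
"periodically drop the locks and let the scheduler run" technique the
patch applies:

#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *overflow_head;

#define BATCH	1024

/*
 * Drain a potentially huge list without monopolizing the CPU: every
 * BATCH entries, drop the lock and yield, mirroring the
 * need_resched()/cond_resched() dance in the patch above.
 */
static void flush_overflow(void)
{
	unsigned int done = 0;

	pthread_mutex_lock(&big_lock);
	while (overflow_head) {
		struct entry *e = overflow_head;

		overflow_head = e->next;
		free(e);

		if (++done % BATCH == 0) {
			/* let other threads (and the scheduler) in */
			pthread_mutex_unlock(&big_lock);
			sched_yield();
			pthread_mutex_lock(&big_lock);
		}
	}
	pthread_mutex_unlock(&big_lock);
}

Note that, as in the kernel patch, the list head must be re-read after
reacquiring the lock, since another thread may have changed it while
the lock was dropped; the loop condition here does that naturally.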

-- 
Jens Axboe
