Message-ID: <bywi4rrt2iyjrpdxuyg4tqi3byrchydi42mkfpxsbv6n62inlz@xeswhf7ftekb>
Date: Sun, 12 May 2024 16:41:11 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: syzbot <syzbot+a941018a091f1a1f9546@...kaller.appspotmail.com>,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, lstoakes@...il.com,
syzkaller-bugs@...glegroups.com, vbabka@...e.cz
Subject: Re: [syzbot] [mm?] INFO: rcu detected stall in validate_mm (3)
* Liam R. Howlett <Liam.Howlett@...cle.com> [240512 13:28]:
> * syzbot <syzbot+a941018a091f1a1f9546@...kaller.appspotmail.com> [240512 05:19]:
> > Hello,
> >
> > syzbot found the following issue on:
>
> First, excellent timing of this report - Sunday on an -rc7 release the
> day before LSF/MM/BPF.
>
> >
> > HEAD commit: dccb07f2914c Merge tag 'for-6.9-rc7-tag' of git://git.kern..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f6734c980000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=7144b4fe7fbf5900
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a941018a091f1a1f9546
> > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10306760980000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138c8970980000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/e1fea5a49470/disk-dccb07f2.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/5f7d53577fef/vmlinux-dccb07f2.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/430b18473a18/bzImage-dccb07f2.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+a941018a091f1a1f9546@...kaller.appspotmail.com
> >
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P17678/1:b..l
> > rcu: (detected by 1, t=10502 jiffies, g=36541, q=38 ncpus=2)
> > task:syz-executor952 state:R running task stack:28968 pid:17678 tgid:17678 ppid:5114 flags:0x00000002
..
>
> I cannot rule out the maple tree being stuck in an infinite loop, but
> given the information above I don't think that's what is happening. An
> infinite loop would produce the same crash on every reproduction, which
> is not what syzbot sees in the git bisect, so I think the issue is not
> in the tree but somewhere else - probably a corruption issue that
> wasn't detected by kasan (is this possible?).

I was able to recreate this with the provided config and reproducer (but
not with my own config). My trace has no maple tree calls at all:
[ 866.380945][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 866.381464][ C1] rcu: (detected by 1, t=10502 jiffies, g=161409, q=149 ncpus=2)
[ 866.382152][ C1] rcu: All QSes seen, last rcu_preempt kthread activity 10500 (4295023801-4295013301), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 866.383324][ C1] rcu: rcu_preempt kthread starved for 10500 jiffies! g161409 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 866.384952][ C1] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 866.385972][ C1] rcu: RCU grace-period kthread stack dump:
[ 866.386582][ C1] task:rcu_preempt state:R running task stack:27648 pid:16 tgid:16 ppid:2 flags:0x00004000
[ 866.387811][ C1] Call Trace:
[ 866.388164][ C1] <TASK>
[ 866.388475][ C1] __schedule+0xf06/0x5cb0
[ 866.388961][ C1] ? __pfx___lock_acquire+0x10/0x10
[ 866.389528][ C1] ? __pfx___schedule+0x10/0x10
[ 866.390065][ C1] ? schedule+0x298/0x350
[ 866.390541][ C1] ? __pfx_lock_release+0x10/0x10
[ 866.391090][ C1] ? __pfx___mod_timer+0x10/0x10
[ 866.391633][ C1] ? lock_acquire+0x1b1/0x560
[ 866.392133][ C1] ? lockdep_init_map_type+0x16d/0x7e0
[ 866.392709][ C1] schedule+0xe7/0x350
[ 866.393139][ C1] schedule_timeout+0x136/0x2a0
[ 866.393654][ C1] ? __pfx_schedule_timeout+0x10/0x10
[ 866.394142][ C1] ? __pfx_process_timeout+0x10/0x10
[ 866.394596][ C1] ? _raw_spin_unlock_irqrestore+0x3b/0x80
[ 866.395137][ C1] ? prepare_to_swait_event+0xf0/0x470
[ 866.395714][ C1] rcu_gp_fqs_loop+0x1ab/0xbd0
[ 866.396246][ C1] ? __pfx_rcu_gp_fqs_loop+0x10/0x10
[ 866.396852][ C1] ? rcu_gp_init+0xbdb/0x1480
[ 866.397393][ C1] ? __pfx_rcu_gp_cleanup+0x10/0x10
[ 866.397988][ C1] rcu_gp_kthread+0x271/0x380
[ 866.398493][ C1] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 866.399063][ C1] ? lockdep_hardirqs_on+0x7c/0x110
[ 866.399570][ C1] ? __kthread_parkme+0x143/0x220
[ 866.400045][ C1] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 866.400535][ C1] kthread+0x2c1/0x3a0
[ 866.400916][ C1] ? _raw_spin_unlock_irq+0x23/0x50
[ 866.401409][ C1] ? __pfx_kthread+0x10/0x10
[ 866.401854][ C1] ret_from_fork+0x45/0x80
[ 866.402284][ C1] ? __pfx_kthread+0x10/0x10
[ 866.402718][ C1] ret_from_fork_asm+0x1a/0x30
[ 866.403167][ C1] </TASK>
I'm going to see if I can hit the corrupted stack version with kasan enabled.
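
For anyone following along, the options below are a typical starting point
for a generic-KASAN build (this is only a sketch of the usual knobs, not
the exact config I'm testing with - syzbot's published config may differ):

```
# Enable the generic (software) KASAN mode with inline instrumentation.
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
# Stack traces for allocation/free records in KASAN reports.
CONFIG_STACKTRACE=y
```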
Thanks,
Liam