Message-ID: <878qw4ei10.ffs@tglx>
Date: Fri, 06 Sep 2024 20:47:23 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: syzbot <syzbot+b3f9c9d700eadf2be3a9@...kaller.appspotmail.com>,
linux-kernel@...r.kernel.org, luto@...nel.org, netdev@...r.kernel.org,
peterz@...radead.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [kernel?] possible deadlock in __run_timer_base

On Fri, Sep 06 2024 at 08:36, syzbot wrote:
> HEAD commit: b408473ea01b bpf: Fix a crash when btf_parse_base() return..
> git tree: bpf
> console output: https://syzkaller.appspot.com/x/log.txt?x=10840739980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=eb19570bf3f0c14f
> dashboard link: https://syzkaller.appspot.com/bug?extid=b3f9c9d700eadf2be3a9
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> ------------[ cut here ]------------
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.11.0-rc4-syzkaller-gb408473ea01b #0 Not tainted
> ------------------------------------------------------
> syz.2.317/6997 is trying to acquire lock:
> ffffffff8e813cb8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x20/0xa0 kernel/locking/semaphore.c:139
>
> but task is already holding lock:
> ffff8880b892a718 (&base->lock){-.-.}-{2:2}, at: expire_timers kernel/time/timer.c:1839 [inline]
> ffff8880b892a718 (&base->lock){-.-.}-{2:2}, at: __run_timers kernel/time/timer.c:2417 [inline]
> ffff8880b892a718 (&base->lock){-.-.}-{2:2}, at: __run_timer_base+0x69d/0x8e0 kernel/time/timer.c:2428
>
> which lock already depends on the new lock.

Right, but that's not the real problem and I fear we can't do much about
this potential deadlock. The real issue is this:

> WARNING: CPU: 1 PID: 6997 at kernel/time/timer.c:1830 expire_timers kernel/time/timer.c:1830 [inline]

	if (WARN_ON_ONCE(!fn)) {
		/* Should never happen. Emphasis on should! */

So some code enqueued a timer, which at the time of enqueue must have
had a timer->function set because there is an explicit check for this.
Now something set timer->function to NULL, which triggers that warning.
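
For illustration only, the sequence described above can be modelled in a
few lines of userspace C. This is a sketch, not the kernel code: struct
model_timer, model_enqueue(), model_expire() and model_callback() are
hypothetical names, and the -1 return on enqueue is an assumption of the
model rather than what the kernel does on that path. It only shows how a
callback that is valid at enqueue time can be NULL by the time the
timer expires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct timer_list; only the callback matters. */
struct model_timer {
	void (*function)(struct model_timer *t);
	bool pending;
};

/* Counts how often the callback actually ran. */
static int model_fired;

static void model_callback(struct model_timer *t)
{
	(void)t;
	model_fired++;
}

/*
 * Enqueue only accepts a timer that has a callback. The -1 return is a
 * choice of this model, not a statement about the kernel's behaviour.
 */
static int model_enqueue(struct model_timer *t)
{
	if (!t->function)
		return -1;
	t->pending = true;
	return 0;
}

/*
 * Expiry reads the callback again. If something cleared it after the
 * enqueue, the "should never happen" branch is taken and nothing runs.
 * Returns true if the callback was invoked.
 */
static bool model_expire(struct model_timer *t)
{
	void (*fn)(struct model_timer *) = t->function;

	t->pending = false;
	if (!fn) {
		fprintf(stderr, "WARN: timer expired with NULL function\n");
		return false;
	}
	fn(t);
	return true;
}
```

In this model the warning path can only be reached by a write to
->function between model_enqueue() and model_expire(), which is the kind
of culprit being looked for here.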

That potential deadlock probably cannot be cured, but that warning
should never happen. So that's a really screwed up situation and trying
to get the warning out has priority.

No idea how to find the culprit though.

Thanks,

        tglx