Message-ID: <87r0cuqzsf.ffs@tglx>
Date: Tue, 18 Jun 2024 13:02:40 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: syzbot <syzbot+e620313b27e2be807d3b@...kaller.appspotmail.com>,
 anna-maria@...utronix.de, frederic@...nel.org,
 linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
 netdev@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, Petr Mladek
 <pmladek@...e.com>, John Ogness <jogness@...utronix.de>
Subject: Re: [syzbot] [kernel?] possible deadlock in hrtimer_try_to_cancel

On Tue, Jun 18 2024 at 00:40, syzbot wrote:
> ------------[ cut here ]------------
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.10.0-rc3-syzkaller-00044-g2ccbdf43d5e7 #0 Not tainted
> ------------------------------------------------------
> kworker/u32:10/1146 is trying to acquire lock:
> ffffffff8dba3118 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x12/0x70 kernel/locking/semaphore.c:139
>
> but task is already holding lock:
> ffff88802c32c9d8 (hrtimer_bases.lock){-.-.}-{2:2}, at: lock_hrtimer_base kernel/time/hrtimer.c:175 [inline]
> ffff88802c32c9d8 (hrtimer_bases.lock){-.-.}-{2:2}, at: hrtimer_try_to_cancel+0xa9/0x500 kernel/time/hrtimer.c:1333
>
> which lock already depends on the new lock.

Right. That's caused by this:

> WARNING: CPU: 3 PID: 1146 at lib/timerqueue.c:55 timerqueue_del+0xfe/0x150 lib/timerqueue.c:55

         WARN_ON_ONCE(RB_EMPTY_NODE(&node->node));

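The warning triggers while the hrtimer base lock is held. Printing from
that region is known to be problematic because printk tries to take
console_sem, which is the dependency lockdep complains about above.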

> Modules linked in:
> CPU: 3 PID: 1146 Comm: kworker/u32:10 Not tainted 6.10.0-rc3-syzkaller-00044-g2ccbdf43d5e7 #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Workqueue: netns cleanup_net
> RIP: 0010:timerqueue_del+0xfe/0x150 lib/timerqueue.c:55
> Code: 28 9e ff ff 4c 89 e1 48 ba 00 00 00 00 00 fc ff df 48 c1 e9 03 80 3c 11 00 75 45 48 89 45 08 e9 7b ff ff ff e8 f3 90 c0 f6 90 <0f> 0b 90 e9 43 ff ff ff 48 89 df e8 a2 c4 1d f7 eb 8a 4c 89 e7 e8
> RSP: 0018:ffffc90007267918 EFLAGS: 00010093
> RAX: 0000000000000000 RBX: ffffe8ffad04d080 RCX: ffffffff8acdfe20
> RDX: ffff88802045a440 RSI: ffffffff8acdfedd RDI: 0000000000000006
> RBP: ffff88802c32ca90 R08: 0000000000000006 R09: ffffe8ffad04d080
> R10: ffffe8ffad04d080 R11: 0000000000000001 R12: ffffe8ffad04d080
> R13: 0000000000000001 R14: ffff88802c32c9c0 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88802c300000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000ffeb8c2c CR3: 000000005beb8000 CR4: 0000000000350ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __remove_hrtimer+0x99/0x290 kernel/time/hrtimer.c:1118
>  remove_hrtimer kernel/time/hrtimer.c:1167 [inline]
>  hrtimer_try_to_cancel+0x2a5/0x500 kernel/time/hrtimer.c:1336
>  hrtimer_cancel+0x16/0x40 kernel/time/hrtimer.c:1445
>  napi_disable+0x13a/0x1e0 net/core/dev.c:6648
>  gro_cells_destroy net/core/gro_cells.c:116 [inline]
>  gro_cells_destroy+0x102/0x4d0 net/core/gro_cells.c:106
>  netdev_run_todo+0x775/0x1250 net/core/dev.c:10693
>  cleanup_net+0x591/0xbf0 net/core/net_namespace.c:636
>  process_one_work+0x958/0x1ad0 kernel/workqueue.c:3231
>  process_scheduled_works kernel/workqueue.c:3312 [inline]
>  worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
>  kthread+0x2c1/0x3a0 kernel/kthread.c:389
>  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

IOW, this tries to remove a hrtimer which is not queued, but has
hrtimer::state != HRTIMER_STATE_INACTIVE.

This means the timer is either not initialized or got corrupted.
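
To make the inconsistency concrete, here is a minimal userspace sketch of
the check described above. It is not the kernel code: every model_* and
MODEL_* name is hypothetical, and a plain bool stands in for the rbtree
linkage that RB_EMPTY_NODE() tests.

```c
#include <stdbool.h>

/* Hypothetical model: these names are illustrative, not kernel identifiers. */
enum model_state {
	MODEL_STATE_INACTIVE = 0x00,
	MODEL_STATE_ENQUEUED = 0x01,
};

enum model_result {
	MODEL_NOT_ACTIVE,	/* state says inactive, nothing to remove */
	MODEL_REMOVED,		/* state and queue linkage agreed */
	MODEL_INCONSISTENT,	/* state says queued, but the node is not linked */
};

struct model_timer {
	bool node_linked;	/* stands in for !RB_EMPTY_NODE(&node->node) */
	unsigned int state;	/* stands in for hrtimer::state */
};

static void model_init(struct model_timer *t)
{
	t->node_linked = false;
	t->state = MODEL_STATE_INACTIVE;
}

static void model_enqueue(struct model_timer *t)
{
	t->node_linked = true;
	t->state = MODEL_STATE_ENQUEUED;
}

static enum model_result model_try_to_cancel(struct model_timer *t)
{
	if (t->state == MODEL_STATE_INACTIVE)
		return MODEL_NOT_ACTIVE;

	/* This is the situation in which the WARN_ON_ONCE() above fires. */
	if (!t->node_linked)
		return MODEL_INCONSISTENT;

	t->node_linked = false;
	t->state = MODEL_STATE_INACTIVE;
	return MODEL_REMOVED;
}
```

A timer which went through model_init() and model_enqueue() cancels
cleanly. One whose state field was never initialized, or was overwritten
after removal, ends up in the MODEL_INCONSISTENT branch.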

The circular locking report is just fallout from that warning. It cannot
be solved with the current printk semantics. The upcoming atomic consoles
should handle this nicely.

Thanks,

        tglx
