linux-kernel - Re: [syzbot] [bluetooth?] INFO: task hung in hci_conn

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZbgJqc06rEZ_FqRw@boqun-archlinux>
Date: Mon, 29 Jan 2024 12:25:13 -0800
From: Boqun Feng <boqun.feng@...il.com>
To: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc: Hillf Danton <hdanton@...a.com>,
	syzbot <syzbot+a984066a63e9c1e62662@...kaller.appspotmail.com>,
	eadavis@...com, Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [bluetooth?] INFO: task hung in hci_conn_failed

On Mon, Jan 29, 2024 at 08:28:50PM +0900, Tetsuo Handa wrote:
> On 2024/01/29 19:47, Hillf Danton wrote:
> > On Mon, 29 Jan 2024 14:53:20 +0900 Tetsuo Handa wrote:
> >> On 2024/01/29 13:48, Hillf Danton wrote:
> >>>> 3 locks held by kworker/1:2/779:
> >>>>  #0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
> >>>>  #1: ffffc900038c7d80 ((work_completion)(&aux->work)#2){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> >>>>  #2: ffff888052984c80 (&aux->poke_mutex){+.+.}-{3:3}, at: __fd_array_map_delete_elem+0x125/0x2f0 kernel/bpf/arraymap.c:884
> >>>
> >>> Could locking people shed any light on the failure of detecting the
> >>> poke_mutex with rq lock held?
> >>>
> >>
> >> Showing held locks (lockdep_print_held_locks()) is not a snapshot.
> >> Synchronous printk() can make #0 already released by the moment #2 is taken.
> >>
> >> Please consult printk() people for possibility of making printk() from
> >> lockdep reports (and/or hung task reports) asynchronous.
> > 
> > If printk did its work well then we have difficulty explaining why RCU
> > stalled more than two minutes.
> > 
> 
> The report you are talking about is https://syzkaller.appspot.com/x/report.txt?x=14300d80180000 .
> 
> It says
> 
> INFO: task kworker/u5:2:5062 blocked for more than 143 seconds.
>       Not tainted 6.8.0-rc1-syzkaller-g8a696a29c690-dirty #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u5:2    state:D stack:27840 pid:5062  tgid:5062  ppid:2      flags:0x00004000
> Workqueue: hci5 hci_cmd_sync_work
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5400 [inline]
>  __schedule+0xf12/0x5c00 kernel/sched/core.c:6727
>  __schedule_loop kernel/sched/core.c:6802 [inline]
>  schedule+0xe9/0x270 kernel/sched/core.c:6817
>  schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
>  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
>  __mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
>  hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
>  hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
>  hci_abort_conn_sync+0x7f8/0xb70 net/bluetooth/hci_sync.c:5356
>  abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
>  hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
>  process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
>  process_scheduled_works kernel/workqueue.c:2706 [inline]
>  worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
>  kthread+0x2c6/0x3a0 kernel/kthread.c:388
>  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
>  </TASK>
> 
> and lockdep says kworker/u5:2/5062 was blocked at
> 
> 5 locks held by kworker/u5:2/5062:
>  #0: ffff88802271a938 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
>  #1: ffffc900041cfd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
>  #2: ffff888019ff1060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
>  #3: ffff888019ff0078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
>  #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
>  #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> 
> and ffffffff8ef23348 was held by kworker/u5:0/51.
> 
> 6 locks held by kworker/u5:0/51:
>  #0: ffff88802a730d38 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
>  #1: ffffc90000bc7d80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
>  #2: ffff888020989060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
>  #3: ffff888020988078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
>  #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
>  #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
>  #5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
>  #5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
> 
> Since RCU reader side is not permitted to sleep, kworker/u5:0/51 should have released ffffffff8ef23348 shortly.
> 
> But the thread who was blocking ffffffff8d1b83f8 is not reported (presumably one of threads that are reported as
> "X lock[s] held by Y" without stack trace), we can't know what the thread blocking ffffffff8d1b83f8 was doing.
> 
> Showing all backtraces of threads (including TASK_RUNNING state) could explain.
> 
> So, your question will be "Lockdep developers, can we please make it possible to report all backtraces of
> threads which are holding at least one lock, tolerating increasing possibility of UAF when trying to report
> locks held?" Although Linus does not like, maybe we need to tune hung task report using a kernel config option.
> 

Looks to me that debug_show_all_locks() doesn't hold the lockdep lock,
so it's really a best effort race read of all tasks on lock hold
information. Maybe Hillf wants the following? (Completely untested, it
will stop your whole system and print lock holding information).

Regards,
Boqun

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 151bd3de5936..17a9392e6bd8 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -6611,6 +6611,8 @@ void debug_show_all_locks(void)
 	}
 	pr_warn("\nShowing all locks held in the system:\n");
 
+	local_irq_disable();
+	lockdep_lock();
 	rcu_read_lock();
 	for_each_process_thread(g, p) {
 		if (!p->lockdep_depth)
@@ -6620,6 +6622,8 @@ void debug_show_all_locks(void)
 		touch_all_softlockup_watchdogs();
 	}
 	rcu_read_unlock();
+	lockdep_unlock();
+	local_irq_enable();
 
 	pr_warn("\n");
 	pr_warn("=============================================\n\n");