linux-kernel - Re: [PATCH v4 1/2] hung_task: Show the blocker task if the task is hung on mutex

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <tfzs3z7yjs6ppobm53hxwjzhhptgq2aqc2obylblz5rk7mdstg@bkas4xcq66xk>
Date: Wed, 30 Jul 2025 16:59:22 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>, 
	Joel Granados <joel.granados@...nel.org>, Anna Schumaker <anna.schumaker@...cle.com>, 
	Lance Yang <ioworker0@...il.com>, Kent Overstreet <kent.overstreet@...ux.dev>, 
	Yongliang Gao <leonylgao@...cent.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Tomasz Figa <tfiga@...omium.org>, Sergey Senozhatsky <senozhatsky@...omium.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] hung_task: Show the blocker task if the task is
 hung on mutex

On (25/02/25 16:02), Masami Hiramatsu (Google) wrote:
> The "hung_task" shows a long-time uninterruptible slept task, but most
> often, it's blocked on a mutex acquired by another task. Without
> dumping such a task, investigating the root cause of the hung task
> problem is very difficult.
> 
> This introduce task_struct::blocker_mutex to point the mutex lock
> which this task is waiting for. Since the mutex has "owner"
> information, we can find the owner task and dump it with hung tasks.
> 
> Note: the owner can be changed while dumping the owner task, so
> this is "likely" the owner of the mutex.
> 
> With this change, the hung task shows blocker task's info like below;
> 
>  INFO: task cat:115 blocked for more than 122 seconds.
>        Not tainted 6.14.0-rc3-00003-ga8946be3de00 #156
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:cat             state:D stack:13432 pid:115   tgid:115   ppid:106    task_flags:0x400100 flags:0x00000002
>  Call Trace:
>   <TASK>
>   __schedule+0x731/0x960
>   ? schedule_preempt_disabled+0x54/0xa0
>   schedule+0xb7/0x140
>   ? __mutex_lock+0x51b/0xa60
>   ? __mutex_lock+0x51b/0xa60
>   schedule_preempt_disabled+0x54/0xa0
>   __mutex_lock+0x51b/0xa60
>   read_dummy+0x23/0x70
>   full_proxy_read+0x6a/0xc0
>   vfs_read+0xc2/0x340
>   ? __pfx_direct_file_splice_eof+0x10/0x10
>   ? do_sendfile+0x1bd/0x2e0
>   ksys_read+0x76/0xe0
>   do_syscall_64+0xe3/0x1c0
>   ? exc_page_fault+0xa9/0x1d0
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
>  RIP: 0033:0x4840cd
>  RSP: 002b:00007ffe99071828 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>  RDX: 0000000000001000 RSI: 00007ffe99071870 RDI: 0000000000000003
>  RBP: 00007ffe99071870 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>  R13: 00000000132fd3a0 R14: 0000000000000001 R15: ffffffffffffffff
>   </TASK>
>  INFO: task cat:115 is blocked on a mutex likely owned by task cat:114.
>  task:cat             state:S stack:13432 pid:114   tgid:114   ppid:106    task_flags:0x400100 flags:0x00000002
>  Call Trace:
>   <TASK>
>   __schedule+0x731/0x960
>   ? schedule_timeout+0xa8/0x120
>   schedule+0xb7/0x140
>   schedule_timeout+0xa8/0x120
>   ? __pfx_process_timeout+0x10/0x10
>   msleep_interruptible+0x3e/0x60
>   read_dummy+0x2d/0x70
>   full_proxy_read+0x6a/0xc0
>   vfs_read+0xc2/0x340
>   ? __pfx_direct_file_splice_eof+0x10/0x10
>   ? do_sendfile+0x1bd/0x2e0
>   ksys_read+0x76/0xe0
>   do_syscall_64+0xe3/0x1c0
>   ? exc_page_fault+0xa9/0x1d0
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
>  RIP: 0033:0x4840cd
>  RSP: 002b:00007ffe3e0147b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>  RDX: 0000000000001000 RSI: 00007ffe3e014800 RDI: 0000000000000003
>  RBP: 00007ffe3e014800 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>  R13: 000000001a0a93a0 R14: 0000000000000001 R15: ffffffffffffffff
>   </TASK>

One thing that gives me a bit of "inconvenience" is that in certain
cases this significantly increases the amount of stack traces to go
through.  A distilled real life example:
- task T1 acquires lock L1, attempts to acquire L2
- task T2 acquires lock L2, attempts to acquire L3
- task T3 acquires lock L3, attempts to acquire L1

So we'd now see:
- a backtrace of T1, followed by a backtrace of T2 (owner of L2)
- a backtrace of T2, followed by a backtrace of T3 (owner of L3)
- a backtrace of T3, followed by a backtrace of T1 (owner of L1)

Notice how each task is backtraced twice.  I wonder if it's worth it
to de-dup the backtraces.  E.g. in

	task cat:115 is blocked on a mutex likely owned by task cat:114

if we know that cat:114 is also blocked on a lock, then we probably
can just say "is blocked on a mutex likely owned by task cat:114" and
continue iterating through tasks.  That "cat:114" will be backtraced
individually later, as it's also blocked on a lock, owned by another
task.

Does this make any sense?