[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <tfzs3z7yjs6ppobm53hxwjzhhptgq2aqc2obylblz5rk7mdstg@bkas4xcq66xk>
Date: Wed, 30 Jul 2025 16:59:22 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>,
Joel Granados <joel.granados@...nel.org>, Anna Schumaker <anna.schumaker@...cle.com>,
Lance Yang <ioworker0@...il.com>, Kent Overstreet <kent.overstreet@...ux.dev>,
Yongliang Gao <leonylgao@...cent.com>, Steven Rostedt <rostedt@...dmis.org>,
Tomasz Figa <tfiga@...omium.org>, Sergey Senozhatsky <senozhatsky@...omium.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] hung_task: Show the blocker task if the task is
hung on mutex
On (25/02/25 16:02), Masami Hiramatsu (Google) wrote:
> The "hung_task" shows a long-time uninterruptible slept task, but most
> often, it's blocked on a mutex acquired by another task. Without
> dumping such a task, investigating the root cause of the hung task
> problem is very difficult.
>
> This introduce task_struct::blocker_mutex to point the mutex lock
> which this task is waiting for. Since the mutex has "owner"
> information, we can find the owner task and dump it with hung tasks.
>
> Note: the owner can be changed while dumping the owner task, so
> this is "likely" the owner of the mutex.
>
> With this change, the hung task shows blocker task's info like below;
>
> INFO: task cat:115 blocked for more than 122 seconds.
> Not tainted 6.14.0-rc3-00003-ga8946be3de00 #156
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:cat state:D stack:13432 pid:115 tgid:115 ppid:106 task_flags:0x400100 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x731/0x960
> ? schedule_preempt_disabled+0x54/0xa0
> schedule+0xb7/0x140
> ? __mutex_lock+0x51b/0xa60
> ? __mutex_lock+0x51b/0xa60
> schedule_preempt_disabled+0x54/0xa0
> __mutex_lock+0x51b/0xa60
> read_dummy+0x23/0x70
> full_proxy_read+0x6a/0xc0
> vfs_read+0xc2/0x340
> ? __pfx_direct_file_splice_eof+0x10/0x10
> ? do_sendfile+0x1bd/0x2e0
> ksys_read+0x76/0xe0
> do_syscall_64+0xe3/0x1c0
> ? exc_page_fault+0xa9/0x1d0
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x4840cd
> RSP: 002b:00007ffe99071828 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
> RDX: 0000000000001000 RSI: 00007ffe99071870 RDI: 0000000000000003
> RBP: 00007ffe99071870 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
> R13: 00000000132fd3a0 R14: 0000000000000001 R15: ffffffffffffffff
> </TASK>
> INFO: task cat:115 is blocked on a mutex likely owned by task cat:114.
> task:cat state:S stack:13432 pid:114 tgid:114 ppid:106 task_flags:0x400100 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x731/0x960
> ? schedule_timeout+0xa8/0x120
> schedule+0xb7/0x140
> schedule_timeout+0xa8/0x120
> ? __pfx_process_timeout+0x10/0x10
> msleep_interruptible+0x3e/0x60
> read_dummy+0x2d/0x70
> full_proxy_read+0x6a/0xc0
> vfs_read+0xc2/0x340
> ? __pfx_direct_file_splice_eof+0x10/0x10
> ? do_sendfile+0x1bd/0x2e0
> ksys_read+0x76/0xe0
> do_syscall_64+0xe3/0x1c0
> ? exc_page_fault+0xa9/0x1d0
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x4840cd
> RSP: 002b:00007ffe3e0147b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
> RDX: 0000000000001000 RSI: 00007ffe3e014800 RDI: 0000000000000003
> RBP: 00007ffe3e014800 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
> R13: 000000001a0a93a0 R14: 0000000000000001 R15: ffffffffffffffff
> </TASK>
One thing that gives me a bit of "inconvenience" is that in certain
cases this significantly increases the amount of stack traces to go
through. A distilled real life example:
- task T1 acquires lock L1, attempts to acquire L2
- task T2 acquires lock L2, attempts to acquire L3
- task T3 acquires lock L3, attempts to acquire L1
So we'd now see:
- a backtrace of T1, followed by a backtrace of T2 (owner of L2)
- a backtrace of T2, followed by a backtrace of T3 (owner of L3)
- a backtrace of T3, followed by a backtrace of T1 (owner of L1)
Notice how each task is backtraced twice. I wonder if it's worth it
to de-dup the backtraces. E.g. in
task cat:115 is blocked on a mutex likely owned by task cat:114
if we know that cat:114 is also blocked on a lock, then we probably
can just say "is blocked on a mutex likely owned by task cat:114" and
continue iterating through tasks. That "cat:114" will be backtraced
individually later, as it's also blocked on a lock, owned by another
task.
Does this make any sense?
Powered by blists - more mailing lists