Message-ID: <392dc660-b4b0-4a0e-8e70-51e19c62589c@linux.dev>
Date: Wed, 30 Jul 2025 17:22:59 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Sergey Senozhatsky <senozhatsky@...omium.org>,
"Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>,
Joel Granados <joel.granados@...nel.org>,
Anna Schumaker <anna.schumaker@...cle.com>, Lance Yang
<ioworker0@...il.com>, Kent Overstreet <kent.overstreet@...ux.dev>,
Yongliang Gao <leonylgao@...cent.com>, Steven Rostedt <rostedt@...dmis.org>,
Tomasz Figa <tfiga@...omium.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] hung_task: Show the blocker task if the task is
hung on mutex
On 2025/7/30 15:59, Sergey Senozhatsky wrote:
> On (25/02/25 16:02), Masami Hiramatsu (Google) wrote:
>> The "hung_task" shows a long-time uninterruptible slept task, but most
>> often, it's blocked on a mutex acquired by another task. Without
>> dumping such a task, investigating the root cause of the hung task
>> problem is very difficult.
>>
>> This introduces task_struct::blocker_mutex to point to the mutex
>> this task is waiting for. Since the mutex has "owner" information,
>> we can find the owner task and dump it along with the hung task.
>>
>> Note: the owner can change while the owner task is being dumped, so
>> this is "likely" the owner of the mutex.
>>
>> With this change, the hung task report shows the blocker task's info
>> like below:
>>
>> INFO: task cat:115 blocked for more than 122 seconds.
>> Not tainted 6.14.0-rc3-00003-ga8946be3de00 #156
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:cat state:D stack:13432 pid:115 tgid:115 ppid:106 task_flags:0x400100 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x731/0x960
>> ? schedule_preempt_disabled+0x54/0xa0
>> schedule+0xb7/0x140
>> ? __mutex_lock+0x51b/0xa60
>> ? __mutex_lock+0x51b/0xa60
>> schedule_preempt_disabled+0x54/0xa0
>> __mutex_lock+0x51b/0xa60
>> read_dummy+0x23/0x70
>> full_proxy_read+0x6a/0xc0
>> vfs_read+0xc2/0x340
>> ? __pfx_direct_file_splice_eof+0x10/0x10
>> ? do_sendfile+0x1bd/0x2e0
>> ksys_read+0x76/0xe0
>> do_syscall_64+0xe3/0x1c0
>> ? exc_page_fault+0xa9/0x1d0
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x4840cd
>> RSP: 002b:00007ffe99071828 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>> RDX: 0000000000001000 RSI: 00007ffe99071870 RDI: 0000000000000003
>> RBP: 00007ffe99071870 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>> R13: 00000000132fd3a0 R14: 0000000000000001 R15: ffffffffffffffff
>> </TASK>
>> INFO: task cat:115 is blocked on a mutex likely owned by task cat:114.
>> task:cat state:S stack:13432 pid:114 tgid:114 ppid:106 task_flags:0x400100 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x731/0x960
>> ? schedule_timeout+0xa8/0x120
>> schedule+0xb7/0x140
>> schedule_timeout+0xa8/0x120
>> ? __pfx_process_timeout+0x10/0x10
>> msleep_interruptible+0x3e/0x60
>> read_dummy+0x2d/0x70
>> full_proxy_read+0x6a/0xc0
>> vfs_read+0xc2/0x340
>> ? __pfx_direct_file_splice_eof+0x10/0x10
>> ? do_sendfile+0x1bd/0x2e0
>> ksys_read+0x76/0xe0
>> do_syscall_64+0xe3/0x1c0
>> ? exc_page_fault+0xa9/0x1d0
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x4840cd
>> RSP: 002b:00007ffe3e0147b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>> RDX: 0000000000001000 RSI: 00007ffe3e014800 RDI: 0000000000000003
>> RBP: 00007ffe3e014800 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>> R13: 000000001a0a93a0 R14: 0000000000000001 R15: ffffffffffffffff
>> </TASK>
>
> One thing that gives me a bit of "inconvenience" is that in certain
> cases this significantly increases the number of stack traces to go
> through. A distilled real-life example:
> - task T1 acquires lock L1, attempts to acquire L2
> - task T2 acquires lock L2, attempts to acquire L3
> - task T3 acquires lock L3, attempts to acquire L1
>
> So we'd now see:
> - a backtrace of T1, followed by a backtrace of T2 (owner of L2)
> - a backtrace of T2, followed by a backtrace of T3 (owner of L3)
> - a backtrace of T3, followed by a backtrace of T1 (owner of L1)
>
> Notice how each task is backtraced twice. I wonder if it's worth it
> to de-dup the backtraces. E.g. in
>
> task cat:115 is blocked on a mutex likely owned by task cat:114
>
> if we know that cat:114 is also blocked on a lock, then we probably
> can just say "is blocked on a mutex likely owned by task cat:114" and
> continue iterating through tasks. That "cat:114" will be backtraced
> individually later, as it's also blocked on a lock, owned by another
> task.
>
> Does this make any sense?
Good spot! There is room for improvement.

In a deadlock chain like T1->T2->T3->T1, by definition, T1, T2, and T3
are all hung tasks, and the detector's primary responsibility is to
generate a report for each of them. The current implementation, when
reporting on one task, also dumps the backtrace of its blocker. This
results in a task's backtrace being printed twice: once as a blocker
and again as a primary hung task.

Regarding the de-duplication idea: it is elegant, but it does introduce
more complexity into the detector. We should also consider that in many
real-world cases, the blocking chain is just one level deep, where this
isn't an issue, IMHO ;)
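
For concreteness, here is a rough, untested sketch of Sergey's idea.
hung_task_get_blocker_owner() is a made-up placeholder for however the
patch resolves t->blocker_mutex to its (likely) owner; only
sched_show_task(), pr_err(), and the blocker_mutex field from this
series are real:

	/* Untested sketch of the de-dup idea, not the actual patch code. */
	static void debug_show_blocker(struct task_struct *task)
	{
		/* Placeholder helper: resolve task->blocker_mutex to its owner. */
		struct task_struct *owner = hung_task_get_blocker_owner(task);

		if (!owner)
			return;

		pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
		       task->comm, task->pid, owner->comm, owner->pid);

		/*
		 * De-dup: if the owner is itself blocked on a mutex, it will
		 * be reported as a hung task in its own right, with its own
		 * backtrace, so skip dumping it a second time here.
		 */
		if (READ_ONCE(owner->blocker_mutex))
			return;

		sched_show_task(owner);
	}
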
Thanks,
Lance