Message-ID: <392dc660-b4b0-4a0e-8e70-51e19c62589c@linux.dev>
Date: Wed, 30 Jul 2025 17:22:59 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Sergey Senozhatsky <senozhatsky@...omium.org>,
"Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>,
Joel Granados <joel.granados@...nel.org>,
Anna Schumaker <anna.schumaker@...cle.com>, Lance Yang
<ioworker0@...il.com>, Kent Overstreet <kent.overstreet@...ux.dev>,
Yongliang Gao <leonylgao@...cent.com>, Steven Rostedt <rostedt@...dmis.org>,
Tomasz Figa <tfiga@...omium.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] hung_task: Show the blocker task if the task is
hung on mutex
On 2025/7/30 15:59, Sergey Senozhatsky wrote:
> On (25/02/25 16:02), Masami Hiramatsu (Google) wrote:
>> The "hung_task" shows a long-time uninterruptible slept task, but most
>> often, it's blocked on a mutex acquired by another task. Without
>> dumping such a task, investigating the root cause of the hung task
>> problem is very difficult.
>>
>> This introduces task_struct::blocker_mutex to point to the mutex
>> this task is waiting for. Since the mutex has "owner" information,
>> we can find the owner task and dump it along with the hung task.
>>
>> Note: the owner can change while the owner task is being dumped, so
>> this is "likely" the owner of the mutex.
>>
>> With this change, the hung task report shows the blocker task's info
>> like below:
>>
>> INFO: task cat:115 blocked for more than 122 seconds.
>> Not tainted 6.14.0-rc3-00003-ga8946be3de00 #156
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:cat state:D stack:13432 pid:115 tgid:115 ppid:106 task_flags:0x400100 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x731/0x960
>> ? schedule_preempt_disabled+0x54/0xa0
>> schedule+0xb7/0x140
>> ? __mutex_lock+0x51b/0xa60
>> ? __mutex_lock+0x51b/0xa60
>> schedule_preempt_disabled+0x54/0xa0
>> __mutex_lock+0x51b/0xa60
>> read_dummy+0x23/0x70
>> full_proxy_read+0x6a/0xc0
>> vfs_read+0xc2/0x340
>> ? __pfx_direct_file_splice_eof+0x10/0x10
>> ? do_sendfile+0x1bd/0x2e0
>> ksys_read+0x76/0xe0
>> do_syscall_64+0xe3/0x1c0
>> ? exc_page_fault+0xa9/0x1d0
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x4840cd
>> RSP: 002b:00007ffe99071828 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>> RDX: 0000000000001000 RSI: 00007ffe99071870 RDI: 0000000000000003
>> RBP: 00007ffe99071870 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>> R13: 00000000132fd3a0 R14: 0000000000000001 R15: ffffffffffffffff
>> </TASK>
>> INFO: task cat:115 is blocked on a mutex likely owned by task cat:114.
>> task:cat state:S stack:13432 pid:114 tgid:114 ppid:106 task_flags:0x400100 flags:0x00000002
>> Call Trace:
>> <TASK>
>> __schedule+0x731/0x960
>> ? schedule_timeout+0xa8/0x120
>> schedule+0xb7/0x140
>> schedule_timeout+0xa8/0x120
>> ? __pfx_process_timeout+0x10/0x10
>> msleep_interruptible+0x3e/0x60
>> read_dummy+0x2d/0x70
>> full_proxy_read+0x6a/0xc0
>> vfs_read+0xc2/0x340
>> ? __pfx_direct_file_splice_eof+0x10/0x10
>> ? do_sendfile+0x1bd/0x2e0
>> ksys_read+0x76/0xe0
>> do_syscall_64+0xe3/0x1c0
>> ? exc_page_fault+0xa9/0x1d0
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x4840cd
>> RSP: 002b:00007ffe3e0147b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
>> RDX: 0000000000001000 RSI: 00007ffe3e014800 RDI: 0000000000000003
>> RBP: 00007ffe3e014800 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
>> R13: 000000001a0a93a0 R14: 0000000000000001 R15: ffffffffffffffff
>> </TASK>
>
> One thing that gives me a bit of "inconvenience" is that in certain
> cases this significantly increases the number of stack traces to go
> through. A distilled real-life example:
> - task T1 acquires lock L1, attempts to acquire L2
> - task T2 acquires lock L2, attempts to acquire L3
> - task T3 acquires lock L3, attempts to acquire L1
>
> So we'd now see:
> - a backtrace of T1, followed by a backtrace of T2 (owner of L2)
> - a backtrace of T2, followed by a backtrace of T3 (owner of L3)
> - a backtrace of T3, followed by a backtrace of T1 (owner of L1)
>
> Notice how each task is backtraced twice. I wonder if it's worth it
> to de-dup the backtraces. E.g. in
>
> task cat:115 is blocked on a mutex likely owned by task cat:114
>
> if we know that cat:114 is also blocked on a lock, then we probably
> can just say "is blocked on a mutex likely owned by task cat:114" and
> continue iterating through tasks. That "cat:114" will be backtraced
> individually later, as it's also blocked on a lock, owned by another
> task.
>
> Does this make any sense?
Good spot! There is room for improvement.

In a deadlock chain like T1->T2->T3->T1, by definition, T1, T2, and T3
are all hung tasks, and the detector's primary responsibility is to
generate a report for each of them. The current implementation, when
reporting on one task, also dumps the backtrace of its blocker. This
results in a task's backtrace being printed twice: once as a blocker
and again as a primary hung task.

Regarding the de-duplication idea: it is elegant, but it does introduce
more complexity into the detector. We should also consider that in many
real-world cases, the blocking chain is just one level deep, where this
isn't an issue, IMHO ;)
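
For concreteness, here is a rough, untested sketch of Sergey's idea.
hung_task_get_blocker_owner() is a made-up placeholder for however the
patch resolves t->blocker_mutex to its (likely) owner; only
sched_show_task(), pr_err(), and the blocker_mutex field from this
series are real:

	/* Untested sketch of the de-dup idea, not the actual patch code. */
	static void debug_show_blocker(struct task_struct *task)
	{
		/* Placeholder helper: resolve task->blocker_mutex to its owner. */
		struct task_struct *owner = hung_task_get_blocker_owner(task);

		if (!owner)
			return;

		pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
		       task->comm, task->pid, owner->comm, owner->pid);

		/*
		 * De-dup: if the owner is itself blocked on a mutex, it will
		 * be reported as a hung task in its own right, with its own
		 * backtrace, so skip dumping it a second time here.
		 */
		if (READ_ONCE(owner->blocker_mutex))
			return;

		sched_show_task(owner);
	}
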
Thanks,
Lance