linux-kernel - Re: [PATCH 1/2] hung_task: Show the blocker task if the task is hung on mutex


Message-ID: <0fa9dd8e-2d83-487e-bfb1-1f5d20cd9fe6@redhat.com>
Date: Wed, 19 Feb 2025 15:18:57 -0500
From: Waiman Long <llong@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>,
 "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
 Boqun Feng <boqun.feng@...il.com>, Joel Granados <joel.granados@...nel.org>,
 Anna Schumaker <anna.schumaker@...cle.com>, Lance Yang
 <ioworker0@...il.com>, Kent Overstreet <kent.overstreet@...ux.dev>,
 Yongliang Gao <leonylgao@...cent.com>, Tomasz Figa <tfiga@...omium.org>,
 Sergey Senozhatsky <senozhatsky@...omium.org>, linux-kernel@...r.kernel.org,
 Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [PATCH 1/2] hung_task: Show the blocker task if the task is hung
 on mutex

On 2/19/25 11:23 AM, Steven Rostedt wrote:
> On Wed, 19 Feb 2025 22:00:49 +0900
> "Masami Hiramatsu (Google)" <mhiramat@...nel.org> wrote:
>
>> From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
>>
>> The "hung_task" shows a long-time uninterruptible slept task, but most
>> often, it's blocked on a mutex acquired by another task. Without
>> dumping such a task, investigating the root cause of the hung task
>> problem is very difficult.
>>
>> Fortunately CONFIG_DEBUG_MUTEXES=y allows us to identify the mutex
>> blocking the task. And the mutex has "owner" information, which can
>> be used to find the owner task and dump it with hung tasks.
>>
>> With this change, the hung task shows blocker task's info like below;
>>
> We've hit bugs like this in the field a few times, and it was very
> difficult to debug. Something like this would have made our lives much
> easier!

I agree that it will be a useful feature.

>> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
>> ---
>>   kernel/hung_task.c           |   38 ++++++++++++++++++++++++++++++++++++++
>>   kernel/locking/mutex-debug.c |    1 +
>>   kernel/locking/mutex.c       |    9 +++++++++
>>   kernel/locking/mutex.h       |    6 ++++++
>>   4 files changed, 54 insertions(+)
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index 04efa7a6e69b..d1ce69504090 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -25,6 +25,8 @@
>>   
>>   #include <trace/events/sched.h>
>>   
>> +#include "locking/mutex.h"
>> +
>>   /*
>>    * The number of tasks checked:
>>    */
>> @@ -93,6 +95,41 @@ static struct notifier_block panic_block = {
>>   	.notifier_call = hung_task_panic,
>>   };
>>   
>> +
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +static void debug_show_blocker(struct task_struct *task)
>> +{
>> +	struct task_struct *g, *t;
>> +	unsigned long owner;
>> +	struct mutex *lock;
>> +
>> +	if (!task->blocked_on)
>> +		return;
>> +
>> +	lock = task->blocked_on->mutex;
> This is a catch 22. To look at the task's blocked_on, we need the
> lock->wait_lock held, otherwise this could be an issue. But to get that
> lock, we need to look at the task's blocked_on field! As this can race.
>
> Another thing is that the waiter is on the task's stack. Perhaps we need to
> move this into sched/core.c and be able to lock the task's rq. Because even
> something like:
>
> 	waiter = READ_ONCE(task->blocked_on);
>
> May be garbage if the task were to suddenly wake up and run.
>
> Now if we were able to lock the task's rq, which would prevent it from
> being woken up, then the blocked_on field would not be at risk of being
> corrupted.

Accessing the mutex_waiter structure is tricky because it is allocated on
the waiter's stack. One way to work around this issue is to add a new
blocked_on_mutex field to task_struct that points directly to the relevant
mutex. Yes, that increases the size of task_struct by 8 bytes, but it is
a pretty large structure anyway. If this field is accessed with
READ_ONCE()/WRITE_ONCE(), we don't need to take a lock to read it, though
taking the wait_lock may still be needed to examine other information
inside the mutex.
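
To make the idea concrete, here is a rough, untested userspace sketch of
what I have in mind. It is not kernel code: struct toy_task and struct
toy_mutex are simplified stand-ins for task_struct and struct mutex, the
READ_ONCE()/WRITE_ONCE() macros are minimal volatile-access
approximations of the kernel ones, and the helper names are hypothetical.
The real mutex owner field also carries flag bits, which the sketch
ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumption, not the real ones). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct toy_task;

/* Stand-in for struct mutex; the real owner field is an atomic_long_t with flag bits. */
struct toy_mutex {
	struct toy_task *owner;
};

/* Stand-in for task_struct with the proposed (hypothetical) field. */
struct toy_task {
	const char *comm;
	struct toy_mutex *blocked_on_mutex;	/* mutex this task is waiting on, or NULL */
};

/* Would be called by the waiter just before it goes to sleep on the mutex. */
static void set_blocked_on_mutex(struct toy_task *t, struct toy_mutex *lock)
{
	assert(t != NULL);
	WRITE_ONCE(t->blocked_on_mutex, lock);
}

/* Would be called by the waiter once it has acquired the mutex or given up. */
static void clear_blocked_on_mutex(struct toy_task *t)
{
	assert(t != NULL);
	WRITE_ONCE(t->blocked_on_mutex, NULL);
}

/*
 * Would be called from the hung task detector. Reading the pointer needs
 * no lock and never touches the waiter's stack. In the kernel, looking at
 * anything else inside the mutex may still require taking wait_lock.
 */
static struct toy_task *find_blocker(struct toy_task *t)
{
	struct toy_mutex *lock;

	assert(t != NULL);
	lock = READ_ONCE(t->blocked_on_mutex);
	if (!lock)
		return NULL;

	return READ_ONCE(lock->owner);
}
```

The point of the sketch is only that the detector reads a single pointer
stored in the task itself, so it no longer has to dereference a
mutex_waiter living on another task's stack.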

Cheers,
Longman