Message-ID: <20250219112308.5d905680@gandalf.local.home>
Date: Wed, 19 Feb 2025 11:23:08 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
 Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>, Joel
 Granados <joel.granados@...nel.org>, Anna Schumaker
 <anna.schumaker@...cle.com>, Lance Yang <ioworker0@...il.com>, Kent
 Overstreet <kent.overstreet@...ux.dev>, Yongliang Gao
 <leonylgao@...cent.com>, Tomasz Figa <tfiga@...omium.org>, Sergey
 Senozhatsky <senozhatsky@...omium.org>, linux-kernel@...r.kernel.org, Linux
 Memory Management List <linux-mm@...ck.org>, Lance Yang
 <ioworker0@...il.com>
Subject: Re: [PATCH 1/2] hung_task: Show the blocker task if the task is
 hung on mutex

On Wed, 19 Feb 2025 22:00:49 +0900
"Masami Hiramatsu (Google)" <mhiramat@...nel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
> 
> The "hung_task" detector reports a task that has been sleeping
> uninterruptibly for a long time, but most often that task is blocked on
> a mutex acquired by another task. Without dumping that other task,
> investigating the root cause of the hung task is very difficult.
> 
> Fortunately CONFIG_DEBUG_MUTEXES=y allows us to identify the mutex
> blocking the task. And the mutex has "owner" information, which can
> be used to find the owner task and dump it with hung tasks.
> 
> With this change, the hung task shows blocker task's info like below;
> 

We've hit bugs like this in the field a few times, and they were very
difficult to debug. Something like this would have made our lives much
easier!

> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
> ---
>  kernel/hung_task.c           |   38 ++++++++++++++++++++++++++++++++++++++
>  kernel/locking/mutex-debug.c |    1 +
>  kernel/locking/mutex.c       |    9 +++++++++
>  kernel/locking/mutex.h       |    6 ++++++
>  4 files changed, 54 insertions(+)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 04efa7a6e69b..d1ce69504090 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -25,6 +25,8 @@
>  
>  #include <trace/events/sched.h>
>  
> +#include "locking/mutex.h"
> +
>  /*
>   * The number of tasks checked:
>   */
> @@ -93,6 +95,41 @@ static struct notifier_block panic_block = {
>  	.notifier_call = hung_task_panic,
>  };
>  
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> +static void debug_show_blocker(struct task_struct *task)
> +{
> +	struct task_struct *g, *t;
> +	unsigned long owner;
> +	struct mutex *lock;
> +
> +	if (!task->blocked_on)
> +		return;
> +
> +	lock = task->blocked_on->mutex;

This is a catch-22. To look at the task's blocked_on safely, we need to
hold lock->wait_lock. But to find that lock, we need to look at the task's
blocked_on field first! So this can race.

Another thing is that the waiter is on the task's stack. Perhaps we need to
move this into sched/core.c and be able to lock the task's rq. Because even
something like:

	waiter = READ_ONCE(task->blocked_on);

may be garbage if the task were to suddenly wake up and run.

Now if we were able to lock the task's rq, which would prevent it from
being woken up, then the blocked_on field would not be at risk of being
corrupted.

-- Steve
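
[A rough userspace sketch of the locking idea above, not kernel code and
not part of the patch. All names here (toy_task, pin_lock, toy_peek_blocker
and so on) are hypothetical. A pthread mutex stands in for the task's rq
lock: the waiter only publishes and clears blocked_on while holding it, so
an observer holding the same lock never dereferences a waiter whose stack
frame is already gone. The real rq lock works differently (it keeps the
task from being woken up at all); this only illustrates why the read has to
be pinned by some lock that can be found without going through blocked_on.]

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct toy_mutex { int id; };

/* Lives on the waiting thread's stack, like the kernel's mutex waiter. */
struct toy_waiter { struct toy_mutex *mutex; };

struct toy_task {
	pthread_mutex_t pin_lock;      /* hypothetical stand-in for the rq lock */
	struct toy_waiter *blocked_on; /* valid only while pin_lock is held */
};

void toy_task_init(struct toy_task *t)
{
	pthread_mutex_init(&t->pin_lock, NULL);
	t->blocked_on = NULL;
}

/* Waiter side: publish the on-stack waiter under pin_lock. */
void toy_block_begin(struct toy_task *t, struct toy_waiter *w,
		     struct toy_mutex *m)
{
	w->mutex = m;
	pthread_mutex_lock(&t->pin_lock);
	assert(t->blocked_on == NULL);
	t->blocked_on = w;
	pthread_mutex_unlock(&t->pin_lock);
}

/* Waiter side: clear the pointer under pin_lock before the frame dies. */
void toy_block_end(struct toy_task *t)
{
	pthread_mutex_lock(&t->pin_lock);
	t->blocked_on = NULL;
	pthread_mutex_unlock(&t->pin_lock);
}

/*
 * Observer side: the lock is reachable from the task itself, so there is
 * no need to read blocked_on in order to find the lock protecting it.
 * Returns the id of the mutex the task is blocked on, or -1 if none.
 */
int toy_peek_blocker(struct toy_task *t)
{
	int id = -1;

	pthread_mutex_lock(&t->pin_lock);
	if (t->blocked_on && t->blocked_on->mutex)
		id = t->blocked_on->mutex->id;
	pthread_mutex_unlock(&t->pin_lock);
	return id;
}
```

[The point of the sketch is only the ordering: the observer takes a lock it
can reach from the task pointer, and reads blocked_on afterwards.]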


> +	if (unlikely(!lock)) {
> +		pr_err("INFO: task %s:%d is blocked on a mutex, but the mutex is not found.\n",
> +			task->comm, task->pid);
> +		return;
> +	}
> +	owner = debug_mutex_get_owner(lock);
> +	if (likely(owner)) {
> +		/* Ensure the owner information is correct. */
> +		for_each_process_thread(g, t)
> +			if ((unsigned long)t == owner) {
> +				pr_err("INFO: task %s:%d is blocked on a mutex owned by task %s:%d.\n",
> +					task->comm, task->pid, t->comm, t->pid);
> +				sched_show_task(t);
> +				return;
> +			}
> +	}
> +	pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> +		task->comm, task->pid);
> +}
> +#else
> +#define debug_show_blocker(t)	do {} while (0)
> +#endif
