lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250624125753.1b64a9b0dafbd2bd63239dbc@kernel.org>
Date: Tue, 24 Jun 2025 12:57:53 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Lance Yang <ioworker0@...il.com>
Cc: akpm@...ux-foundation.org, zi.li@...ux.dev, anna.schumaker@...cle.com,
 boqun.feng@...il.com, joel.granados@...nel.org, jstultz@...gle.com,
 kent.overstreet@...ux.dev, leonylgao@...cent.com,
 linux-kernel@...r.kernel.org, longman@...hat.com, mhiramat@...nel.org,
 mingo@...hat.com, mingzhe.yang@...com, peterz@...radead.org,
 rostedt@...dmis.org, senozhatsky@...omium.org, tfiga@...omium.org,
 will@...nel.org, Lance Yang <lance.yang@...ux.dev>
Subject: Re: [PATCH RFC 3/3] hung_task: extend hung task blocker tracking to
 rwsems

On Thu, 12 Jun 2025 12:19:26 +0800
Lance Yang <ioworker0@...il.com> wrote:

> From: Lance Yang <lance.yang@...ux.dev>
> 
> Inspired by mutex blocker tracking[1], and having already extended it to
> semaphores, let's now add support for reader-writer semaphores (rwsems).
> 
> The approach is simple: when a task enters TASK_UNINTERRUPTIBLE while
> waiting for an rwsem, we just call hung_task_set_blocker(). The hung task
> detector can then query the rwsem's owner to identify the lock holder.
> 
> Tracking works reliably for writers, as there can only be a single writer
> holding the lock, and its task struct is stored in the owner field.
> 
> The main challenge lies with readers. The owner field points to only one
> of many concurrent readers, so we might lose track of the blocker if that
> specific reader unlocks, even while others remain. This is not a
> significant issue, however. In practice, long-lasting lock contention is
> almost always caused by a writer. Therefore, reliably tracking the writer
> is the primary goal of this patch series ;)

I think as far as it is reliable, it is better than nothing :) and that
can help us to debug some part of kernel crashes.

> 
> With this change, the hung task detector can now show blocker task's info
> like below:
> 
> [Thu Jun 12 11:01:33 2025] INFO: task rw_sem_thread2:36526 blocked for more than 122 seconds.
> [Thu Jun 12 11:01:33 2025]       Tainted: G S         O        6.16.0-rc1 #1
> [Thu Jun 12 11:01:33 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Jun 12 11:01:33 2025] task:rw_sem_thread2  state:D stack:0     pid:36526 tgid:36526 ppid:2      task_flags:0x208040 flags:0x00004000
> [Thu Jun 12 11:01:33 2025] Call Trace:
> [Thu Jun 12 11:01:33 2025]  <TASK>
> [Thu Jun 12 11:01:33 2025]  __schedule+0x7c7/0x1930
> [Thu Jun 12 11:01:33 2025]  ? __pfx___schedule+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? _raw_spin_lock_irq+0x8a/0xe0
> [Thu Jun 12 11:01:33 2025]  ? __pfx__raw_spin_lock_irq+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  schedule+0x6a/0x180
> [Thu Jun 12 11:01:33 2025]  schedule_preempt_disabled+0x15/0x30
> [Thu Jun 12 11:01:33 2025]  rwsem_down_write_slowpath+0x447/0x1090
> [Thu Jun 12 11:01:33 2025]  ? __pfx_rwsem_down_write_slowpath+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx___schedule+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx___might_resched+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx_thread2_func+0x10/0x10 [rw_sem_test_2]
> [Thu Jun 12 11:01:33 2025]  down_write+0x125/0x140
> [Thu Jun 12 11:01:33 2025]  ? __pfx_down_write+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? msleep+0x91/0xf0
> [Thu Jun 12 11:01:33 2025]  ? __raw_spin_lock_irqsave+0x8c/0xf0
> [Thu Jun 12 11:01:33 2025]  thread2_func+0x37/0x70 [rw_sem_test_2]
> [Thu Jun 12 11:01:33 2025]  kthread+0x39f/0x750
> [Thu Jun 12 11:01:33 2025]  ? __pfx_kthread+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx__raw_spin_lock_irq+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx_kthread+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ret_from_fork+0x25d/0x320
> [Thu Jun 12 11:01:33 2025]  ? __pfx_kthread+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ret_from_fork_asm+0x1a/0x30
> [Thu Jun 12 11:01:33 2025]  </TASK>
> [Thu Jun 12 11:01:33 2025] INFO: task rw_sem_thread2:36526 <writer> blocked on an rw-semaphore likely owned by task rw_sem_thread1:36525 <writer>
> [Thu Jun 12 11:01:33 2025] task:rw_sem_thread1  state:S stack:0     pid:36525 tgid:36525 ppid:2      task_flags:0x208040 flags:0x00004000
> [Thu Jun 12 11:01:33 2025] Call Trace:
> [Thu Jun 12 11:01:33 2025]  <TASK>
> [Thu Jun 12 11:01:33 2025]  __schedule+0x7c7/0x1930
> [Thu Jun 12 11:01:33 2025]  ? __pfx___schedule+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __mod_timer+0x304/0xa80
> [Thu Jun 12 11:01:33 2025]  ? irq_work_queue+0x6a/0xa0
> [Thu Jun 12 11:01:33 2025]  ? __pfx_vprintk_emit+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  schedule+0x6a/0x180
> [Thu Jun 12 11:01:33 2025]  schedule_timeout+0xfb/0x230
> [Thu Jun 12 11:01:33 2025]  ? __pfx_schedule_timeout+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx_process_timeout+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? down_write+0xc4/0x140
> [Thu Jun 12 11:01:33 2025]  msleep_interruptible+0xbe/0x150
> [Thu Jun 12 11:01:33 2025]  ? __pfx_thread1_func+0x10/0x10 [rw_sem_test_2]
> [Thu Jun 12 11:01:33 2025]  thread1_func+0x37/0x60 [rw_sem_test_2]
> [Thu Jun 12 11:01:33 2025]  kthread+0x39f/0x750
> [Thu Jun 12 11:01:33 2025]  ? __pfx_kthread+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx__raw_spin_lock_irq+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ? __pfx_kthread+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ret_from_fork+0x25d/0x320
> [Thu Jun 12 11:01:33 2025]  ? __pfx_kthread+0x10/0x10
> [Thu Jun 12 11:01:33 2025]  ret_from_fork_asm+0x1a/0x30
> [Thu Jun 12 11:01:33 2025]  </TASK>
> 
> [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/
> 
> Suggested-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
> Signed-off-by: Lance Yang <lance.yang@...ux.dev>

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>

BTW, can you also add a patch to extend the test module
(samples/hung_task/hung_task_tests.c) to handle the rwsem
blocking case ? (reader and writer side)

Thank you,


> ---
>  include/linux/hung_task.h | 18 +++++++++---------
>  kernel/hung_task.c        | 29 +++++++++++++++++++++++++----
>  kernel/locking/rwsem.c    | 17 ++++++++++++++++-
>  3 files changed, 50 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
> index 1bc2b3244613..34e615c76ca5 100644
> --- a/include/linux/hung_task.h
> +++ b/include/linux/hung_task.h
> @@ -21,17 +21,17 @@
>   * type.
>   *
>   * Type encoding:
> - * 00 - Blocked on mutex        (BLOCKER_TYPE_MUTEX)
> - * 01 - Blocked on semaphore    (BLOCKER_TYPE_SEM)
> - * 10 - Blocked on rt-mutex     (BLOCKER_TYPE_RTMUTEX)
> - * 11 - Blocked on rw-semaphore (BLOCKER_TYPE_RWSEM)
> + * 00 - Blocked on mutex			(BLOCKER_TYPE_MUTEX)
> + * 01 - Blocked on semaphore			(BLOCKER_TYPE_SEM)
> + * 10 - Blocked on rw-semaphore as READER	(BLOCKER_TYPE_RWSEM_READER)
> + * 11 - Blocked on rw-semaphore as WRITER	(BLOCKER_TYPE_RWSEM_WRITER)
>   */
> -#define BLOCKER_TYPE_MUTEX      0x00UL
> -#define BLOCKER_TYPE_SEM        0x01UL
> -#define BLOCKER_TYPE_RTMUTEX    0x02UL
> -#define BLOCKER_TYPE_RWSEM      0x03UL
> +#define BLOCKER_TYPE_MUTEX		0x00UL
> +#define BLOCKER_TYPE_SEM		0x01UL
> +#define BLOCKER_TYPE_RWSEM_READER	0x02UL
> +#define BLOCKER_TYPE_RWSEM_WRITER	0x03UL
>  
> -#define BLOCKER_TYPE_MASK       0x03UL
> +#define BLOCKER_TYPE_MASK		0x03UL
>  
>  #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
>  static inline void hung_task_set_blocker(void *lock, unsigned long type)
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index d2432df2b905..8708a1205f82 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -23,6 +23,7 @@
>  #include <linux/sched/debug.h>
>  #include <linux/sched/sysctl.h>
>  #include <linux/hung_task.h>
> +#include <linux/rwsem.h>
>  
>  #include <trace/events/sched.h>
>  
> @@ -100,6 +101,7 @@ static void debug_show_blocker(struct task_struct *task)
>  {
>  	struct task_struct *g, *t;
>  	unsigned long owner, blocker, blocker_type;
> +	const char *rwsem_blocked_by, *rwsem_blocked_as;
>  
>  	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
>  
> @@ -111,12 +113,20 @@ static void debug_show_blocker(struct task_struct *task)
>  
>  	switch (blocker_type) {
>  	case BLOCKER_TYPE_MUTEX:
> -		owner = mutex_get_owner(
> -			(struct mutex *)hung_task_blocker_to_lock(blocker));
> +		owner = mutex_get_owner(hung_task_blocker_to_lock(blocker));
>  		break;
>  	case BLOCKER_TYPE_SEM:
> -		owner = sem_last_holder(
> -			(struct semaphore *)hung_task_blocker_to_lock(blocker));
> +		owner = sem_last_holder(hung_task_blocker_to_lock(blocker));
> +		break;
> +	case BLOCKER_TYPE_RWSEM_READER:
> +	case BLOCKER_TYPE_RWSEM_WRITER:
> +		owner = (unsigned long)rwsem_owner(
> +					hung_task_blocker_to_lock(blocker));
> +		rwsem_blocked_as = (blocker_type == BLOCKER_TYPE_RWSEM_READER) ?
> +					"reader" : "writer";
> +		rwsem_blocked_by = is_rwsem_reader_owned(
> +					hung_task_blocker_to_lock(blocker)) ?
> +					"reader" : "writer";
>  		break;
>  	default:
>  		WARN_ON_ONCE(1);
> @@ -134,6 +144,11 @@ static void debug_show_blocker(struct task_struct *task)
>  			pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n",
>  			       task->comm, task->pid);
>  			break;
> +		case BLOCKER_TYPE_RWSEM_READER:
> +		case BLOCKER_TYPE_RWSEM_WRITER:
> +			pr_err("INFO: task %s:%d is blocked on an rw-semaphore, but the owner is not found.\n",
> +			       task->comm, task->pid);
> +			break;
>  		}
>  		return;
>  	}
> @@ -152,6 +167,12 @@ static void debug_show_blocker(struct task_struct *task)
>  			pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n",
>  			       task->comm, task->pid, t->comm, t->pid);
>  			break;
> +		case BLOCKER_TYPE_RWSEM_READER:
> +		case BLOCKER_TYPE_RWSEM_WRITER:
> +			pr_err("INFO: task %s:%d <%s> blocked on an rw-semaphore likely owned by task %s:%d <%s>\n",
> +			       task->comm, task->pid, rwsem_blocked_as, t->comm,
> +			       t->pid, rwsem_blocked_by);
> +			break;
>  		}
>  		sched_show_task(t);
>  		return;
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index a310eb9896de..92c6332da401 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -27,6 +27,7 @@
>  #include <linux/export.h>
>  #include <linux/rwsem.h>
>  #include <linux/atomic.h>
> +#include <linux/hung_task.h>
>  #include <trace/events/lock.h>
>  
>  #ifndef CONFIG_PREEMPT_RT
> @@ -1065,10 +1066,13 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>  		wake_up_q(&wake_q);
>  
>  	trace_contention_begin(sem, LCB_F_READ);
> +	set_current_state(state);
> +
> +	if (state == TASK_UNINTERRUPTIBLE)
> +		hung_task_set_blocker(sem, BLOCKER_TYPE_RWSEM_READER);
>  
>  	/* wait to be given the lock */
>  	for (;;) {
> -		set_current_state(state);
>  		if (!smp_load_acquire(&waiter.task)) {
>  			/* Matches rwsem_mark_wake()'s smp_store_release(). */
>  			break;
> @@ -1083,8 +1087,12 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>  		}
>  		schedule_preempt_disabled();
>  		lockevent_inc(rwsem_sleep_reader);
> +		set_current_state(state);
>  	}
>  
> +	if (state == TASK_UNINTERRUPTIBLE)
> +		hung_task_clear_blocker();
> +
>  	__set_current_state(TASK_RUNNING);
>  	lockevent_inc(rwsem_rlock);
>  	trace_contention_end(sem, 0);
> @@ -1146,6 +1154,9 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>  	set_current_state(state);
>  	trace_contention_begin(sem, LCB_F_WRITE);
>  
> +	if (state == TASK_UNINTERRUPTIBLE)
> +		hung_task_set_blocker(sem, BLOCKER_TYPE_RWSEM_WRITER);
> +
>  	for (;;) {
>  		if (rwsem_try_write_lock(sem, &waiter)) {
>  			/* rwsem_try_write_lock() implies ACQUIRE on success */
> @@ -1179,6 +1190,10 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>  trylock_again:
>  		raw_spin_lock_irq(&sem->wait_lock);
>  	}
> +
> +	if (state == TASK_UNINTERRUPTIBLE)
> +		hung_task_clear_blocker();
> +
>  	__set_current_state(TASK_RUNNING);
>  	raw_spin_unlock_irq(&sem->wait_lock);
>  	lockevent_inc(rwsem_wlock);
> -- 
> 2.49.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ