Message-ID: <20250314144300.32542-1-ioworker0@gmail.com>
Date: Fri, 14 Mar 2025 22:42:57 +0800
From: Lance Yang <ioworker0@...il.com>
To: akpm@...ux-foundation.org
Cc: will@...nel.org,
peterz@...radead.org,
mingo@...hat.com,
longman@...hat.com,
mhiramat@...nel.org,
anna.schumaker@...cle.com,
boqun.feng@...il.com,
joel.granados@...nel.org,
kent.overstreet@...ux.dev,
leonylgao@...cent.com,
linux-kernel@...r.kernel.org,
rostedt@...dmis.org,
senozhatsky@...omium.org,
tfiga@...omium.org,
amaindex@...look.com,
Lance Yang <ioworker0@...il.com>
Subject: [PATCH RESEND v2 0/3] hung_task: extend blocking task stacktrace dump to semaphore
Hi all,
Inspired by mutex blocker tracking[1], this patch series extends the
feature to dump the blocker task not only for mutexes but also for
semaphores. Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs. To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().
The assumption is that if a task is blocked on a semaphore, its holders
have not yet released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
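The last_holder bookkeeping described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
code from the patches: struct toy_task, struct toy_sem and all toy_*()
helpers are hypothetical names, locking and wait queues are omitted, and
toy_up() clears the hint unconditionally, following the wording of this
cover letter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct toy_task {
	int pid;
};

/* Toy counting semaphore; only the fields needed for the illustration. */
struct toy_sem {
	unsigned int count;           /* remaining permits */
	struct toy_task *last_holder; /* hint only, not a real owner */
};

/* Non-blocking acquire: returns 1 on success, 0 if the caller would block. */
static int toy_down_trylock(struct toy_sem *sem, struct toy_task *self)
{
	if (sem->count == 0)
		return 0;
	sem->count--;
	sem->last_holder = self; /* record the most recent successful acquirer */
	return 1;
}

/* Release one permit and drop the hint, mirroring "cleared during up()". */
static void toy_up(struct toy_sem *sem)
{
	sem->count++;
	sem->last_holder = NULL;
}

/* What a detector could report for a task stuck on sem; NULL means unknown. */
static const struct toy_task *toy_likely_blocker(const struct toy_sem *sem)
{
	return sem->last_holder;
}
```

With a count of 1, a second toy_down_trylock() fails while the first
acquirer is still recorded, which is the situation the "likely last held
by" message reports. With a larger count, the recorded task is only the
most recent acquirer, hence the hedged wording.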
With this change, the hung task detector can now show the blocker task's
info, as shown below:
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked for more than 122 seconds.
[Thu Mar 13 15:18:38 2025] Tainted: G OE 6.14.0-rc3+ #14
[Thu Mar 13 15:18:38 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 13 15:18:38 2025] task:cat state:D stack:0 pid:1803 tgid:1803 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0x1d0/0x208
[Thu Mar 13 15:18:38 2025] __down_common+0x2d4/0x6f8
[Thu Mar 13 15:18:38 2025] __down+0x24/0x50
[Thu Mar 13 15:18:38 2025] down+0xd0/0x140
[Thu Mar 13 15:18:38 2025] read_dummy+0x3c/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
[Thu Mar 13 15:18:38 2025] INFO: task cat:1803 blocked on a semaphore likely last held by task cat:1802
[Thu Mar 13 15:18:38 2025] task:cat state:S stack:0 pid:1802 tgid:1802 ppid:1057 task_flags:0x400000 flags:0x00000004
[Thu Mar 13 15:18:38 2025] Call trace:
[Thu Mar 13 15:18:38 2025] __switch_to+0x1ec/0x380 (T)
[Thu Mar 13 15:18:38 2025] __schedule+0xc30/0x44f8
[Thu Mar 13 15:18:38 2025] schedule+0xb8/0x3b0
[Thu Mar 13 15:18:38 2025] schedule_timeout+0xf4/0x208
[Thu Mar 13 15:18:38 2025] msleep_interruptible+0x70/0x130
[Thu Mar 13 15:18:38 2025] read_dummy+0x48/0xa0 [hung_task_sem]
[Thu Mar 13 15:18:38 2025] full_proxy_read+0xfc/0x1d0
[Thu Mar 13 15:18:38 2025] vfs_read+0x1a0/0x858
[Thu Mar 13 15:18:38 2025] ksys_read+0x100/0x220
[Thu Mar 13 15:18:38 2025] __arm64_sys_read+0x78/0xc8
[Thu Mar 13 15:18:38 2025] invoke_syscall+0xd8/0x278
[Thu Mar 13 15:18:38 2025] el0_svc_common.constprop.0+0xb8/0x298
[Thu Mar 13 15:18:38 2025] do_el0_svc+0x4c/0x88
[Thu Mar 13 15:18:38 2025] el0_svc+0x44/0x108
[Thu Mar 13 15:18:38 2025] el0t_64_sync_handler+0x134/0x160
[Thu Mar 13 15:18:38 2025] el0t_64_sync+0x1b8/0x1c0
[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
Thanks,
Lance
---
v1 -> v2:
* Use one field to store the blocker as only one is active at a time,
suggested by Masami
* Leverage the LSB of the blocker field to reduce memory footprint,
suggested by Masami
* Add sample code for testing hung_task detection of semaphore blocking
* https://lore.kernel.org/all/20250301055102.88746-1-ioworker0@gmail.com
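As an illustration of the "one field plus LSB" idea in the changelog above,
the following user-space sketch packs a lock pointer and a lock-type tag
into a single word. It is not the encoding from the patches: the TOY_*
names, the one-bit tag width and the helper functions are all assumptions
made for this example. It relies only on lock objects being aligned to at
least two bytes, so the lowest address bit is always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; the names and the one-bit width are assumptions. */
#define TOY_BLOCKER_MUTEX ((uintptr_t)0x0)
#define TOY_BLOCKER_SEM   ((uintptr_t)0x1)
#define TOY_BLOCKER_MASK  ((uintptr_t)0x1)

/* Pack a lock pointer and its type into one word. */
static uintptr_t toy_encode_blocker(const void *lock, uintptr_t type)
{
	uintptr_t addr = (uintptr_t)lock;

	/* The tag only fits if the pointer's low bit is unused. */
	assert((addr & TOY_BLOCKER_MASK) == 0);
	assert(type <= TOY_BLOCKER_MASK);
	return addr | type;
}

/* Recover the lock type from the low bit. */
static uintptr_t toy_blocker_type(uintptr_t blocker)
{
	return blocker & TOY_BLOCKER_MASK;
}

/* Recover the original lock pointer by masking the tag off. */
static void *toy_blocker_ptr(uintptr_t blocker)
{
	return (void *)(blocker & ~TOY_BLOCKER_MASK);
}
```

Since a task blocks on only one lock at a time, one tagged word per task is
enough to say both which lock it waits on and what kind of lock that is.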
Lance Yang (2):
hung_task: replace blocker_mutex with encoded blocker
hung_task: show the blocker task if the task is hung on semaphore
Zi Li (1):
samples: add hung_task detector semaphore blocking sample
include/linux/hung_task.h | 94 +++++++++++++++++++++++++
include/linux/sched.h | 2 +-
include/linux/semaphore.h | 15 +++-
kernel/hung_task.c | 52 +++++++++++---
kernel/locking/mutex.c | 8 ++-
kernel/locking/semaphore.c | 55 +++++++++++++--
samples/Kconfig | 11 +--
samples/hung_task/Makefile | 3 +-
samples/hung_task/hung_task_mutex.c | 20 ++++--
samples/hung_task/hung_task_semaphore.c | 74 +++++++++++++++++++
10 files changed, 301 insertions(+), 33 deletions(-)
create mode 100644 include/linux/hung_task.h
create mode 100644 samples/hung_task/hung_task_semaphore.c
--
2.45.2