Message-ID: <CAK1f24m-Ci3BvgfVYGW2tFSUEkPwiO9=0M_w4kpOt1qViiDXdg@mail.gmail.com>
Date: Wed, 19 Feb 2025 23:02:01 +0800
From: Lance Yang <ioworker0@...il.com>
To: mhiramat@...nel.org
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>, 
	Joel Granados <joel.granados@...nel.org>, Anna Schumaker <anna.schumaker@...cle.com>, 
	Kent Overstreet <kent.overstreet@...ux.dev>, Yongliang Gao <leonylgao@...cent.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Tomasz Figa <tfiga@...omium.org>, 
	Sergey Senozhatsky <senozhatsky@...omium.org>, linux-kernel@...r.kernel.org, 
	Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [PATCH 0/2] hung_task: Dump the blocking task stacktrace

On Wed, Feb 19, 2025 at 9:33 PM Lance Yang <ioworker0@...il.com> wrote:
>
> CC linux-mm
>
> On Wed, Feb 19, 2025 at 9:00 PM Masami Hiramatsu (Google)
> <mhiramat@...nel.org> wrote:
> >
> > Hi,
> >
> > The hung_task detector is very useful for detecting lockups.
> > However, since it only dumps the blocked (uninterruptible sleep)
> > processes, its output is not enough to identify the root cause of
> > a lockup.
> >
> > For example, if a process holds a mutex and sleeps on an event in
> > interruptible state for a long time, the other processes will wait
> > on the mutex in uninterruptible state. In this case, the waiter
> > processes are dumped, but the blocker process is not shown
> > because it is sleeping in interruptible state.

Cool! I just ran into something similar today, but with rwsem. In that
case, the blocked process was locked up, and we could not identify
the root cause either ;(

Thanks,
Lance

> >
> > This series adds a feature to dump the blocker task that holds the
> > mutex when a hung task is detected. For example:
> >
> >  INFO: task cat:113 blocked for more than 122 seconds.
> >        Not tainted 6.14.0-rc3-00002-g6afe972e1b9b #152
> >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >  task:cat             state:D stack:13432 pid:113   tgid:113   ppid:103    task_flags:0x400100 flags:0x00000002
> >  Call Trace:
> >   <TASK>
> >   __schedule+0x731/0x960
> >   ? schedule_preempt_disabled+0x54/0xa0
> >   schedule+0xb7/0x140
> >   ? __mutex_lock+0x51d/0xa50
> >   ? __mutex_lock+0x51d/0xa50
> >   schedule_preempt_disabled+0x54/0xa0
> >   __mutex_lock+0x51d/0xa50
> >   ? current_time+0x3a/0x120
> >   read_dummy+0x23/0x70
> >   full_proxy_read+0x6a/0xc0
> >   vfs_read+0xc2/0x340
> >   ? __pfx_direct_file_splice_eof+0x10/0x10
> >   ? do_sendfile+0x1bd/0x2e0
> >   ksys_read+0x76/0xe0
> >   do_syscall_64+0xe3/0x1c0
> >   ? exc_page_fault+0xa9/0x1d0
> >   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >  RIP: 0033:0x4840cd
> >  RSP: 002b:00007ffe632b76c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> >  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
> >  RDX: 0000000000001000 RSI: 00007ffe632b7710 RDI: 0000000000000003
> >  RBP: 00007ffe632b7710 R08: 0000000000000000 R09: 0000000000000000
> >  R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
> >  R13: 000000003a8b63a0 R14: 0000000000000001 R15: ffffffffffffffff
> >   </TASK>
> >  INFO: task cat:113 is blocked on a mutex owned by task cat:112.
> >  task:cat             state:S stack:13432 pid:112   tgid:112   ppid:103    task_flags:0x400100 flags:0x00000002
> >  Call Trace:
> >   <TASK>
> >   __schedule+0x731/0x960
> >   ? schedule_timeout+0xa8/0x120
> >   schedule+0xb7/0x140
> >   schedule_timeout+0xa8/0x120
> >   ? __pfx_process_timeout+0x10/0x10
> >   msleep_interruptible+0x3e/0x60
> >   read_dummy+0x2d/0x70
> >   full_proxy_read+0x6a/0xc0
> >   vfs_read+0xc2/0x340
> >   ? __pfx_direct_file_splice_eof+0x10/0x10
> >   ? do_sendfile+0x1bd/0x2e0
> >   ksys_read+0x76/0xe0
> >   do_syscall_64+0xe3/0x1c0
> >   ? exc_page_fault+0xa9/0x1d0
> >   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >  RIP: 0033:0x4840cd
> >  RSP: 002b:00007ffd69513748 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> >  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004840cd
> >  RDX: 0000000000001000 RSI: 00007ffd69513790 RDI: 0000000000000003
> >  RBP: 00007ffd69513790 R08: 0000000000000000 R09: 0000000000000000
> >  R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000001000
> >  R13: 0000000029d8d3a0 R14: 0000000000000001 R15: ffffffffffffffff
> >   </TASK>
> >
> > Thank you,
> >
> > ---
> >
> > Masami Hiramatsu (Google) (2):
> >       hung_task: Show the blocker task if the task is hung on mutex
> >       samples: Add hung_task detector mutex blocking sample
> >
> >
> >  kernel/hung_task.c                  |   38 ++++++++++++++++++++
> >  kernel/locking/mutex-debug.c        |    1 +
> >  kernel/locking/mutex.c              |    9 +++++
> >  kernel/locking/mutex.h              |    6 +++
> >  samples/Kconfig                     |    9 +++++
> >  samples/Makefile                    |    1 +
> >  samples/hung_task/Makefile          |    2 +
> >  samples/hung_task/hung_task_mutex.c |   66 +++++++++++++++++++++++++++++++++++
> >  8 files changed, 132 insertions(+)
> >  create mode 100644 samples/hung_task/Makefile
> >  create mode 100644 samples/hung_task/hung_task_mutex.c
> >
> > --
> > Masami Hiramatsu (Google) <mhiramat@...nel.org>
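
The scenario in the cover letter (one task holds a mutex and sleeps, another
task blocks on that mutex, and the report should name the holder) can be
reproduced in userspace. The sketch below is only an analogy and is not the
code from this series: struct tracked_mutex, owner_id, holder() and
run_demo() are hypothetical names, and a pthread_mutex_timedlock() timeout
stands in for the hung_task detector. It assumes a POSIX system with
unnamed semaphores and C11 atomics.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical userspace stand-in for a mutex that remembers its owner. */
struct tracked_mutex {
	pthread_mutex_t lock;
	atomic_int owner_id;		/* 0 means "no owner" */
};

static struct tracked_mutex tm = { PTHREAD_MUTEX_INITIALIZER, 0 };
static sem_t held, release;

/* Blocker: takes the mutex, records itself as owner, then sleeps. */
static void *holder(void *arg)
{
	int id = *(int *)arg;

	pthread_mutex_lock(&tm.lock);
	atomic_store(&tm.owner_id, id);
	sem_post(&held);		/* tell the waiter the lock is taken */

	/* Stands in for the long interruptible sleep in the report above. */
	while (sem_wait(&release) == -1 && errno == EINTR)
		;

	atomic_store(&tm.owner_id, 0);
	pthread_mutex_unlock(&tm.lock);
	return NULL;
}

/* Waiter: blocks on the mutex and, on timeout, reports who owns it. */
int run_demo(int holder_id)
{
	struct timespec deadline;
	pthread_t t;
	int blocker = 0;
	int rc;

	sem_init(&held, 0, 0);
	sem_init(&release, 0, 0);
	pthread_create(&t, NULL, holder, &holder_id);
	while (sem_wait(&held) == -1 && errno == EINTR)
		;

	/* pthread_mutex_timedlock() takes an absolute CLOCK_REALTIME time. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 100 * 1000 * 1000;	/* 100 ms "hung" timeout */
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	rc = pthread_mutex_timedlock(&tm.lock, &deadline);
	if (rc == ETIMEDOUT) {
		blocker = atomic_load(&tm.owner_id);
		printf("waiter is blocked on a mutex owned by task %d\n",
		       blocker);
	} else if (rc == 0) {
		pthread_mutex_unlock(&tm.lock);	/* not expected in this demo */
	}

	sem_post(&release);		/* let the holder finish */
	pthread_join(t, NULL);
	sem_destroy(&held);
	sem_destroy(&release);
	return blocker;
}
```

Calling run_demo(112) from a main() returns the id that the waiter found
recorded in the mutex. The point of the sketch is only that the owner has to
be recorded when the lock is taken, so that the waiter side can look it up
later without any cooperation from the sleeping holder.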
