Message-ID: <CAK1f24=EH4d2Zboke82YY3DjxfDwJBtoZDPEfDvD6vYiGROMKg@mail.gmail.com>
Date: Mon, 3 Mar 2025 18:52:48 +0800
From: Lance Yang <ioworker0@...il.com>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: akpm@...ux-foundation.org, will@...nel.org, peterz@...radead.org, 
	mingo@...hat.com, longman@...hat.com, anna.schumaker@...cle.com, 
	boqun.feng@...il.com, joel.granados@...nel.org, kent.overstreet@...ux.dev, 
	leonylgao@...cent.com, linux-kernel@...r.kernel.org, rostedt@...dmis.org, 
	senozhatsky@...omium.org, tfiga@...omium.org, 
	Mingzhe Yang <mingzhe.yang@...com>
Subject: Re: [PATCH 1/1] hung_task: show the blocker task if the task is hung
 on semaphore

Hi Masami,

Thanks a lot for taking the time to review!

On Mon, Mar 3, 2025 at 2:29 PM Masami Hiramatsu <mhiramat@...nel.org> wrote:
>
> On Sat,  1 Mar 2025 13:51:02 +0800
> Lance Yang <ioworker0@...il.com> wrote:
>
> > Inspired by mutex blocker tracking[1], this patch makes a trade-off to
> > balance the overhead and utility of the hung task detector.
>
> Thanks for adding a new one!
>
> >
> > Unlike mutexes, semaphores lack explicit ownership tracking, making it
> > challenging to identify the root cause of hangs. To address this, we
> > introduce a last_holder field to the semaphore structure, which is
> > updated when a task successfully calls down() and cleared during up().
> >
> > The assumption is that if a task is blocked on a semaphore, the holders
> > must not have released it. While this does not guarantee that the last
> > holder is one of the current blockers, it likely provides a practical hint
> > for diagnosing semaphore-related stalls.
>
> Yeah, if we had something like an owner_list, we could find the
> longest-blocking holder (owner), but that would add more memory and
> performance overhead.

You're right. Let's keep it simple and light for now, without adding more
memory or performance overhead ;p
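
(Just to spell out the alternative we're avoiding: a full owner_list would
mean a per-holder entry linked under sem->lock on every down() and unlinked
on every up(). Purely illustrative, NOT part of this patch:

	struct semaphore_holder {
		struct list_head	list;
		struct task_struct	*task;
		unsigned long		since;	/* jiffies at acquire */
	};

which is exactly the extra per-acquire memory and bookkeeping we'd rather
not pay just for a debugging hint.)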

>
> >
> > With this change, the hung task detector can now show blocker task's info
> > like below:
> >
> > [Sat Mar  1 02:39:52 2025] INFO: task cat:1437 blocked for more than 122 seconds.
> > [Sat Mar  1 02:39:52 2025]       Tainted: G           OE      6.14.0-rc3+ #9
> > [Sat Mar  1 02:39:52 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [Sat Mar  1 02:39:52 2025] task:cat             state:D stack:0     pid:1437  tgid:1437  ppid:1007   task_flags:0x400000 flags:0x00000004
> > [Sat Mar  1 02:39:52 2025] Call trace:
> > [Sat Mar  1 02:39:52 2025]  __switch_to+0x1ec/0x380 (T)
> > [Sat Mar  1 02:39:52 2025]  __schedule+0xc30/0x44f8
> > [Sat Mar  1 02:39:52 2025]  schedule+0xb8/0x3b0
> > [Sat Mar  1 02:39:52 2025]  schedule_timeout+0x1d0/0x208
> > [Sat Mar  1 02:39:52 2025]  __down_common+0x27c/0x600
> > [Sat Mar  1 02:39:52 2025]  __down+0x24/0x50
> > [Sat Mar  1 02:39:52 2025]  down+0xe0/0x140
> > [Sat Mar  1 02:39:52 2025]  read_dummy+0x3c/0xa0 [hung_task_sem]
> > [Sat Mar  1 02:39:52 2025]  full_proxy_read+0xfc/0x1d0
> > [Sat Mar  1 02:39:52 2025]  vfs_read+0x1a0/0x858
> > [Sat Mar  1 02:39:52 2025]  ksys_read+0x100/0x220
> > [Sat Mar  1 02:39:52 2025]  __arm64_sys_read+0x78/0xc8
> > [Sat Mar  1 02:39:52 2025]  invoke_syscall+0xd8/0x278
> > [Sat Mar  1 02:39:52 2025]  el0_svc_common.constprop.0+0xb8/0x298
> > [Sat Mar  1 02:39:52 2025]  do_el0_svc+0x4c/0x88
> > [Sat Mar  1 02:39:52 2025]  el0_svc+0x44/0x108
> > [Sat Mar  1 02:39:52 2025]  el0t_64_sync_handler+0x134/0x160
> > [Sat Mar  1 02:39:52 2025]  el0t_64_sync+0x1b8/0x1c0
> > [Sat Mar  1 02:39:52 2025] INFO: task cat:1437 blocked on a semaphore likely last held by task cat:1436
> > [Sat Mar  1 02:39:52 2025] task:cat             state:S stack:0     pid:1436  tgid:1436  ppid:1007   task_flags:0x400000 flags:0x00000004
> > [Sat Mar  1 02:39:52 2025] Call trace:
> > [Sat Mar  1 02:39:52 2025]  __switch_to+0x1ec/0x380 (T)
> > [Sat Mar  1 02:39:52 2025]  __schedule+0xc30/0x44f8
> > [Sat Mar  1 02:39:52 2025]  schedule+0xb8/0x3b0
> > [Sat Mar  1 02:39:52 2025]  schedule_timeout+0xf4/0x208
> > [Sat Mar  1 02:39:52 2025]  msleep_interruptible+0x70/0x130
> > [Sat Mar  1 02:39:52 2025]  read_dummy+0x48/0xa0 [hung_task_sem]
> > [Sat Mar  1 02:39:52 2025]  full_proxy_read+0xfc/0x1d0
> > [Sat Mar  1 02:39:52 2025]  vfs_read+0x1a0/0x858
> > [Sat Mar  1 02:39:52 2025]  ksys_read+0x100/0x220
> > [Sat Mar  1 02:39:52 2025]  __arm64_sys_read+0x78/0xc8
> > [Sat Mar  1 02:39:52 2025]  invoke_syscall+0xd8/0x278
> > [Sat Mar  1 02:39:52 2025]  el0_svc_common.constprop.0+0xb8/0x298
> > [Sat Mar  1 02:39:52 2025]  do_el0_svc+0x4c/0x88
> > [Sat Mar  1 02:39:52 2025]  el0_svc+0x44/0x108
> > [Sat Mar  1 02:39:52 2025]  el0t_64_sync_handler+0x134/0x160
> > [Sat Mar  1 02:39:52 2025]  el0t_64_sync+0x1b8/0x1c0
> >
> > [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@...com>
> > Signed-off-by: Lance Yang <ioworker0@...il.com>
> > ---
> >  include/linux/sched.h      |  1 +
> >  include/linux/semaphore.h  | 15 ++++++++++-
> >  kernel/hung_task.c         | 52 ++++++++++++++++++++++++++-----------
> >  kernel/locking/semaphore.c | 53 ++++++++++++++++++++++++++++++++++----
> >  4 files changed, 100 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 0cebdd736d44..5dfdca879ac4 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1219,6 +1219,7 @@ struct task_struct {
> >
> >  #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> >       struct mutex                    *blocker_mutex;
> > +     struct semaphore                *blocker_sem;
>
> Can we make this a union because only one blocker is active at a time?
> Waiman has suggested updating it this way;
>
> https://lore.kernel.org/all/9f9150b4-1cf5-4380-b431-419f70775a7d@redhat.com/
>
> If we can use the MSB or LSB of those pointers, that would give the smallest
> memory footprint.

Nice! Thanks for sharing the link and the idea. I will look into it and see
if we can make it work ;)
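
Just to sketch what I have in mind (the field and helper names below are
mine, purely illustrative, not the final patch): since both lock pointers
are at least word-aligned, a single unsigned long in task_struct with the
LSB as a type tag should cover both cases:

	/* Hypothetical sketch, assuming a single 'blocker' field: */
	#define BLOCKER_TYPE_MUTEX	0UL
	#define BLOCKER_TYPE_SEM	1UL
	#define BLOCKER_TYPE_MASK	1UL

	static inline void set_task_blocker(struct task_struct *tsk,
					    void *lock, unsigned long type)
	{
		WRITE_ONCE(tsk->blocker, (unsigned long)lock | type);
	}

	static inline void *task_blocker_lock(unsigned long blocker)
	{
		return (void *)(blocker & ~BLOCKER_TYPE_MASK);
	}

	static inline unsigned long task_blocker_type(unsigned long blocker)
	{
		return blocker & BLOCKER_TYPE_MASK;
	}

I still need to double-check the alignment assumptions and the hung_task
side, of course.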

>
> >  #endif
> >
> >  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
> > index 04655faadc2d..ca8240a5dbfc 100644
> > --- a/include/linux/semaphore.h
> > +++ b/include/linux/semaphore.h
> > @@ -16,13 +16,25 @@ struct semaphore {
> >       raw_spinlock_t          lock;
> >       unsigned int            count;
> >       struct list_head        wait_list;
> > +
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +     atomic_long_t last_holder;
> > +#endif
>
> Do we really need to make this atomic? I think the mutex case needs to use it
> directly as a pointer, but debug_show_blocker() only searches for the task, so
> we can use an unsigned long + WRITE_ONCE()/READ_ONCE().

Good spot! Using 'an unsigned long + WRITE_ONCE()/READ_ONCE()' is
the way to go, so let me give it a try ;)
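
Concretely, I'd expect it to end up as a plain field plus lockless-safe
accesses, roughly like this sketch:

	/* In struct semaphore (sketch): */
	#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
		unsigned long		last_holder;
	#endif

	/* Acquire side, under sem->lock: */
	WRITE_ONCE(sem->last_holder, (unsigned long)current);

	/* Read side, in debug_show_blocker(): */
	unsigned long holder = READ_ONCE(sem->last_holder);

since the writers already run under sem->lock and the reader only uses the
value to search the task list.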

>
> >  };
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER                          \
> > +     , .last_holder = ATOMIC_LONG_INIT(0)
> > +#else
> > +#define __LAST_HOLDER_SEMAPHORE_INITIALIZER
> > +#endif
> > +
> >  #define __SEMAPHORE_INITIALIZER(name, n)                             \
> >  {                                                                    \
> >       .lock           = __RAW_SPIN_LOCK_UNLOCKED((name).lock),        \
> >       .count          = n,                                            \
> > -     .wait_list      = LIST_HEAD_INIT((name).wait_list),             \
> > +     .wait_list      = LIST_HEAD_INIT((name).wait_list)              \
> > +     __LAST_HOLDER_SEMAPHORE_INITIALIZER                             \
> >  }
> >
> >  /*
> > @@ -47,5 +59,6 @@ extern int __must_check down_killable(struct semaphore *sem);
> >  extern int __must_check down_trylock(struct semaphore *sem);
> >  extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
> >  extern void up(struct semaphore *sem);
> > +extern unsigned long sem_last_holder(struct semaphore *sem);
> >
> >  #endif /* __LINUX_SEMAPHORE_H */
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index ccd7217fcec1..332f555a97a9 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -98,30 +98,52 @@ static struct notifier_block panic_block = {
> >  static void debug_show_blocker(struct task_struct *task)
> >  {
> >       struct task_struct *g, *t;
> > -     unsigned long owner;
> > -     struct mutex *lock;
> > +     unsigned long owner, holder;
> > +     struct semaphore *sem_lock;
> > +     struct mutex *mutex_lock;
> >
> >       RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");
> >
> > -     lock = READ_ONCE(task->blocker_mutex);
> > -     if (!lock)
> > -             return;
> > +     mutex_lock = READ_ONCE(task->blocker_mutex);
> > +     if (mutex_lock) {
> > +             owner = mutex_get_owner(mutex_lock);
> > +             if (unlikely(!owner)) {
> > +                     pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> > +                            task->comm, task->pid);
> > +                     goto blocker_sem;
> > +             }
> >
> > -     owner = mutex_get_owner(lock);
> > -     if (unlikely(!owner)) {
> > -             pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> > -                     task->comm, task->pid);
> > +             /* Ensure the owner information is correct. */
> > +             for_each_process_thread(g, t) {
> > +                     if ((unsigned long)t == owner) {
> > +                             pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
> > +                                    task->comm, task->pid, t->comm, t->pid);
> > +                             sched_show_task(t);
> > +                             return;
> > +                     }
> > +             }
> >               return;
> >       }
> >
> > -     /* Ensure the owner information is correct. */
> > -     for_each_process_thread(g, t) {
> > -             if ((unsigned long)t == owner) {
> > -                     pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
> > -                             task->comm, task->pid, t->comm, t->pid);
> > -                     sched_show_task(t);
> > +blocker_sem:
> > +     sem_lock = READ_ONCE(task->blocker_sem);
> > +     if (sem_lock) {
> > +             holder = sem_last_holder(sem_lock);
> > +             if (unlikely(!holder)) {
> > +                     pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n",
> > +                            task->comm, task->pid);
> >                       return;
> >               }
> > +
> > +             for_each_process_thread(g, t) {
> > +                     if ((unsigned long)t == holder) {
> > +                             pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n",
> > +                                    task->comm, task->pid, t->comm, t->pid);
> > +                             sched_show_task(t);
> > +                             return;
> > +                     }
> > +             }
> > +             return;
> >       }
> >  }
> >  #else
> > diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> > index 34bfae72f295..5a684c0a3087 100644
> > --- a/kernel/locking/semaphore.c
> > +++ b/kernel/locking/semaphore.c
> > @@ -39,6 +39,7 @@ static noinline int __down_interruptible(struct semaphore *sem);
> >  static noinline int __down_killable(struct semaphore *sem);
> >  static noinline int __down_timeout(struct semaphore *sem, long timeout);
> >  static noinline void __up(struct semaphore *sem);
> > +static inline void __sem_acquire(struct semaphore *sem);
> >
> >  /**
> >   * down - acquire the semaphore
> > @@ -58,7 +59,7 @@ void __sched down(struct semaphore *sem)
> >       might_sleep();
> >       raw_spin_lock_irqsave(&sem->lock, flags);
> >       if (likely(sem->count > 0))
> > -             sem->count--;
> > +             __sem_acquire(sem);
> >       else
> >               __down(sem);
> >       raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -82,7 +83,7 @@ int __sched down_interruptible(struct semaphore *sem)
> >       might_sleep();
> >       raw_spin_lock_irqsave(&sem->lock, flags);
> >       if (likely(sem->count > 0))
> > -             sem->count--;
> > +             __sem_acquire(sem);
> >       else
> >               result = __down_interruptible(sem);
> >       raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -109,7 +110,7 @@ int __sched down_killable(struct semaphore *sem)
> >       might_sleep();
> >       raw_spin_lock_irqsave(&sem->lock, flags);
> >       if (likely(sem->count > 0))
> > -             sem->count--;
> > +             __sem_acquire(sem);
> >       else
> >               result = __down_killable(sem);
> >       raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -139,7 +140,7 @@ int __sched down_trylock(struct semaphore *sem)
> >       raw_spin_lock_irqsave(&sem->lock, flags);
> >       count = sem->count - 1;
> >       if (likely(count >= 0))
> > -             sem->count = count;
> > +             __sem_acquire(sem);
> >       raw_spin_unlock_irqrestore(&sem->lock, flags);
> >
> >       return (count < 0);
> > @@ -164,7 +165,7 @@ int __sched down_timeout(struct semaphore *sem, long timeout)
> >       might_sleep();
> >       raw_spin_lock_irqsave(&sem->lock, flags);
> >       if (likely(sem->count > 0))
> > -             sem->count--;
> > +             __sem_acquire(sem);
> >       else
> >               result = __down_timeout(sem, timeout);
> >       raw_spin_unlock_irqrestore(&sem->lock, flags);
> > @@ -242,10 +243,18 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> >  {
> >       int ret;
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +     WRITE_ONCE(current->blocker_sem, sem);
> > +#endif
> > +
> >       trace_contention_begin(sem, 0);
> >       ret = ___down_common(sem, state, timeout);
> >       trace_contention_end(sem, ret);
> >
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +     WRITE_ONCE(current->blocker_sem, NULL);
> > +#endif
> > +
> >       return ret;
> >  }
> >
> > @@ -274,6 +283,40 @@ static noinline void __sched __up(struct semaphore *sem)
> >       struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
> >                                               struct semaphore_waiter, list);
> >       list_del(&waiter->list);
> > +
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +     atomic_long_cmpxchg_release(&sem->last_holder, (unsigned long)current,
> > +                                 0UL);
>
> Ah, you don't need the atomic here because __up() is called with sem->lock held.

Indeed.

> Also, if wait_list is empty, up() only does sem->count++, so in that case
> the owner is not cleared.

Good catch! I totally missed that and will fix it in the next version.
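
Roughly, I think clearing the hint belongs in up() itself, under sem->lock,
so the empty-wait_list fast path is covered too. Untested sketch, assuming
last_holder has become a plain unsigned long as discussed above:

	void __sched up(struct semaphore *sem)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&sem->lock, flags);
	#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
		/* Clear the hint even when nobody is waiting. */
		if (READ_ONCE(sem->last_holder) == (unsigned long)current)
			WRITE_ONCE(sem->last_holder, 0UL);
	#endif
		if (likely(list_empty(&sem->wait_list)))
			sem->count++;
		else
			__up(sem);
		raw_spin_unlock_irqrestore(&sem->lock, flags);
	}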

Thanks a lot again for your time!
Lance


>
> Thank you,
>
> > +#endif
> > +
> >       waiter->up = true;
> >       wake_up_process(waiter->task);
> >  }
> > +
> > +static inline struct task_struct *__holder_task(unsigned long holder)
> > +{
> > +     return (struct task_struct *)holder;
> > +}
> > +
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +/* Do not use the return value as a pointer directly. */
> > +unsigned long sem_last_holder(struct semaphore *sem)
> > +{
> > +     unsigned long holder = atomic_long_read(&sem->last_holder);
> > +
> > +     return (unsigned long)__holder_task(holder);
> > +}
> > +#else
> > +unsigned long sem_last_holder(struct semaphore *sem)
> > +{
> > +     return 0;
> > +}
> > +#endif
> > +
> > +static inline void __sem_acquire(struct semaphore *sem)
> > +{
> > +     sem->count--;
> > +#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
> > +     atomic_long_set(&sem->last_holder, (unsigned long)current);
> > +#endif
> > +}
> > --
> > 2.45.2
> >
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@...nel.org>

