Message-ID: <CAK1f24mGk4pCqf37zXaZbqbTOzLVBqRNnGmf4wEUA9MGYFGoig@mail.gmail.com>
Date: Thu, 24 Oct 2024 11:28:01 +0800
From: Lance Yang <ioworker0@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: cunhuang@...cent.com, leonylgao@...cent.com, j.granados@...sung.com, 
	jsiddle@...hat.com, kent.overstreet@...ux.dev, 21cnbao@...il.com, 
	ryan.roberts@....com, david@...hat.com, ziy@...dia.com, 
	libang.li@...group.com, baolin.wang@...ux.alibaba.com, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 0/2] hung_task: add detect count for hung tasks

Hi Andrew,

Thanks a lot for paying attention!

On Thu, Oct 24, 2024 at 10:05 AM Andrew Morton
<akpm@...ux-foundation.org> wrote:
>
> On Tue, 22 Oct 2024 19:47:34 +0800 Lance Yang <ioworker0@...il.com> wrote:
>
> > Hi all,
> >
> > This patchset adds a counter, hung_task_detect_count, to track the number of
> > times hung tasks are detected. This counter provides a straightforward way
> > to monitor hung task events without manually checking dmesg logs.
> >
> > With this counter in place, system issues can be spotted quickly, allowing
> > admins to step in promptly before system load spikes occur, even if the
> > hung_task_warnings value has been decreased to 0 well before.
> >
> > Recently, we encountered a situation where warnings about hung tasks were
> > buried in dmesg logs during load spikes. Introducing this counter could
> > have helped us detect such issues earlier and improve our analysis efficiency.
> >
>
> Isn't the answer to this problem "write a better parser"?  I mean,

Yeah, I certainly agree that having a good parser is important, and I'm
working on that as well ;)

> we're providing userspace with information which is already available.

IMHO, there are two reasons why this counter remains valuable:

1) It lets us detect hung tasks early, before load spikes occur, using
simple and common monitoring tools like Prometheus.

2) It ensures that we remain aware of hung tasks even after the
hung_task_warnings value has dropped to 0 and warnings are no longer printed.
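
To illustrate point 1, here is a minimal sketch of how a monitoring job might
read the counter and emit it in the Prometheus text exposition format. It
assumes the counter is exposed as a sysctl file at
/proc/sys/kernel/hung_task_detect_count; this thread does not state the path,
and the metric name and helper functions are hypothetical.

```python
from pathlib import Path

# Assumption: the counter is readable as a sysctl file at this path.
# The thread does not specify where the counter is exposed.
COUNTER_PATH = Path("/proc/sys/kernel/hung_task_detect_count")


def read_detect_count(path=COUNTER_PATH):
    """Return the counter as an int, or None if it is missing or unparseable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def to_prometheus(value, metric="node_hung_task_detect_count"):
    """Format the value as a counter in the Prometheus text exposition format.

    The metric name is a hypothetical choice for this sketch.
    """
    lines = [
        f"# HELP {metric} Number of times hung tasks were detected.",
        f"# TYPE {metric} counter",
        f"{metric} {value}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    count = read_detect_count()
    if count is not None:
        print(to_prometheus(count), end="")
```

Because the value is a plain monotonically increasing integer, an alert on any
increase would still fire after hung_task_warnings has reached 0, which is the
case described in point 2.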

Thanks again for your time!
Lance

>
