linux-kernel - Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel

Message-ID: <aAuq-3yjYM97rvj1@pathway.suse.cz>
Date: Fri, 25 Apr 2025 17:32:11 +0200
From: Petr Mladek <pmladek@...e.com>
To: Tomasz Figa <tfiga@...omium.org>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	John Ogness <john.ogness@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel

On Fri 2025-04-25 15:58:46, Tomasz Figa wrote:
> Hi Petr,
> 
> On Thu, Apr 24, 2025 at 7:59 PM Petr Mladek <pmladek@...e.com> wrote:
> >
> > On Thu 2025-04-24 16:02:43, Sergey Senozhatsky wrote:
> > > Currently, hung-task watchdog uses two different loglevels
> > > to report hung-tasks: a) KERN_INFO for all the important task
> > > information (e.g. sched_show_task()) and b)  KERN_ERR for the
> > > rest.
> >
> > IMHO, the two different loglevels make sense. The KERN_ERR
> > message reports that a task has been blocked for too long,
> > and KERN_INFO is used for the extra debug information.
> >
> 
> I agree that two different levels make sense, but I think that
> KERN_INFO is not necessarily the best one to use, because we have
> quite a lot of usual/expected things logged with that level, but this
> clearly is not a usual/expected event that we're logging.
> 
> My preference would be on KERN_NOTICE.

Sigh, this is the problem with loglevels. Different people have
different feelings about them.

A solution would be to add an extra log level. But the full 0-7
(3 bit) range is already taken.
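
To illustrate the 3-bit limit, here is a minimal Python sketch. It
assumes the syslog-style priority that prefixes each /dev/kmsg record,
where the low 3 bits carry the level and the upper bits the facility.
The sample records and the helper names (split_prio, keep) are made up
for this example. It also shows what a level-based filter does to a
hung-task report.

```python
# Kernel log levels from include/linux/kern_levels.h (0 = most severe).
LEVELS = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG"]

def split_prio(prio):
    """Split a syslog priority into (facility, level); the level is the low 3 bits."""
    return prio >> 3, prio & 7

def keep(record, max_level):
    """Return True if a /dev/kmsg-style record is at least as severe as max_level."""
    prio = int(record.split(",", 1)[0])
    _, level = split_prio(prio)
    return level <= max_level

# Made-up sample records in the "prio,seq,timestamp_us,flags;message" layout.
sample = [
    "3,100,5000000,-;INFO: task kworker/0:1:42 blocked for more than 120 seconds.",
    "6,101,5000010,-;task:kworker/0:1 state:D stack:0 pid:42",
    "6,102,5000020,-;Call Trace:",
]

# Filtering at KERN_NOTICE keeps the KERN_ERR headline but drops the task dump.
kept = [r for r in sample if keep(r, LEVELS.index("NOTICE"))]
```

All eight values of the 3-bit field already have a name in LEVELS, so
a ninth level would need a wider field.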

> > > This makes it a little inconvenient, especially for
> > > automated kernel logs parsing.
> >
> > Anyway, what is the exact problem, please?
> > Are the KERN_INFO messages filtered because of console_loglevel?
> > Or is it a problem to match all the related lines?
> 
> The problem is that when we're looking at the hundreds of reports with
> various problems from the production fleet, we want to be able to
> filter out some of the usual/expected logs. The easiest way to do it
> is by using the message log level. However, if we set the filters to
> anything more severe than KERN_INFO, we lose the task dumps and we
> need to go and fetch the entire unfiltered log, which is tedious.

Good to know.

This might be an argument for using the same log level for the entire
report. But it might create new problems. It would be more complicated
to filter out known problems. I mean that a single known
warning/error/emergency message can be filtered easily. But
creating a filter for an entire to-be-ignored backtrace is more
complicated.


> (FWIW, we're also developing an automated analysis tool and it would
> make the implementation much easier if we could simply use the log
> level to filter out expected vs unexpected events from the logs - and
> most of the time that already works, the case Sergey's patch is
> addressing is just one of the small number of exceptions.)

It might be interesting to see the list of exceptions. Maybe, we
could find some common pattern...

It would be nice to handle all the reports of critical situations
in a similar way. It would help everyone. This is why I am not happy
with a hung-task-detector-specific setting.

> > If the problem is matching all related lines, then a solution
> > would be printing some helper lines around the report, similar
> > to
> >
> >     ------------[ cut here ]------------
> >
> > in include/asm-generic/bug.h
> >
> > Plus, messages from other CPUs would need to be filtered out.
> > CONFIG_PRINTK_CALLER should help with this.
> 
> I'm not really in love with that idea - it would make things so much
> more complicated, despite already having the right tool to
> differentiate between the importance of various logs - after all the
> log level is exactly that.

Honestly, the more I think about it, the more I like the prefix/postfix
lines + the caller_id. I am afraid that manipulating log levels is a
losing battle because different people might have different opinions
about how important various messages are.
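
A rough Python sketch of how a log parser could use such lines,
assuming console output with CONFIG_PRINTK_CALLER=y, where each line
carries a "[T<task id>]" or "[C<cpu id>]" field after the timestamp.
The BEGIN/END marker strings are hypothetical (the hung-task report
prints no such lines today), and the sample log and the task ids in it
are made up.

```python
import re

# Hypothetical markers: the hung-task report prints no such lines today.
BEGIN = "------------[ cut here ]------------"
END = "------------[ end of report ]------------"

# Matches "[timestamp][caller] message" as printed with CONFIG_PRINTK_CALLER=y,
# where the caller is T<task id> or C<cpu id>.
LINE = re.compile(r"^\[\s*[\d.]+\]\[\s*([TC]\d+)\]\s?(.*)$")

def extract_reports(lines):
    """Collect the lines between BEGIN and END per caller id, skipping other callers."""
    open_reports = {}  # caller id -> lines collected so far
    done = []
    for raw in lines:
        m = LINE.match(raw)
        if not m:
            continue
        caller, msg = m.groups()
        if msg == BEGIN:
            open_reports[caller] = []
        elif caller in open_reports:
            if msg == END:
                done.append((caller, open_reports.pop(caller)))
            else:
                open_reports[caller].append(msg)
    return done

# Made-up log: a report from T35 interleaved with an unrelated line from T977.
log = [
    "[  240.000001][   T35] " + BEGIN,
    "[  240.000002][   T35] INFO: task kworker/0:1:42 blocked for more than 120 seconds.",
    "[  240.000003][  T977] usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    "[  240.000004][   T35] Call Trace:",
    "[  240.000005][   T35] " + END,
]
reports = extract_reports(log)
```

The report is matched as a whole regardless of the log level of its
individual lines, and the interleaved line from the other caller is
left out.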

Best Regards,
Petr