linux-kernel - Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel


Message-ID: <CAAFQd5A6J-UCy46bp1MYP0imJf3oUL29mxFVLZZZ4JmP2YTvhQ@mail.gmail.com>
Date: Fri, 25 Apr 2025 15:58:46 +0900
From: Tomasz Figa <tfiga@...omium.org>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	John Ogness <john.ogness@...utronix.de>, Steven Rostedt <rostedt@...dmis.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel

Hi Petr,

On Thu, Apr 24, 2025 at 7:59 PM Petr Mladek <pmladek@...e.com> wrote:
>
> On Thu 2025-04-24 16:02:43, Sergey Senozhatsky wrote:
> > Currently, hung-task watchdog uses two different loglevels
> > to report hung-tasks: a) KERN_INFO for all the important task
> > information (e.g. sched_show_task()) and b)  KERN_ERR for the
> > rest.
>
> IMHO, the two different loglevels make sense. The KERN_ERR
> message seems to inform about that a task gets blocked for too long.
> And KERN_INFO is used for an extra debug information.
>

I agree that two different levels make sense, but I don't think
KERN_INFO is the best one to use, because quite a lot of
usual/expected things are already logged at that level, and a hung
task clearly is not a usual/expected event.

My preference would be KERN_NOTICE.

> > This makes it a little inconvenient, especially for
> > automated kernel logs parsing.
>
> Anyway, what is the exact problem, please?
> Are the KERN_INFO messages filtered because of console_loglevel?
> Or is it a problem to match all the related lines?

The problem is that when we're looking at hundreds of reports with
various problems from the production fleet, we want to be able to
filter out some of the usual/expected logs. The easiest way to do that
is by message log level. However, if we set the filter to anything
more severe than KERN_INFO, we lose the task dumps and need to go and
fetch the entire unfiltered log, which is tedious.

(FWIW, we're also developing an automated analysis tool and it would
make the implementation much easier if we could simply use the log
level to filter out expected vs unexpected events from the logs - and
most of the time that already works, the case Sergey's patch is
addressing is just one of the small number of exceptions.)
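
To make the filtering problem concrete, here is a minimal sketch of
level-based filtering. It assumes /dev/kmsg-style records of the form
"prio,seq,usec,flags;text", where the low three bits of prio are the
log level. parse_kmsg(), keep_at_or_above() and the sample records are
hypothetical and only for illustration; this is not our actual tooling.

```python
# Kernel log levels: a lower number means a more severe message.
KERN_ERR, KERN_WARNING, KERN_NOTICE, KERN_INFO = 3, 4, 5, 6


def parse_kmsg(record):
    """Split a 'prio,seq,usec,flags[,...];text' record into (level, text)."""
    header, _, text = record.partition(";")
    prio = int(header.split(",", 1)[0])
    # The low 3 bits of prio carry the level, the rest is the facility.
    return prio & 7, text


def keep_at_or_above(records, max_level):
    """Keep the text of records at max_level or a more severe level."""
    return [text for level, text in map(parse_kmsg, records)
            if level <= max_level]


# Made-up sample: one routine message, then a hung-task report whose
# headline is KERN_ERR (3) and whose task dump is KERN_INFO (6).
SAMPLE = [
    "6,100,1000000,-;usb 1-1: new high-speed USB device number 2",
    "3,101,2000000,-;INFO: task kworker/0:1:42 blocked for more than 122 seconds.",
    "6,102,2000010,-;task:kworker/0:1     state:D stack:0     pid:42",
]

if __name__ == "__main__":
    # Filtering at KERN_NOTICE drops the routine USB line, but it also
    # drops the task dump that belongs to the hung-task report.
    for line in keep_at_or_above(SAMPLE, KERN_NOTICE):
        print(line)
```

With the dump logged at KERN_NOTICE or a more severe level, the same
filter would keep it while still dropping the routine KERN_INFO line.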

>
> > Introduce CONFIG_HUNG_TASK_STACKTRACE_LOGLEVEL so that (a)
> > becomes configurable.
>
> I am not sure if adding hung-task-specific config option is
> the right solution. I guess that other watchdogs or other
> similar reports have the same problem.
>
> It seems that several other reports, for example,
> watchdog_hardlockup_check(), or __die(), are using KERN_DEFAULT
> which is configurable via CONFIG_MESSAGE_LOGLEVEL_DEFAULT.
>
> A solution might be using KERN_DEFAULT for sched_show_task()
> in hung_tasks detector as well.

I have to admit that I don't really know what else KERN_DEFAULT is
used for, but wouldn't that mean that again some typical messages
would end up being mixed in with messages for unexpected events?

>
> Alternatively, if the problem is console_loglevel filtering then
> it might make sense to create a config option which would force
> using the same loglevel in all similar reports. I would call it:
>
>    CONFIG_FULL_REPORT_USING_SAME_LOGLEVEL
>
> And support it for other reports.

I think that would work for us too, but I also think that having two
different levels, one for the main part of the report and a
numerically higher (lower-severity) one for the other tasks, makes
sense and would be useful for our analysis too.

>
> If the problem is matching all related lines. Then a solution
> would be printing some help lines around the report, similar
> to
>
>     ------------[ cut here ]------------
>
> in include/asm-generic/bug.h
>
> Plus, it would be needed to filter out messages from other CPUs.
> CONFIG_PRINTK_CALLER should help with this.

I'm not really in love with that idea. It would make things much more
complicated, even though we already have the right tool for
differentiating between the importance of various logs: the log level
is exactly that.
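
For comparison, this is roughly what matching related lines by caller
would involve on the tooling side. It is only a sketch: it assumes
/dev/kmsg-style records where CONFIG_PRINTK_CALLER adds a
"caller=<id>" field to the record header, and caller_of(),
group_by_caller() and the sample records are hypothetical.

```python
from collections import defaultdict


def caller_of(record):
    """Return the caller id from a record header, or None if absent."""
    header = record.partition(";")[0]
    # Fields after prio, seq, timestamp and flags are optional extras.
    for field in header.split(",")[4:]:
        if field.startswith("caller="):
            return field[len("caller="):]
    return None


def group_by_caller(records):
    """Group message texts by the context that printed them."""
    groups = defaultdict(list)
    for record in records:
        groups[caller_of(record)].append(record.partition(";")[2])
    return dict(groups)


# Made-up sample: a hung-task report printed by one task (T77),
# interleaved with an unrelated message from another task (T512).
SAMPLE = [
    "3,101,2000000,-,caller=T77;INFO: task kworker/0:1:42 blocked for more than 122 seconds.",
    "6,102,2000005,-,caller=T512;usb 1-1: new high-speed USB device number 2",
    "6,103,2000010,-,caller=T77;task:kworker/0:1     state:D stack:0     pid:42",
]

if __name__ == "__main__":
    for caller, lines in group_by_caller(SAMPLE).items():
        print(caller, lines)
```

Even then the tool would still have to decide which caller group is
the report and where it starts and ends, which is the extra
complexity I would like to avoid.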

Best,
Tomasz