Message-ID: <3f5nqvveevlfabujnmc7ume54y2zga6ovtxdnpmneynonw5m34@2t677akblpd3>
Date: Wed, 30 Apr 2025 10:34:07 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Tomasz Figa <tfiga@...omium.org>, John Ogness <john.ogness@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel
On (25/04/25 16:55), Petr Mladek wrote:
> > > Anyway, what is the exact problem, please?
> > > Are the KERN_INFO messages filtered because of console_loglevel?
> > > Or is it a problem to match all the related lines?
> >
> > The latter one. A made-up example, just to demonstrate what we are
> > getting now:
> >
>
> Note: I do not have strong opinion. I am rather thinking loudly.
Sure.
> When I look at it, a prefix line:
>
> <6>[ 125.297687][ T140] ------------[ cut here ]------------
>
> > <3>[ 125.297687][ T140] INFO: task zsh:470 blocked for more than 61 seconds.
> > <3>[ 125.302321][ T140] Not tainted 6.15.0-rc3-next-20250424-00001-g258d8df78c77-dirty #154
> > <3>[ 125.309333][ T140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > <6>[ 125.315040][ T140] task:zsh state:D stack:0 pid:470 tgid:470 ppid:430 task_flags:0x400100 flags:0x00004002
> > <6>[ 125.320594][ T140] Call Trace:
> > <6>[ 125.322327][ T140] <TASK>
> > <6>[ 125.323852][ T140] __schedule+0x13b4/0x2120
> > <6>[ 125.325459][ T140] ? schedule+0xdc/0x280
> > <6>[ 125.327100][ T140] schedule+0xdc/0x280
> > <6>[ 125.328590][ T140] schedule_preempt_disabled+0x10/0x20
> > <6>[ 125.330589][ T140] __mutex_lock+0x698/0x1200
> > <6>[ 125.332291][ T140] ? __mutex_lock+0x485/0x1200
> > <6>[ 125.334074][ T140] mutex_lock+0x81/0x90
[..]
> > <3>[ 125.362363][ T140] INFO: task zsh:470 is blocked on a mutex likely owned by task zsh:470.
> > <6>[ 125.364467][ T140] task:zsh state:D stack:0 pid:470 tgid:470 ppid:430 task_flags:0x400100 flags:0x00004002
> > <6>[ 125.367493][ T140] Call Trace:
> > <6>[ 125.368359][ T140] <TASK>
> > <6>[ 125.369180][ T140] __schedule+0x13b4/0x2120
> > <6>[ 125.370364][ T140] ? schedule+0xdc/0x280
> > <6>[ 125.371486][ T140] schedule+0xdc/0x280
> > <6>[ 125.372518][ T140] schedule_preempt_disabled+0x10/0x20
> > <6>[ 125.374049][ T140] __mutex_lock+0x698/0x1200
> > <6>[ 125.375326][ T140] ? __mutex_lock+0x485/0x1200
> > <6>[ 125.376572][ T140] mutex_lock+0x81/0x90
[..]
> and a suffix line
>
> <6>[ 125.403579][ T140] ------------[ cut here ]------------
>
> might be helpful for both automated monitors and humans => a win-win solution.
Suffix/prefix lines should improve things, I think. We don't enable
printk-owner in the fleet; we probably should, but we don't for whatever
reason.
[..]
> > So my worry with CONFIG_MESSAGE_LOGLEVEL_DEFAULT was that, if we
> > set it, say, to 3 (KERN_ERR), then how many "false positives" will
> > we get? There are many printk("") calls in the kernel that
> > default to MESSAGE_LOGLEVEL_DEFAULT, as far as I understand it:
> > git grep "printk(\"" | grep -v TP_printk | wc -l
> > 9001
>
> Just for the record, this still counts many other printk() wrappers as
> well, e.g. bpf_printk(), srm_printk(), dprintk().
>
> It should be more precise with the -w option:
>
> $> git grep -w "printk(\"" | wc -l
> 2830
Good point.
> That said, I agree that it might still mean a lot of false
> positives.
Agreed.
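Just to spell out what "default to MESSAGE_LOGLEVEL_DEFAULT" means here,
a hypothetical driver snippet (illustration only, not a real file): the
first printk() has no explicit KERN_* prefix, so it is emitted at
CONFIG_MESSAGE_LOGLEVEL_DEFAULT, while the second one is pinned to
KERN_INFO and would not be affected by raising the default:

	#include <linux/printk.h>

	/* illustration: a level-less printk() vs. an explicit one */
	static void example_probe_message(int id)
	{
		/* no KERN_* prefix -> logged at MESSAGE_LOGLEVEL_DEFAULT */
		printk("device %d probed\n", id);

		/* explicit loglevel, unaffected by the default */
		printk(KERN_INFO "device %d probed\n", id);
	}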
> > But maybe that is the solution. Do you want to switch sched_show_task()
> > to KERN_DEFAULT for all callers, or would it be better to still introduce
> > sched_show_task_log_lvl() and call it with KERN_DEFAULT only from
> > kernel/hung_task.c?
>
> If you want to go this way then I would introduce sched_show_task_log_lvl().
> It would allow fixing the various reports incrementally.
>
> For example, I see sched_show_task() used in show_cpu_pool_hog() in
> kernel/workqueue.c, where some other messages are KERN_INFO. So, using
> another log level in sched_show_task() would cause a mess here.
Good point.
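For reference, a rough sketch of what such a split could look like (my
illustration, not an actual patch; the helper name comes from the
discussion above, the body is an assumption and heavily trimmed):

	void sched_show_task_log_lvl(struct task_struct *p, const char *loglvl)
	{
		printk("%stask:%-15.15s state:%c\n",
		       loglvl, p->comm, task_state_to_char(p));
		/* show_stack() already takes a loglvl argument */
		show_stack(p, NULL, loglvl);
	}

	void sched_show_task(struct task_struct *p)
	{
		/* existing callers keep the current KERN_INFO behaviour */
		sched_show_task_log_lvl(p, KERN_INFO);
	}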
[..]
> If you want the same loglevel then I really suggest introducing
> a generic CONFIG option, e.g. CONFIG_CONSISTENT_REPORT_LOGLEVEL,
> and using it everywhere to force a consistent loglevel,
> for example, in the hung task detector, workqueue watchdog, soft lockup
> detector, Oops, ...
Yeah, that's a logical next step, if we go this way.
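Something along these lines, perhaps (rough sketch; the option name is from
your suggestion above, the KERN_ERR pick is just a placeholder, not a
decision):

	#ifdef CONFIG_CONSISTENT_REPORT_LOGLEVEL
	#define REPORT_LOGLEVEL	KERN_ERR	/* placeholder value */
	#else
	#define REPORT_LOGLEVEL	KERN_INFO
	#endif

and then the hung task detector, workqueue watchdog, soft lockup detector,
etc. would all print their reports with REPORT_LOGLEVEL.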
[..]
> int printk_report_start(const char *lvl)
> {
> int report_id = atomic_inc_return(&printk_report_count);
>
> if (printk_force_report_loglevel)
> printk_force_loglevel_enter(lvl);
>
> printk("%s---------- Report No. %d start --------\n", lvl, report_id);
>
> return report_id;
> }
>
> void printk_report_end(const char *lvl, int report_id)
> {
> printk("%s---------- Report No. %d end --------\n", lvl, report_id);
>
> if (printk_force_report_loglevel)
> printk_force_loglevel_exit();
> }
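A hypothetical call site, just to illustrate how the helpers above could be
wired into the hung task detector (the function below and the KERN_INFO
choice are my assumptions, not part of your proposal):

	static void report_hung_task(struct task_struct *t, unsigned long timeout)
	{
		int report_id = printk_report_start(KERN_INFO);

		pr_info("INFO: task %s:%d blocked for more than %lu seconds.\n",
			t->comm, t->pid, timeout);
		sched_show_task(t);

		printk_report_end(KERN_INFO, report_id);
	}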
Looks good to me, however
> My problem with the log level manipulation is that I do not know what
> loglevel is right.
Excellent point. It sort of feels like we are back to square one.
If we just KERN_INFO everything, then someone might come up with
a patch to make it KERN_ERR, or KERN_ALERT (because they panic the system
right after that print-out and it cannot be INFO).