linux-kernel - Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xhkbymqobtva6j7xmwzvh5g2tvuixvu2hwfdozed6hijrt3vkl@rywdhz43e52y>
Date: Fri, 25 Apr 2025 13:49:13 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, 
	Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Tomasz Figa <tfiga@...omium.org>, John Ogness <john.ogness@...utronix.de>, 
	Steven Rostedt <rostedt@...dmis.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel

Hi Petr,

On (25/04/24 12:58), Petr Mladek wrote:
> On Thu 2025-04-24 16:02:43, Sergey Senozhatsky wrote:
> > Currently, hung-task watchdog uses two different loglevels
> > to report hung-tasks: a) KERN_INFO for all the important task
> > information (e.g. sched_show_task()) and b)  KERN_ERR for the
> > rest.
> 
> IMHO, the two different loglevels make sense. The KERN_ERR
> message seems to inform about that a task gets blocked for too long.
> And KERN_INFO is used for an extra debug information.
> 
> > This makes it a little inconvenient, especially for
> > automated kernel logs parsing.
> 
> Anyway, what is the exact problem, please?
> Are the KERN_INFO messages filtered because of console_loglevel?
> Or is it a problem to match all the related lines?

The latter one.  A made up example, just to demonstrate what we are
getting now:

<3>[  125.297687][  T140] INFO: task zsh:470 blocked for more than 61 seconds.
<3>[  125.302321][  T140]       Not tainted 6.15.0-rc3-next-20250424-00001-g258d8df78c77-dirty #154
<3>[  125.309333][  T140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  125.315040][  T140] task:zsh             state:D stack:0     pid:470   tgid:470   ppid:430    task_flags:0x400100 flags:0x00004002
<6>[  125.320594][  T140] Call Trace:
<6>[  125.322327][  T140]  <TASK>
<6>[  125.323852][  T140]  __schedule+0x13b4/0x2120
<6>[  125.325459][  T140]  ? schedule+0xdc/0x280
<6>[  125.327100][  T140]  schedule+0xdc/0x280
<6>[  125.328590][  T140]  schedule_preempt_disabled+0x10/0x20
<6>[  125.330589][  T140]  __mutex_lock+0x698/0x1200
<6>[  125.332291][  T140]  ? __mutex_lock+0x485/0x1200
<6>[  125.334074][  T140]  mutex_lock+0x81/0x90
<6>[  125.335113][  T140]  drop_caches_sysctl_handler+0x3e/0x140
<6>[  125.336665][  T140]  proc_sys_call_handler+0x327/0x4f0
<6>[  125.338069][  T140]  vfs_write+0x794/0xb60
<6>[  125.339216][  T140]  ? proc_sys_read+0x10/0x10
<6>[  125.340568][  T140]  ksys_write+0xb8/0x170
<6>[  125.341701][  T140]  do_syscall_64+0xd0/0x1a0
<6>[  125.343009][  T140]  ? arch_exit_to_user_mode_prepare+0x11/0x60
<6>[  125.344612][  T140]  ? irqentry_exit_to_user_mode+0x7e/0xa0
<6>[  125.346260][  T140]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
<6>[  125.347772][  T140] RIP: 0033:0x7fa4bd8be687
<6>[  125.348958][  T140] RSP: 002b:00007ffecf417820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<6>[  125.351161][  T140] RAX: ffffffffffffffda RBX: 00007fa4bd82e300 RCX: 00007fa4bd8be687
<6>[  125.353221][  T140] RDX: 0000000000000002 RSI: 00005621f5c65860 RDI: 0000000000000001
<6>[  125.355338][  T140] RBP: 00005621f5c65860 R08: 0000000000000000 R09: 0000000000000000
<6>[  125.357424][  T140] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
<6>[  125.359677][  T140] R13: 00007fa4bda175c0 R14: 00007fa4bda14e80 R15: 00007fa4bdb59f70
<6>[  125.361551][  T140]  </TASK>
<3>[  125.362363][  T140] INFO: task zsh:470 is blocked on a mutex likely owned by task zsh:470.
<6>[  125.364467][  T140] task:zsh             state:D stack:0     pid:470   tgid:470   ppid:430    task_flags:0x400100 flags:0x00004002
<6>[  125.367493][  T140] Call Trace:
<6>[  125.368359][  T140]  <TASK>
<6>[  125.369180][  T140]  __schedule+0x13b4/0x2120
<6>[  125.370364][  T140]  ? schedule+0xdc/0x280
<6>[  125.371486][  T140]  schedule+0xdc/0x280
<6>[  125.372518][  T140]  schedule_preempt_disabled+0x10/0x20
<6>[  125.374049][  T140]  __mutex_lock+0x698/0x1200
<6>[  125.375326][  T140]  ? __mutex_lock+0x485/0x1200
<6>[  125.376572][  T140]  mutex_lock+0x81/0x90
<6>[  125.377773][  T140]  drop_caches_sysctl_handler+0x3e/0x140
<6>[  125.379391][  T140]  proc_sys_call_handler+0x327/0x4f0
<6>[  125.380715][  T140]  vfs_write+0x794/0xb60
<6>[  125.381951][  T140]  ? proc_sys_read+0x10/0x10
<6>[  125.383083][  T140]  ksys_write+0xb8/0x170
<6>[  125.384329][  T140]  do_syscall_64+0xd0/0x1a0
<6>[  125.385461][  T140]  ? arch_exit_to_user_mode_prepare+0x11/0x60
<6>[  125.387110][  T140]  ? irqentry_exit_to_user_mode+0x7e/0xa0
<6>[  125.388566][  T140]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
<6>[  125.390194][  T140] RIP: 0033:0x7fa4bd8be687
<6>[  125.391268][  T140] RSP: 002b:00007ffecf417820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<6>[  125.393367][  T140] RAX: ffffffffffffffda RBX: 00007fa4bd82e300 RCX: 00007fa4bd8be687
<6>[  125.395506][  T140] RDX: 0000000000000002 RSI: 00005621f5c65860 RDI: 0000000000000001
<6>[  125.397494][  T140] RBP: 00005621f5c65860 R08: 0000000000000000 R09: 0000000000000000
<6>[  125.399701][  T140] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
<6>[  125.401663][  T140] R13: 00007fa4bda175c0 R14: 00007fa4bda14e80 R15: 00007fa4bdb59f70
<6>[  125.403579][  T140]  </TASK>

So there are <3> and then <6> lines and we can't easily tell when to
stop parsing and what to consider part of the automated error report
and what not to, there also can be some other <6> lines in between.

> > Introduce CONFIG_HUNG_TASK_STACKTRACE_LOGLEVEL so that (a)
> > becomes configurable.
> 
> I am not sure if adding hung-task-specific config option is
> the right solution. I guess that other watchdogs or other
> similar reports have the same problem.
>
> It seems that several other reports, for example,
> watchdog_hardlockup_check(), or __die(), are using KERN_DEFAULT
> which is configurable via CONFIG_MESSAGE_LOGLEVEL_DEFAULT.
> 
> A solution might be using KERN_DEFAULT for sched_show_task()
> in hung_tasks detector as well.

So my worry with CONFIG_MESSAGE_LOGLEVEL_DEFAULT was that, if we
set it, say, to 3 (KERN_ERR), then how many "false positives" we
will get?  There are many printk("") calls in the kernel that
default to MESSAGE_LOGLEVEL_DEFAULT, as far as I understand it:
	git grep "printk(\"" | grep -v TP_printk | wc -l
	9001

But maybe that is the solution.  Do you want to switch sched_show_task()
to KERN_DEFAULT for all or would it be better to still introduce
sched_show_task_log_lvl() can call it with KERN_DEFAULT only from
kernel/hung_task.c?