[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xhkbymqobtva6j7xmwzvh5g2tvuixvu2hwfdozed6hijrt3vkl@rywdhz43e52y>
Date: Fri, 25 Apr 2025 13:49:13 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Tomasz Figa <tfiga@...omium.org>, John Ogness <john.ogness@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel
Hi Petr,
On (25/04/24 12:58), Petr Mladek wrote:
> On Thu 2025-04-24 16:02:43, Sergey Senozhatsky wrote:
> > Currently, hung-task watchdog uses two different loglevels
> > to report hung-tasks: a) KERN_INFO for all the important task
> > information (e.g. sched_show_task()) and b) KERN_ERR for the
> > rest.
>
> IMHO, the two different loglevels make sense. The KERN_ERR
> message seems to inform about that a task gets blocked for too long.
> And KERN_INFO is used for an extra debug information.
>
> > This makes it a little inconvenient, especially for
> > automated kernel logs parsing.
>
> Anyway, what is the exact problem, please?
> Are the KERN_INFO messages filtered because of console_loglevel?
> Or is it a problem to match all the related lines?
The latter one. A made up example, just to demonstrate what we are
getting now:
<3>[ 125.297687][ T140] INFO: task zsh:470 blocked for more than 61 seconds.
<3>[ 125.302321][ T140] Not tainted 6.15.0-rc3-next-20250424-00001-g258d8df78c77-dirty #154
<3>[ 125.309333][ T140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 125.315040][ T140] task:zsh state:D stack:0 pid:470 tgid:470 ppid:430 task_flags:0x400100 flags:0x00004002
<6>[ 125.320594][ T140] Call Trace:
<6>[ 125.322327][ T140] <TASK>
<6>[ 125.323852][ T140] __schedule+0x13b4/0x2120
<6>[ 125.325459][ T140] ? schedule+0xdc/0x280
<6>[ 125.327100][ T140] schedule+0xdc/0x280
<6>[ 125.328590][ T140] schedule_preempt_disabled+0x10/0x20
<6>[ 125.330589][ T140] __mutex_lock+0x698/0x1200
<6>[ 125.332291][ T140] ? __mutex_lock+0x485/0x1200
<6>[ 125.334074][ T140] mutex_lock+0x81/0x90
<6>[ 125.335113][ T140] drop_caches_sysctl_handler+0x3e/0x140
<6>[ 125.336665][ T140] proc_sys_call_handler+0x327/0x4f0
<6>[ 125.338069][ T140] vfs_write+0x794/0xb60
<6>[ 125.339216][ T140] ? proc_sys_read+0x10/0x10
<6>[ 125.340568][ T140] ksys_write+0xb8/0x170
<6>[ 125.341701][ T140] do_syscall_64+0xd0/0x1a0
<6>[ 125.343009][ T140] ? arch_exit_to_user_mode_prepare+0x11/0x60
<6>[ 125.344612][ T140] ? irqentry_exit_to_user_mode+0x7e/0xa0
<6>[ 125.346260][ T140] entry_SYSCALL_64_after_hwframe+0x4b/0x53
<6>[ 125.347772][ T140] RIP: 0033:0x7fa4bd8be687
<6>[ 125.348958][ T140] RSP: 002b:00007ffecf417820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<6>[ 125.351161][ T140] RAX: ffffffffffffffda RBX: 00007fa4bd82e300 RCX: 00007fa4bd8be687
<6>[ 125.353221][ T140] RDX: 0000000000000002 RSI: 00005621f5c65860 RDI: 0000000000000001
<6>[ 125.355338][ T140] RBP: 00005621f5c65860 R08: 0000000000000000 R09: 0000000000000000
<6>[ 125.357424][ T140] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
<6>[ 125.359677][ T140] R13: 00007fa4bda175c0 R14: 00007fa4bda14e80 R15: 00007fa4bdb59f70
<6>[ 125.361551][ T140] </TASK>
<3>[ 125.362363][ T140] INFO: task zsh:470 is blocked on a mutex likely owned by task zsh:470.
<6>[ 125.364467][ T140] task:zsh state:D stack:0 pid:470 tgid:470 ppid:430 task_flags:0x400100 flags:0x00004002
<6>[ 125.367493][ T140] Call Trace:
<6>[ 125.368359][ T140] <TASK>
<6>[ 125.369180][ T140] __schedule+0x13b4/0x2120
<6>[ 125.370364][ T140] ? schedule+0xdc/0x280
<6>[ 125.371486][ T140] schedule+0xdc/0x280
<6>[ 125.372518][ T140] schedule_preempt_disabled+0x10/0x20
<6>[ 125.374049][ T140] __mutex_lock+0x698/0x1200
<6>[ 125.375326][ T140] ? __mutex_lock+0x485/0x1200
<6>[ 125.376572][ T140] mutex_lock+0x81/0x90
<6>[ 125.377773][ T140] drop_caches_sysctl_handler+0x3e/0x140
<6>[ 125.379391][ T140] proc_sys_call_handler+0x327/0x4f0
<6>[ 125.380715][ T140] vfs_write+0x794/0xb60
<6>[ 125.381951][ T140] ? proc_sys_read+0x10/0x10
<6>[ 125.383083][ T140] ksys_write+0xb8/0x170
<6>[ 125.384329][ T140] do_syscall_64+0xd0/0x1a0
<6>[ 125.385461][ T140] ? arch_exit_to_user_mode_prepare+0x11/0x60
<6>[ 125.387110][ T140] ? irqentry_exit_to_user_mode+0x7e/0xa0
<6>[ 125.388566][ T140] entry_SYSCALL_64_after_hwframe+0x4b/0x53
<6>[ 125.390194][ T140] RIP: 0033:0x7fa4bd8be687
<6>[ 125.391268][ T140] RSP: 002b:00007ffecf417820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
<6>[ 125.393367][ T140] RAX: ffffffffffffffda RBX: 00007fa4bd82e300 RCX: 00007fa4bd8be687
<6>[ 125.395506][ T140] RDX: 0000000000000002 RSI: 00005621f5c65860 RDI: 0000000000000001
<6>[ 125.397494][ T140] RBP: 00005621f5c65860 R08: 0000000000000000 R09: 0000000000000000
<6>[ 125.399701][ T140] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
<6>[ 125.401663][ T140] R13: 00007fa4bda175c0 R14: 00007fa4bda14e80 R15: 00007fa4bdb59f70
<6>[ 125.403579][ T140] </TASK>
So there are <3> and then <6> lines and we can't easily tell when to
stop parsing and what to consider part of the automated error report
and what not to, there also can be some other <6> lines in between.
> > Introduce CONFIG_HUNG_TASK_STACKTRACE_LOGLEVEL so that (a)
> > becomes configurable.
>
> I am not sure if adding hung-task-specific config option is
> the right solution. I guess that other watchdogs or other
> similar reports have the same problem.
>
> It seems that several other reports, for example,
> watchdog_hardlockup_check(), or __die(), are using KERN_DEFAULT
> which is configurable via CONFIG_MESSAGE_LOGLEVEL_DEFAULT.
>
> A solution might be using KERN_DEFAULT for sched_show_task()
> in hung_tasks detector as well.
So my worry with CONFIG_MESSAGE_LOGLEVEL_DEFAULT was that, if we
set it, say, to 3 (KERN_ERR), then how many "false positives" we
will get? There are many printk("") calls in the kernel that
default to MESSAGE_LOGLEVEL_DEFAULT, as far as I understand it:
git grep "printk(\"" | grep -v TP_printk | wc -l
9001
But maybe that is the solution. Do you want to switch sched_show_task()
to KERN_DEFAULT for all or would it be better to still introduce
sched_show_task_log_lvl() can call it with KERN_DEFAULT only from
kernel/hung_task.c?
Powered by blists - more mailing lists