Message-ID: <3f5nqvveevlfabujnmc7ume54y2zga6ovtxdnpmneynonw5m34@2t677akblpd3>
Date: Wed, 30 Apr 2025 10:34:07 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Tomasz Figa <tfiga@...omium.org>, John Ogness <john.ogness@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hung_task: configurable hung-task stacktrace loglevel
On (25/04/25 16:55), Petr Mladek wrote:
> > > Anyway, what is the exact problem, please?
> > > Are the KERN_INFO messages filtered because of console_loglevel?
> > > Or is it a problem to match all the related lines?
> >
> > The latter one. A made-up example, just to demonstrate what we are
> > getting now:
> >
>
> Note: I do not have strong opinion. I am rather thinking loudly.
Sure.
> When I look at it, a prefix line:
>
> <6>[ 125.297687][ T140] ------------[ cut here ]------------
>
> > <3>[ 125.297687][ T140] INFO: task zsh:470 blocked for more than 61 seconds.
> > <3>[ 125.302321][ T140] Not tainted 6.15.0-rc3-next-20250424-00001-g258d8df78c77-dirty #154
> > <3>[ 125.309333][ T140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > <6>[ 125.315040][ T140] task:zsh state:D stack:0 pid:470 tgid:470 ppid:430 task_flags:0x400100 flags:0x00004002
> > <6>[ 125.320594][ T140] Call Trace:
> > <6>[ 125.322327][ T140] <TASK>
> > <6>[ 125.323852][ T140] __schedule+0x13b4/0x2120
> > <6>[ 125.325459][ T140] ? schedule+0xdc/0x280
> > <6>[ 125.327100][ T140] schedule+0xdc/0x280
> > <6>[ 125.328590][ T140] schedule_preempt_disabled+0x10/0x20
> > <6>[ 125.330589][ T140] __mutex_lock+0x698/0x1200
> > <6>[ 125.332291][ T140] ? __mutex_lock+0x485/0x1200
> > <6>[ 125.334074][ T140] mutex_lock+0x81/0x90
[..]
> > <3>[ 125.362363][ T140] INFO: task zsh:470 is blocked on a mutex likely owned by task zsh:470.
> > <6>[ 125.364467][ T140] task:zsh state:D stack:0 pid:470 tgid:470 ppid:430 task_flags:0x400100 flags:0x00004002
> > <6>[ 125.367493][ T140] Call Trace:
> > <6>[ 125.368359][ T140] <TASK>
> > <6>[ 125.369180][ T140] __schedule+0x13b4/0x2120
> > <6>[ 125.370364][ T140] ? schedule+0xdc/0x280
> > <6>[ 125.371486][ T140] schedule+0xdc/0x280
> > <6>[ 125.372518][ T140] schedule_preempt_disabled+0x10/0x20
> > <6>[ 125.374049][ T140] __mutex_lock+0x698/0x1200
> > <6>[ 125.375326][ T140] ? __mutex_lock+0x485/0x1200
> > <6>[ 125.376572][ T140] mutex_lock+0x81/0x90
[..]
> and a suffix line
>
> <6>[ 125.403579][ T140] ------------[ cut here ]------------
>
> might be helpful for both automated monitors and humans => a win-win solution.
Suffix/prefix lines should improve things, I think. We don't enable
printk-owner in the fleet; we probably should, but we don't for whatever
reason.
[..]
> > So my worry with CONFIG_MESSAGE_LOGLEVEL_DEFAULT was that, if we
> > set it, say, to 3 (KERN_ERR), then how many "false positives" will
> > we get? There are many printk("") calls in the kernel that
> > default to MESSAGE_LOGLEVEL_DEFAULT, as far as I understand it:
> > git grep "printk(\"" | grep -v TP_printk | wc -l
> > 9001
>
> Just for the record, this still counts many other printk() wrappers as
> well, e.g. bpf_printk(), srm_printk(), dprintk().
>
> It should be more precise with the -w option:
>
> $> git grep -w "printk(\"" | wc -l
> 2830
Good point.
> That said, I agree that it might still mean a lot of false
> positives.
Agreed.
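Just to spell out what "default to MESSAGE_LOGLEVEL_DEFAULT" means here,
a hypothetical driver snippet (illustration only, not a real file): the
first printk() has no explicit KERN_* prefix, so it is emitted at
CONFIG_MESSAGE_LOGLEVEL_DEFAULT, while the second one is pinned to
KERN_INFO and would not be affected by raising the default:

	#include <linux/printk.h>

	/* illustration: a level-less printk() vs. an explicit one */
	static void example_probe_message(int id)
	{
		/* no KERN_* prefix -> logged at MESSAGE_LOGLEVEL_DEFAULT */
		printk("device %d probed\n", id);

		/* explicit loglevel, unaffected by the default */
		printk(KERN_INFO "device %d probed\n", id);
	}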
> > But maybe that is the solution. Do you want to switch sched_show_task()
> > to KERN_DEFAULT for all callers, or would it be better to still introduce
> > sched_show_task_log_lvl() and call it with KERN_DEFAULT only from
> > kernel/hung_task.c?
>
> If you want to go this way then I would introduce sched_show_task_log_lvl().
> It would allow fixing the various reports incrementally.
>
> For example, I see sched_show_task() used in show_cpu_pool_hog() in
> kernel/workqueue.c, where some other messages are KERN_INFO. So, using
> another log level in sched_show_task() would cause a mess here.
Good point.
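For reference, a rough sketch of what such a split could look like (my
illustration, not an actual patch; the helper name comes from the
discussion above, the body is an assumption and heavily trimmed):

	void sched_show_task_log_lvl(struct task_struct *p, const char *loglvl)
	{
		printk("%stask:%-15.15s state:%c\n",
		       loglvl, p->comm, task_state_to_char(p));
		/* show_stack() already takes a loglvl argument */
		show_stack(p, NULL, loglvl);
	}

	void sched_show_task(struct task_struct *p)
	{
		/* existing callers keep the current KERN_INFO behaviour */
		sched_show_task_log_lvl(p, KERN_INFO);
	}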
[..]
> If you want the same loglevel then I really suggest introducing
> a generic CONFIG option, e.g. CONFIG_CONSISTENT_REPORT_LOGLEVEL,
> and using it everywhere to force a consistent loglevel,
> for example, in the hung task detector, workqueue watchdog, soft lockup
> detector, Oops, ...
Yeah, that's a logical next step, if we go this way.
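Something along these lines, perhaps (rough sketch; the option name is from
your suggestion above, the KERN_ERR pick is just a placeholder, not a
decision):

	#ifdef CONFIG_CONSISTENT_REPORT_LOGLEVEL
	#define REPORT_LOGLEVEL	KERN_ERR	/* placeholder value */
	#else
	#define REPORT_LOGLEVEL	KERN_INFO
	#endif

and then the hung task detector, workqueue watchdog, soft lockup detector,
etc. would all print their reports with REPORT_LOGLEVEL.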
[..]
> int printk_report_start(const char *lvl)
> {
> int report_id = atomic_inc_return(&printk_report_count);
>
> if (printk_force_report_loglevel)
> printk_force_loglevel_enter(lvl);
>
> printk("%s---------- Report No. %d start --------\n", lvl, report_id);
>
> return report_id;
> }
>
> void printk_report_end(const char *lvl, int report_id)
> {
> printk("%s---------- Report No. %d end --------\n", lvl, report_id);
>
> if (printk_force_report_loglevel)
> printk_force_loglevel_exit();
> }
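A hypothetical call site, just to illustrate how the helpers above could be
wired into the hung task detector (the function below and the KERN_INFO
choice are my assumptions, not part of your proposal):

	static void report_hung_task(struct task_struct *t, unsigned long timeout)
	{
		int report_id = printk_report_start(KERN_INFO);

		pr_info("INFO: task %s:%d blocked for more than %lu seconds.\n",
			t->comm, t->pid, timeout);
		sched_show_task(t);

		printk_report_end(KERN_INFO, report_id);
	}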
Looks good to me, however
> My problem with the log level manipulation is that I do not know what
> loglevel is right.
Excellent point. It sort of feels like we are back to square one.
If we just KERN_INFO everything, then someone might come up with
a patch to make it KERN_ERR, or KERN_ALERT (because they panic the system
right after that print-out and it cannot be INFO).