linux-kernel - Re: kernel CI: printk loglevels in kernel boot logs?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171122113448.sllaehmlqae7telk@pathway.suse.cz>
Date:   Wed, 22 Nov 2017 12:34:48 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Fengguang Wu <fengguang.wu@...el.com>
Cc:     Kevin Hilman <khilman@...libre.com>,
        Mark Brown <broonie@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: kernel CI: printk loglevels in kernel boot logs?

On Wed 2017-11-22 11:27:02, Fengguang Wu wrote:
> [CC LKML for possible printk improvements]
> 
> On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote:
> > Hi Kevin,
> > 
> > On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote:
> > > Hi Fenguang,
> > > 
> > > In automated testing, for ease of parsing kernel boot logs (especially
> > > separating warnings and errors from debug, info etc.)

Would be enough to use console_loglevel to filter out the non-interesting
messages?

> > > Right now we can get this info from a "dmesg --raw" after bootup, but
> > > it would be really nice in certain automation frameworks to have a
> > > kernel command-line option to enable printing of loglevels in default
> > > boot log.
> > 
> > Agreed.
> > 
> > > This is espeically useful when ingesting kernel logs into advanced
> > > search/analytics frameworks (I'm playing with and ELK stack: Elastic
> > > Search, Logstash, Kibana).

This suggests that people are already filtering the kernel log using
external tools. Therefore there is a danger that the format might
have already became the kernel ABI and we could not change it easily.

A solution would be to printk the loglevel after the timestamp.
But this would be ugly because dmesg --raw prints it at the beginning.

To make it clear. I understand that "show_loglevel" command line argument
would be useful for you. But I am afraid that it is not worth changing
the format. There would need to be wide interest into the change.
Also there would need to be evidence that the existing solutions
(dmesg --raw, console_loglevel) are not enough in many real life
scenarios.

> [...]
> 
> > As you may know I'm currently reporting kernel oops in mainline
> > kernel, hoping to clear noisy oops there -- they obviously hurt
> > catching and bisecting new oops.
> > 
> > I have to say the warnings are much more noisy than kernel oops
> > in 2 ways:
> >
> > 1) It's common for a *normal* kernel boot to have a dozen of old
> > warnings.

I guess that you solve this by a "clever" diff between old and new
log. The extra info suggested below might help to filter out
some messages. But the problems will still be there.
I am not sure that the categories are worth it, see below.


> > 2) Many warnings do not necessarily mean something should or could be
> > fixed -- they may well be mentioning some HW problem, or an alert
> > message to the user.
> > 
> > So there is a much bigger and messy problem than catching the warnings:
> > ways to effectively mark or filter real warnings that automated testing
> > should catch and report.
> >
> > For filtering, we currently accumulated the below blacklist:
> > 
> > https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist
> > 
> [...]
> 
> For the marking part, I wonder if there can be a clear rule that
> allows developers to distinguish 2 kind of information for users and
> testers:
> 
> - "bug" message: indicating a regression that should be reported and fixed
> 
> - "fact" message: well there's a problem, but we kernel developers
>  probably can do nothing about it. It's just to let the user know
>  about the fact. The fix might be something like replacing a broken
>  disk drive.
> 
> Those message types are orthogonal to severity of the problem (log
> levels), so the current log levels are actually not sufficient for
> distinguishing these kind of situations.

It is interesting idea but I am afraid that it would not work
in reality from several reasons:

  + It is yet another complexity. I am not sure that developers would
    want to think about it when creating new messages.


  + Who would set this for the existing ~100k messages?


  + If I get it correctly, the motivation is to help normal users
    to decide when a kernel bug should get reported. This might
    heavily increase the number of poor bug reports.

    The recent discussion about kernel regression suggests
    that there we are not ready for this, see
    https://lwn.net/Articles/738216/


  + The proposal it rather strigthforward. But does all messages
    clearly fit into one of the two categories? What about
    messages that are not about problems? For example, most
    of pr_info(), pr_notice() messages?


  + Are the two categories worth the effort? Especially when the
    second one might still eventually be a kernel bug?

    If we add more categories, it might get too complicated.
    For example, like setting severities and priorities
    in the classic bugzilla.


Best Regards,
Petr