linux-kernel - Re: [PATCH 2/2] printk: always report lost messages on serial console

Message-ID: <20170111165038.GK20785@pathway.suse.cz>
Date:   Wed, 11 Jan 2017 17:50:38 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Hurley <peter@...leysoftware.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] printk: always report lost messages on serial console

Hi Sergey,

First, thanks a lot for the detailed description. I have finally
understood what was important about the "non-important" messages
and how you used them. I am sorry that I did not get it earlier.

On Tue 2017-01-10 17:49:39, Sergey Senozhatsky wrote:
> On (01/09/17 17:56), Petr Mladek wrote:
> > It is possible that your fix is fine. If we lose messages,
> > we are screwed anyway. But I still have problems to accept
> > that we would start printing less important messages (that would
> > normally be ignored) in situation when we have troubles
> > to print the more important ones. This logic rings warning
> > bells in my head and this is why I suggest more conservative
> > solution and ask the many questions.
>
> once the system is in "oh, let me drop some of the messages for you"
> mood, loglevel filtering is unreliable and in some cases unneeded.
> it's so unreliable that I'm even considering disabling it in *in-house*
> builds when console_unlock() detects that there was no room for all
> 'yet to be seen' messages.
> 
> those are another messages, with 'visible' loglevel or with 'suppressed'
> loglevel or both 'visible' and 'suppressed' loglevels, that caused the
> logbuf overflow.
> 
> now, if the loss of messages was caused by:
> 
> a) flood of suppressed loglevel messages
>    then printing at least some of those messages makes *a lot* of sense.
> 
> b) flood of visible loglevel messages
>    then may be those messages are not so important. there a whole logbuf of
>    them. per my experience, it is quite hard to overflow the logbuf with
>    really important, unique, sensible messages of 'visible' loglevel with
>    active loglevel filtering.

Just for the record, I guess that the same is also true for the messages
with a lower level. I mean that they are repetitive as well. It would be
great to make it easier to throttle identical messages, or to do it in a
generic way. But this is food for future work.

> once the system is out of logbuf space it is impossible to clearly
> distinguish between 'important' and 'not so important' messages. all
> we know in console_unlock(), when we pick up next_idx message, is that
> there is an abnormal/unusual/weird/unexpected/sick/whatever amount of
> messages - 'suppressed' or 'visible' or both. and that's the problem.

It is true that lost messages are a "serious" problem because you might
miss a message about a "really" serious problem. The normally important
messages are less useful because they are incomplete. It makes sense
to debug what causes the flood. The key is to ignore the loglevel and
print what is being stored.

Your patch makes perfect sense from this point of view. Please
mention this explanation in the next iteration of the patch.

Ah, you will kill me. I still have one thing. The levels are defined
like this:

#define KERN_EMERG	KERN_SOH "0"	/* system is unusable */
#define KERN_ALERT	KERN_SOH "1"	/* action must be taken immediately */
#define KERN_CRIT	KERN_SOH "2"	/* critical conditions */
#define KERN_ERR	KERN_SOH "3"	/* error conditions */

The flood of messages usually means something pretty wrong. But
it might also be caused by too many or forgotten debug messages.

I think that lost messages belong to level "2". Note that
the warnings about lost NMI messages and the recent printk recursion
were printed with loglevel '2' as well.

Would it make sense, and be acceptable, to ignore the log level
only when console_level allows KERN_CRIT messages to be shown?
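
To make the question concrete, here is a minimal userspace sketch of the
check I have in mind. It is not taken from the patch: the helper name
console_should_print() and the lost_messages flag are hypothetical, and
it only assumes the usual rule that a message is shown when its level is
numerically lower than the console level.

```c
#include <stdbool.h>

/* Numeric value of KERN_CRIT, see the defines above. */
#define LOGLEVEL_CRIT 2

/*
 * Hypothetical helper, for illustration only.
 *
 * msg_level:     loglevel of the record being considered
 * console_level: current console loglevel; a message is normally
 *                shown when msg_level < console_level
 * lost_messages: true when the log buffer overflowed and some
 *                records were dropped before reaching the console
 */
bool console_should_print(int msg_level, int console_level,
			  bool lost_messages)
{
	/* Normal loglevel filtering. */
	if (msg_level < console_level)
		return true;

	/*
	 * Messages were lost: ignore the level of this record, but
	 * only when the console would show KERN_CRIT messages anyway.
	 */
	if (lost_messages && LOGLEVEL_CRIT < console_level)
		return true;

	return false;
}
```

With this rule, a console that is configured to hide even KERN_CRIT
messages stays quiet after an overflow, while any more verbose console
starts printing whatever is still stored in the buffer.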

Best Regards,
Petr