linux-kernel - Re: [PATCH 2/2] printk: always report lost messages on serial console

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20170113051112.GF9360@jagdpanzerIV.localdomain>
Date:   Fri, 13 Jan 2017 14:11:12 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Hurley <peter@...leysoftware.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] printk: always report lost messages on serial console

Hi,

On (01/11/17 17:50), Petr Mladek wrote:
> Hi Sergey,
> 
> first, thanks a lot for the detailed description. I have finally
> understood what was important on the "non-important" messages
> and how you used them. I am sorry that I was not able to get
> it earlier.

sure, no prob. I was mostly offline for the past few days for personal
reasons but now I'm back.

[..]
> > now, if the loss of messages was caused by:
> > 
> > a) flood of suppressed loglevel messages
> >    then printing at least some of those messages makes *a lot* of sense.
> > 
> > b) flood of visible loglevel messages
> >    then may be those messages are not so important. there a whole logbuf of
> >    them. per my experience, it is quite hard to overflow the logbuf with
> >    really important, unique, sensible messages of 'visible' loglevel with
> >    active loglevel filtering.
> 
> Just for record, I guess that the same is true also for the messages
> with lower level. I mean that they are repeating as well.

right. those are 100% reproducible, quick to spot and easy to fix, I
guess. a spontaneous explosion is a different/bigger problem. though
the end result is the same -- we lose messages, may be very important
ones. our best effort/goal is to print logbuf content. that's why we
are playing with nmi/safe printk; zap locks; ignore locks state in
some cases; and so on and on. but when we lose messages that were
meant to be printed even _before_ we try to print them out, then our
best effort is sort of void/undefined.

> It would be great to make it easier to throttle the same messages or
> do it a generic way. But this a food for the future work.

yes. syslog tracks "duplicate messages". but I kinda couldn't understand
how helpful it will be in vprintk_emit() /* because it's too late to track
duplicates in console_unlock(). duplicates should not be stored in multiple
instances in the first place */. backtraces are hard to suppress, besides we
shouldn't suppress backtraces I think. log_store() would have to strcmp() or
"hash+compare hashes" current message and the most recent logbuf message.
but bigger concern is -- do people see dropped messages that often to add
duplicate messages tracker to vprintk_emit()? I see dropped messages quite
a lot, but that's just my setup.

> #define KERN_EMERG	KERN_SOH "0"	/* system is unusable */
> #define KERN_ALERT	KERN_SOH "1"	/* action must be taken immediately */
> #define KERN_CRIT	KERN_SOH "2"	/* critical conditions */
> #define KERN_ERR	KERN_SOH "3"	/* error conditions */
> 
> The flood of messages usually means something pretty wrong. But
> it might also be caused by too many or forgotten debug messages.

well. it's still really a lot of forgotten messages. so much that
we have to drop other messages. so I'd say the root cause is less
important (if important at all) as long as the result is "lost messages".

> It think that lost messages belong to the level "2". Note that
> the warning about lost NMI messages and recent printk recursion
> were printed with loglevel '2' as well.
> 
> Would it make sense and be acceptable to ignore the log level
> only when console_level allows to show KERN_CRIT messages?

need to think. what will it improve?

// I'm catching up with the emails, it'll take some time.

	-ss