linux-kernel - Re: [PATCHv6 5/7] printk: report lost messages in printk safe/nmi contexts

Message-ID: <20161223150827.GA7464@tigerII.localdomain>
Date:   Sat, 24 Dec 2016 00:08:27 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jan Kara <jack@...e.cz>, Tejun Heo <tj@...nel.org>,
        Calvin Owens <calvinowens@...com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Hurley <peter@...leysoftware.com>,
        linux-kernel@...r.kernel.org,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [PATCHv6 5/7] printk: report lost messages in printk safe/nmi
 contexts

Hello,

On (12/23/16 11:54), Petr Mladek wrote:
[..]
> There is a potential race:
> 
> CPU0					CPU1
> 
> printk_safe_log_store()
>   len = atomic_read(&s->len);
> 
> 					__printk_safe_flush()
> 
> 					  atomic_cmpxchg(&s->len, len, 0)
> 
> 					  report_message_lost(s);
> 
>   if (len >= sizeof(s->buffer) - 1) {
>           atomic_inc(&s->message_lost);
>           return 0;
>   }
> 
> We check the outdated len and account the lost message, but it will
> not be reported until some other message appears in the log buffer.
> 
> > +
> >  out:
> 
> It would make sense to move report_message_lost(s) here, after
> the out: label.
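The writer side of the quoted race, sketched in userspace C under stated assumptions: C11 <stdatomic.h> stands in for the kernel's atomic_t helpers, and struct safe_buf, safe_log_store() and SAFE_BUF_LEN are hypothetical names, not the kernel's. The full-buffer branch only bumps message_lost; if a flusher resets len between the load and that branch, the loss is accounted against a stale len, which is the window described above.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical size; in the kernel the buffer size is derived from
 * CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT. */
#define SAFE_BUF_LEN 32

struct safe_buf {
	atomic_int len;           /* bytes stored so far */
	atomic_int message_lost;  /* messages dropped because the buffer was full */
	char buffer[SAFE_BUF_LEN];
};

/* Append msg; returns the number of bytes added, or 0 if dropped. */
static int safe_log_store(struct safe_buf *s, const char *msg)
{
	int len, add;

again:
	len = atomic_load(&s->len);

	/* The trailing '\0' is not counted in len. If a flusher resets
	 * len right after the load above, this check acts on a stale
	 * value: the loss is accounted, but nobody reports it yet. */
	if (len >= SAFE_BUF_LEN - 1) {
		atomic_fetch_add(&s->message_lost, 1);
		return 0;
	}

	add = snprintf(s->buffer + len, SAFE_BUF_LEN - len, "%s", msg);
	if (add <= 0)
		return 0;
	if (add > SAFE_BUF_LEN - len - 1)
		add = SAFE_BUF_LEN - len - 1;  /* snprintf truncated msg */

	/* Publish the new length only if nobody changed it meanwhile. */
	if (!atomic_compare_exchange_strong(&s->len, &len, len + add))
		goto again;

	return add;
}
```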

hm, ok. to flush from another CPU we first need to have printk-safe/nmi
messages on that CPU, then return from printk-safe/nmi on that CPU, execute
the per-CPU irq_work, and then have concurrent printk-safe/nmi messages on
the current CPU, which in addition must happen frequently enough to hit this
case. I may be wrong, but that's quite unlikely. I can move
report_message_lost() to the `out' label, no problem. thanks for the report.

at some point I was actually considering turning ->message_lost into
'bool' -- "we lost your messages, we are sorry". the precise number of
lost messages doesn't help that much: the messages are gone, go and
increment CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT; that's all we can say now.
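A sketch of the 'bool' variant floated above, assuming C11 atomics; struct safe_buf_flag, note_message_lost() and report_message_lost_flag() are hypothetical names. The lossy path only sets a flag, and the report clears it with a single exchange, so it can say that messages were lost but not how many.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical variant: message_lost as a flag instead of a counter. */
struct safe_buf_flag {
	atomic_bool message_lost;
};

/* Called on the buffer-full path; repeated losses collapse into one flag. */
static void note_message_lost(struct safe_buf_flag *s)
{
	atomic_store(&s->message_lost, true);
}

/* Returns true if a loss was reported; clears the flag atomically. */
static bool report_message_lost_flag(struct safe_buf_flag *s)
{
	if (!atomic_exchange(&s->message_lost, false))
		return false;
	printf("printk-safe/nmi messages were lost; consider a larger "
	       "CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT\n");
	return true;
}
```

Whether the count or just the flag is kept, the remedy is the same: a larger buffer. The flag only gives up the number.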

and speaking of lost messages, I think I found a regression in
console_unlock(), so I'll send out a fix ahead of this series.

and, besides, the logs I had the pleasure of looking at today contained
numerous "%d printk messages dropped" with very accurate numbers, but those
numbers meant pretty much nothing to me - the messages were lost.

	-ss