linux-kernel - Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI

Message-ID: <20170424021747.GA630@jagdpanzerIV.localdomain>
Date:   Mon, 24 Apr 2017 11:17:47 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Russell King <rmk+kernel@....linux.org.uk>,
        Daniel Thompson <daniel.thompson@...aro.org>,
        Jiri Kosina <jkosina@...e.com>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Chris Metcalf <cmetcalf@...hip.com>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        linux-arm-kernel@...ts.infradead.org,
        adi-buildroot-devel@...ts.sourceforge.net,
        linux-cris-kernel@...s.com, linux-mips@...ux-mips.org,
        linuxppc-dev@...ts.ozlabs.org, linux-s390@...r.kernel.org,
        linux-sh@...r.kernel.org, sparclinux@...r.kernel.org,
        Jan Kara <jack@...e.cz>, Ralf Baechle <ralf@...ux-mips.org>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        David Miller <davem@...emloft.net>
Subject: Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in
 NMI

On (04/21/17 14:06), Petr Mladek wrote:
[..]
> > I agree that this_cpu_read(printk_context) covers slightly more than
> > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > problem.
> 
> PRINTK_SAFE_CONTEXT is also set in call_console_drivers().
> That might take rather long while logbuf_lock is available. So, it is
> a noticeable source of false positives.

yes, agree.

probably we need additional printk_safe annotations for
		"logbuf_lock is locked from _this_ CPU"

false positives there can be very painful.

[..]
> 	if (raw_spin_is_locked(&logbuf_lock))
> 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> 	else
> 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);

well, if everyone is fine with logbuf_lock access from every CPU from every
NMI then I won't object either. but maybe it makes sense to reduce the
possibility of false positives. Steven is losing critically important logs,
after all.
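
The difference between the quoted raw_spin_is_locked() check and a "locked from _this_ CPU" annotation can be sketched in user space. This is only a model: the struct, the owner_cpu field, the function names and the flag values are all hypothetical and are not taken from the kernel. It also ignores how long an NMI would wait for a lock held by another CPU.

```c
#include <stdbool.h>

/* Illustrative flag values only; not taken from the kernel headers. */
enum {
	MODEL_NMI_CONTEXT          = 1 << 0, /* park message in per-CPU buffer */
	MODEL_NMI_DEFERRED_CONTEXT = 1 << 1, /* write to logbuf, defer console */
};

/* Hypothetical lock model with an owner annotation. */
struct model_logbuf_lock {
	bool locked;
	int owner_cpu; /* meaningful only while locked */
};

/* Variant from the quoted snippet: any holder forces the per-CPU buffer. */
int model_ctx_is_locked(const struct model_logbuf_lock *l)
{
	return l->locked ? MODEL_NMI_CONTEXT : MODEL_NMI_DEFERRED_CONTEXT;
}

/* Owner-aware variant: only a lock held by the interrupted CPU counts. */
int model_ctx_owner(const struct model_logbuf_lock *l, int this_cpu)
{
	if (l->locked && l->owner_cpu == this_cpu)
		return MODEL_NMI_CONTEXT;
	return MODEL_NMI_DEFERRED_CONTEXT;
}
```

With the lock held by CPU 1, an NMI on CPU 0 gets MODEL_NMI_CONTEXT from the first variant (the false positive) and MODEL_NMI_DEFERRED_CONTEXT from the second.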

by the way,
does this `nmi_print_seq' bypass even fix anything for Steven? it sort of
can, in theory, but just in theory. so maybe we need direct message flush
from NMI handler (printk->console_unlock), which will be a really big problem.

logbuf might not be big enough for 4890096 messages (Steven's report
mentions "Lost 4890096 message(s)!"). we are counting on the fact that
in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
and print pending logbuf messages, but this is not guaranteed and the
messages can be dropped even from logbuf.
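
The overflow concern can be illustrated with a toy ring of fixed size. This is a rough sketch under stated assumptions: the slot count, the struct and both function names are hypothetical, and only sequence numbers are tracked, not message text.

```c
#define MODEL_LOGBUF_SLOTS 8 /* hypothetical capacity, in records */

struct model_logbuf {
	unsigned long first_seq;   /* oldest record still stored */
	unsigned long next_seq;    /* sequence number of the next record */
	unsigned long console_seq; /* next record the console would print */
	unsigned long lost;        /* records overwritten before printing */
};

/* Store one record; when the ring is full the oldest one is overwritten. */
void model_log_store(struct model_logbuf *lb)
{
	if (lb->next_seq - lb->first_seq == MODEL_LOGBUF_SLOTS)
		lb->first_seq++;
	lb->next_seq++;
}

/* Print everything still stored; returns the number of records printed. */
unsigned long model_console_unlock(struct model_logbuf *lb)
{
	unsigned long printed;

	if (lb->console_seq < lb->first_seq) {
		/* The reader fell behind: account for the overwritten records. */
		lb->lost += lb->first_seq - lb->console_seq;
		lb->console_seq = lb->first_seq;
	}
	printed = lb->next_seq - lb->console_seq;
	lb->console_seq = lb->next_seq;
	return printed;
}
```

Storing 20 records with no flush in between leaves only the last 8 printable and 12 counted as lost; flushing after every store loses none.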

I don't know,
should we try to queue printk_deferred irq_work for all online CPUs from
vprintk_nmi() when it bypasses printk_safe_log_store()? that would minimize
the chance of logbuf overflow. printk_deferred() will queue work on the
vprintk_nmi() CPU, sure, but we don't know how many messages we are going
to add to logbuf from NMI.
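
A minimal model of the "queue work on all online CPUs" idea, assuming a boolean per CPU stands in for a queued irq_work. The struct and function name are hypothetical; the real question of whether queueing on remote CPUs is safe from NMI is not addressed here.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for the model. */
struct model_cpu_state {
	bool online;
	bool flush_pending; /* stands in for a queued irq_work */
};

/*
 * Mark flush work pending on every online CPU that has none queued yet.
 * Returns how many CPUs got new work.
 */
int model_queue_flush_on_online(struct model_cpu_state cpus[], int nr)
{
	int queued = 0;

	for (int i = 0; i < nr; i++) {
		if (!cpus[i].online || cpus[i].flush_pending)
			continue;
		cpus[i].flush_pending = true;
		queued++;
	}
	return queued;
}
```

Skipping CPUs with work already pending keeps repeated NMI messages from queueing the same work over and over.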

> > @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
> >  {
> >         struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> >  
> > -       return printk_safe_log_store(s, fmt, args);
> > +       if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> > +               return printk_safe_log_store(s, fmt, args);
> > +
> > +       return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> >  }
> 
> It looks simple but some things are missing. It will be used also
> outside panic/oops, so it should queue the irq_work to flush the console.

you are right. I thought about moving irq_work to vprintk_emit(), but
completely forgot about it. without that missing bit the proposed two-liner
is not complete.
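
For illustration, the two paths of the quoted vprintk_nmi() change, plus the missing "queue a flush" step, can be modelled like this. It is a user-space sketch: the mask value, the struct and the counters are hypothetical, and the main log is shown as a per-CPU counter only to keep the model small.

```c
#include <stdbool.h>

/* Illustrative bit only; not the kernel's actual mask value. */
#define MODEL_SAFE_CONTEXT_MASK 0x1u

struct model_nmi_cpu {
	unsigned int printk_context;
	int nmi_buf_records;        /* records parked in the per-CPU buffer */
	int logbuf_records;         /* records written straight to the log */
	bool console_flush_pending; /* stands in for a queued irq_work */
};

/* Returns true when the record took the direct (bypass) path. */
bool model_vprintk_nmi(struct model_nmi_cpu *cpu)
{
	if (cpu->printk_context & MODEL_SAFE_CONTEXT_MASK) {
		/* The lock may be held on this CPU: park the record. */
		cpu->nmi_buf_records++;
		return false;
	}

	/* Bypass: store directly, then ask for a console flush later. */
	cpu->logbuf_records++;
	cpu->console_flush_pending = true;
	return true;
}
```

Without the console_flush_pending step, a record stored on the bypass path would sit in the log until some other caller happened to flush the console.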

	-ss