linux-kernel - Re: [RFC][PATCH] printk: Fixup the nmi printk mess

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150612120037.GR3644@twins.programming.kicks-ass.net>
Date:	Fri, 12 Jun 2015 14:00:37 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Petr Mladek <pmladek@...e.cz>
Cc:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	jkosina@...e.cz, paulmck@...ux.vnet.ibm.com,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC][PATCH] printk: Fixup the nmi printk mess

On Thu, Jun 11, 2015 at 04:55:47PM +0200, Petr Mladek wrote:
> The output from "echo l >/proc/sysrq-trigger" looks reasonable.
> It does not mix output from different CPUs. This is expected
> because of the @lock.
> 
> The other observation is that it prints CPUs in _random_ order:
> 28, 24, 25, 1, 26, 2, 27, 3, ...
> 
> The order is fine when I disable the irq_work.
> 
> It means that irq_works are usually faster than printk_nmi_flush() =>
> printk_nmi_flush() is not that useful => all the complexity with
> the three atomic variables (head, tail, read) did not bring
> much win.

The reason I did the printk_nmi_flush() is that the irq_work might
_never_ happen if the cpu in question is stuck with interrupts disabled.

Do something like:

	local_irq_disable();
	for (;;)
		;
	local_irq_enable(); /* like this'll ever happen */

And you won't see the offending stacktrace that makes your machine a
useless piece of power sucking junk.

> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index c099b082cd02..99bfc1e3f32a 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1821,13 +1821,200 @@ int vprintk_default(const char *fmt, va_list args)
> > +static void __printk_nmi_flush(struct irq_work *work)
> > +{
> > +	static raw_spinlock_t lock = __RAW_SPIN_LOCK_INITIALIZER(lock);
> > +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> > +	int len, head, size, i, last_i;
> > +
> > +again:
> > +	/*
> > +	 * vprintk_nmi()	truncate
> > +	 *
> > +	 * [S] head		[S] head
> > +	 *     wmb		mb
> > +	 * [S] tail		[S] read
> 
> BTW, this is quite cryptic for me. Coffee did not help :-)

It states that those two sites store head first, then issue a barrier
and then store something else. This means we need to read the lot in the
reverse order and issue a matching barrier.

> 	 *
> > +	 * therefore:
> > +	 */
> > +	i = atomic_read(&s->read);
> > +	len = atomic_read(&s->tail); /* up to the tail is stable */
> > +	smp_rmb();
> > +	head = atomic_read(&s->head);
> > +
> > +	/*
> > +	 * We cannot truncate tail because it could overwrite a store from
> > +	 * vprintk_nmi(), however vprintk_nmi() will always update tail to the
> > +	 * correct value.
> > +	 *
> > +	 * Therefore if head < tail, we missed a truncate and should do so now.
> > +	 */
> > +	if (head < len)
> > +		len = 0;
> 
> This is a bit confusing. It is a complicated way how to return on the next test.

Yes, but its two separate pieces of logic. The above deals with the
truncate not being able to also reset tail. While the below test is a
'simple' do I have to actually do anything test.

> If I get this correctly. This might happen only inside
> _printk_nmi_flush() called on another CPU (from
> printk_nmi_flush()) when it interferes with the queued
> irq_work. The irq_work is faster and truncates the buffer.
> 
> So, the return is fine after all because the irq_work printed
> everything.

Either that, or another NMI happened while we started the irq_work, this
will re(enqueue) the irq_work, this one will in fact observe all data,
print the whole buffer and leave an empty buffer behind.

Then the irq_work happens _again_, sees an empty buffer and figures
that's ok, nothing to do.

> > +	if (len - i <= 0) /* nothing to do */
> > +		return;
> > +	/*
> > +	 * 'Consume' this chunk, avoids concurrent callers printing the same
> > +	 * stuff.
> > +	 */
> > +	if (atomic_cmpxchg(&s->read, i, len) != i)
> > +		goto again;
> 
> I think that this is racy:

Indeed. I'll think a bit on that.

> I think that ordering CPUs is not worth it. I would go back to the
> first solution, add the @lock there, and double check races with
> seq_buf().

You mean, provide printk_nmi_flush() but completely screw concurrency?
And basically reserve it for a last ditch effort for getting crap out?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/