linux-kernel - Re: [RFC][PATCH] printk: Fixup the nmi printk mess

Message-ID: <20150611145547.GA3234@dhcp128.suse.cz>
Date:	Thu, 11 Jun 2015 16:55:47 +0200
From:	Petr Mladek <pmladek@...e.cz>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	jkosina@...e.cz, paulmck@...ux.vnet.ibm.com,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC][PATCH] printk: Fixup the nmi printk mess

On Wed 2015-06-10 21:23:04, Peter Zijlstra wrote:
> Below a version which does x-cpu stuff to allow the
> trigger_all*_cpu_backtrace() initiator to flush buffers on behalf of
> other CPUs.
>
> Compile tested only.

The output from "echo l >/proc/sysrq-trigger" looks reasonable.
It does not mix output from different CPUs. This is expected
because of the @lock.

The other observation is that it prints CPUs in _random_ order:
28, 24, 25, 1, 26, 2, 27, 3, ...

The order is fine when I disable the irq_work.

It means that the irq_works are usually faster than printk_nmi_flush(),
so printk_nmi_flush() is not that useful, and all the complexity of
the three atomic variables (head, tail, read) does not buy us much.

Anyway, I think that the current solution is racy and cannot be fixed
easily; see below.


> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c099b082cd02..99bfc1e3f32a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1821,13 +1821,200 @@ int vprintk_default(const char *fmt, va_list args)
> +static void __printk_nmi_flush(struct irq_work *work)
> +{
> +	static raw_spinlock_t lock = __RAW_SPIN_LOCK_INITIALIZER(lock);
> +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> +	int len, head, size, i, last_i;
> +
> +again:
> +	/*
> +	 * vprintk_nmi()	truncate
> +	 *
> +	 * [S] head		[S] head
> +	 *     wmb		mb
> +	 * [S] tail		[S] read

BTW, this is quite cryptic to me. Coffee did not help :-)

> +	 *
> +	 * therefore:
> +	 */
> +	i = atomic_read(&s->read);
> +	len = atomic_read(&s->tail); /* up to the tail is stable */
> +	smp_rmb();
> +	head = atomic_read(&s->head);
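
For what it is worth, here is how I read the ordering diagram together
with these three loads. This is only a userspace sketch with C11 atomics,
not the patch's code: struct nmi_idx, publish(), truncate_idx(), snapshot()
and head_minus_tail() are names I made up, and the C11 fences are rough
stand-ins for smp_wmb(), smp_mb() and smp_rmb(), not exact equivalents.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the three indices of the per-CPU buffer. */
struct nmi_idx {
	atomic_int head;
	atomic_int tail;
	atomic_int read;
};

/* vprintk_nmi() column: [S] head, wmb, [S] tail */
static void publish(struct nmi_idx *s, int new_head)
{
	assert(new_head >= 0);
	atomic_store_explicit(&s->head, new_head, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_store_explicit(&s->tail, new_head, memory_order_relaxed);
}

/* truncate column: [S] head, mb, [S] read */
static void truncate_idx(struct nmi_idx *s)
{
	atomic_store_explicit(&s->head, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_mb() */
	atomic_store_explicit(&s->read, 0, memory_order_relaxed);
}

/* Flush side: load read, load tail, rmb, load head (same order as the patch). */
static void snapshot(struct nmi_idx *s, int *i, int *len, int *head)
{
	*i = atomic_load_explicit(&s->read, memory_order_relaxed);
	*len = atomic_load_explicit(&s->tail, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	*head = atomic_load_explicit(&s->head, memory_order_relaxed);
}

/* Single-threaded walk-through: returns head - tail as seen by snapshot(). */
static int head_minus_tail(int do_truncate)
{
	struct nmi_idx s;
	int i, len, head;

	atomic_init(&s.head, 0);
	atomic_init(&s.tail, 0);
	atomic_init(&s.read, 0);

	publish(&s, 200);
	if (do_truncate)
		truncate_idx(&s);

	snapshot(&s, &i, &len, &head);
	return head - len;
}
```

Because tail is loaded before head, and the writer stores head before
tail, a flusher that sees a given tail sees a head at least that new.
A head smaller than tail can then only come from a truncate, which is
the only path in the sketch that moves head backwards.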
> +
> +	/*
> +	 * We cannot truncate tail because it could overwrite a store from
> +	 * vprintk_nmi(), however vprintk_nmi() will always update tail to the
> +	 * correct value.
> +	 *
> +	 * Therefore if head < tail, we missed a truncate and should do so now.
> +	 */
> +	if (head < len)
> +		len = 0;

This is a bit confusing. It is a complicated way to make the next test return.

If I understand this correctly, this can happen only inside
__printk_nmi_flush() called on another CPU (from
printk_nmi_flush()) when it races with the queued
irq_work. The irq_work is faster and truncates the buffer.

So the return is fine after all, because the irq_work has already
printed everything.
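
To spell out why the check ends in the early return, here is a small
userspace sketch of the two tests in sequence. The helper name
bytes_to_flush() and the plain int parameters are mine, not from the
patch; the real code reads the values from atomics.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: mirrors the two checks
 * in __printk_nmi_flush() on plain ints.
 * Returns how many bytes the caller would go on to print.
 */
static int bytes_to_flush(int read, int tail, int head)
{
	int len = tail;

	assert(read >= 0 && tail >= 0 && head >= 0);

	/* head < tail: a truncate happened that we did not see */
	if (head < len)
		len = 0;

	/* nothing to do; always taken once len has been reset to 0 */
	if (len - read <= 0)
		return 0;

	return len - read;
}
```

Since @read is never negative, resetting len to 0 always makes the
second test true, so the first test is only a roundabout way to reach
the return.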


> +	if (len - i <= 0) /* nothing to do */
> +		return;
> +	/*
> +	 * 'Consume' this chunk, avoids concurrent callers printing the same
> +	 * stuff.
> +	 */
> +	if (atomic_cmpxchg(&s->read, i, len) != i)
> +		goto again;

I think that this is racy:

CPU0					CPU7

printk_nmi_flush()

  __printk_nmi_flush(for CPU7)

    i = atomic_read(&s->read);     (100)
    len = atomic_read(&s->tail);   (200)
    head = atomic_read(&s->head);  (200)

    if (atomic_cmpxchg(&s->read, i, len) != i)

    the cmpxchg succeeds, but we get interrupted
    or rescheduled on a preemptible kernel

					another vprintk_nmi()
					leaves: head(400), tail(400)

					__printk_nmi_flush() in irq_work

					it prints string between 200-400
					truncate buffer: head(0), read(0)

					another vprintk_nmi()
					returns: head(150), tail(150)

    prints the string between (100-200) =>
    a mix of the new and the old message,
    and then modifies @head and @read in the wrong way
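
The interleaving in the diagram can be replayed deterministically in
userspace. This is a single-threaded model with plain ints, so it only
shows the order of events, not real concurrency. struct nmi_buf_model
and the model_*() helpers are hypothetical names, and 'O'/'N' mark
bytes of the old and the new message.

```c
#include <assert.h>
#include <string.h>

#define MODEL_BUF_SIZE 512

/* Hypothetical stand-in for the per-CPU NMI buffer; no atomics. */
struct nmi_buf_model {
	char data[MODEL_BUF_SIZE];
	int head, tail, read;
};

/* Models vprintk_nmi(): append n bytes, then publish head and tail. */
static void model_write(struct nmi_buf_model *s, char fill, int n)
{
	assert(s->head + n <= MODEL_BUF_SIZE);
	memset(s->data + s->head, fill, n);
	s->head += n;
	s->tail = s->head;
}

/* Models the irq_work on the owning CPU: consume everything, truncate. */
static void model_flush_and_truncate(struct nmi_buf_model *s)
{
	s->read = s->tail;
	s->head = 0;
	s->read = 0;
}

/*
 * Replays the diagram. Returns how many bytes of the window claimed by
 * the remote flusher, [100, 200), belong to the new message.
 */
static int replay_race(void)
{
	struct nmi_buf_model s = { .head = 0 };
	int i, len, k, new_bytes = 0;

	model_write(&s, 'O', 200);	/* head = tail = 200 */
	s.read = 100;			/* first 100 bytes already printed */

	/* CPU0: snapshot, claim [100, 200), then get delayed */
	i = s.read;
	len = s.tail;
	s.read = len;			/* the cmpxchg succeeds */

	/* CPU7: more output, the irq_work prints it and truncates */
	model_write(&s, 'O', 200);	/* head = tail = 400 */
	model_flush_and_truncate(&s);	/* head = read = 0 */
	model_write(&s, 'N', 150);	/* head = tail = 150 */

	/* CPU0 resumes and prints its stale window */
	for (k = i; k < len; k++)
		if (s.data[k] == 'N')
			new_bytes++;

	return new_bytes;
}
```

In this model, half of the window that CPU0 prints (offsets 100-149)
already holds the new message, and the rest holds stale bytes that the
irq_work has printed or skipped on its own terms.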

I think that such races are hard to avoid without indexing the printed
messages, but that would make the approach too complicated.
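
To give an idea of what indexing could look like, here is one possible
reading: a generation counter that is bumped on every truncate and
re-checked after the chunk has been copied out. Everything here
(struct gen_buf, the gen_*() helpers) is hypothetical and
single-threaded; a real version would need barriers on top and would
still have to decide what to do with a discarded chunk.

```c
#include <assert.h>
#include <string.h>

#define GEN_BUF_SIZE 512

/* Hypothetical buffer with a generation index; not from the patch. */
struct gen_buf {
	char data[GEN_BUF_SIZE];
	int head;
	unsigned int gen;	/* bumped on every truncate */
};

static void gen_write(struct gen_buf *s, char fill, int n)
{
	assert(s->head + n <= GEN_BUF_SIZE);
	memset(s->data + s->head, fill, n);
	s->head += n;
}

static void gen_truncate(struct gen_buf *s)
{
	s->head = 0;
	s->gen++;
}

/*
 * Copy [from, to) into out, then check that the buffer has not been
 * truncated since the caller sampled gen_seen.
 * Returns 1 if the copy can be trusted, 0 if it must be discarded.
 */
static int gen_copy(const struct gen_buf *s, unsigned int gen_seen,
		    int from, int to, char *out)
{
	assert(from >= 0 && from <= to && to <= GEN_BUF_SIZE);
	memcpy(out, s->data + from, to - from);
	return s->gen == gen_seen;
}

/* Replays the remote flush with and without a truncate in between. */
static int gen_replay(int truncate_between)
{
	struct gen_buf s = { .head = 0 };
	char out[GEN_BUF_SIZE];
	unsigned int seen;

	gen_write(&s, 'O', 200);
	seen = s.gen;			/* remote flusher samples the index */
	if (truncate_between) {
		gen_truncate(&s);	/* the irq_work wins the race */
		gen_write(&s, 'N', 150);
	}
	return gen_copy(&s, seen, 100, 200, out);
}
```

Note that this only detects the stale window. The bytes that the remote
flusher had claimed are still lost, which is part of why I consider
this direction too complicated.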

I think that ordering the CPUs is not worth it. I would go back to the
first solution, add the @lock there, and double-check the races with
seq_buf().

I will stop commenting on the code here for now.

Best Regards,
Petr

PS: I had two cups of coffee and hope that my comments are less of a fiasco
than yesterday's.