Message-ID: <20150610143155.GD9409@pathway.suse.cz>
Date:	Wed, 10 Jun 2015 16:31:55 +0200
From:	Petr Mladek <pmladek@...e.cz>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	jkosina@...e.cz, paulmck@...ux.vnet.ibm.com,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC][PATCH] printk: Fixup the nmi printk mess

On Wed 2015-06-10 14:55:09, Peter Zijlstra wrote:
> Hi,
> 
> I just stumbled upon:
> 
> 	a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs")
> 
> Which is not dissimilar to what I've proposed in the past, except it's
> squirreled away in some far and dark corner of one arch.
> 
> Let's fix that.

This is a very interesting approach, but I see two possible races; see below.

> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index dfd59d6bc6f0..a477e0766d2e 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -67,10 +67,12 @@ extern void irq_exit(void);
>  		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
>  		rcu_nmi_enter();				\
>  		trace_hardirq_enter();				\
> +		printk_nmi_enter();				\
>  	} while (0)
>  
>  #define nmi_exit()						\
>  	do {							\
> +		printk_nmi_exit();				\
>  		trace_hardirq_exit();				\
>  		rcu_nmi_exit();					\
>  		BUG_ON(!in_nmi());				\
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index caebf2a758dc..04c68b9f56f8 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -135,7 +135,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
>  	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
>  						__alignof__(type))
>  
> -/* To avoid include hell, as printk can not declare this, we declare it here */
> -DECLARE_PER_CPU(printk_func_t, printk_func);
> -
>  #endif /* __LINUX_PERCPU_H */
> diff --git a/init/main.c b/init/main.c
> index 2115055faeac..4a28accaaa98 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -586,6 +586,7 @@ asmlinkage __visible void __init start_kernel(void)
>  	timekeeping_init();
>  	time_init();
>  	sched_clock_postinit();
> +	printk_init();
>  	perf_event_init();
>  	profile_init();
>  	call_function_init();
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c099b082cd02..0149c41be95b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1821,13 +1821,125 @@ int vprintk_default(const char *fmt, va_list args)
>  }
>  EXPORT_SYMBOL_GPL(vprintk_default);
>  
> +#ifdef CONFIG_PRINTK_NMI
> +
> +typedef int(*printk_func_t)(const char *fmt, va_list args);
>  /*
>   * This allows printk to be diverted to another function per cpu.
>   * This is useful for calling printk functions from within NMI
>   * without worrying about race conditions that can lock up the
>   * box.
>   */
> -DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> +static DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> +
> +#include <linux/seq_buf.h>
> +
> +struct nmi_seq_buf {
> +	struct seq_buf		seq;
> +	struct irq_work		work;
> +	unsigned char		buffer[PAGE_SIZE -
> +					sizeof(struct seq_buf) -
> +					sizeof(struct irq_work)];
> +};
> +
> +/* Safe printing in NMI context */
> +static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
> +
> +static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
> +{
> +	const char *buf = s->buffer + start;
> +
> +	printk("%.*s", (end - start) + 1, buf);
> +}
> +
> +static void __printk_nmi_flush(struct irq_work *work)
> +{
> +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> +	int len, last_i = 0, i = 0;
> +
> +again:
> +	len = seq_buf_used(&s->seq);
> +	if (!len)
> +		return;
> +
> +	/* Print line by line. */
> +	for (; i < len; i++) {
> +		if (s->buffer[i] == '\n') {
> +			print_seq_line(s, last_i, i);
> +			last_i = i + 1;
> +		}
> +	}
> +	/* Check if there was a partial line. */
> +	if (last_i < len) {
> +		print_seq_line(s, last_i, len - 1);
> +		pr_cont("\n");
> +	}
> +
> +	/*
> +	 * Another NMI could have come in while we were printing;
> +	 * check that nothing has been added to the buffer.
> +	 */
> +	if (cmpxchg_local(&s->seq.len, len, 0) != len)
> +		goto again;

If another NMI comes in at this point, it will start filling the buffer
from the beginning. If it is fast enough, it might overwrite the text
that we printed above.

Note that it is easy for it to be faster, because the normal printk()
is more complex and might wait for the console.

Well, it is not that critical; it would just print a mess.
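
One (completely untested) way to avoid the overwrite would be to copy
the data out before resetting the length, so that a racing NMI refills
the buffer only after we have our own snapshot. Just a sketch; the
per-CPU scratch buffer is made up here:

	struct nmi_scratch_buf {
		unsigned char buf[PAGE_SIZE];
	};
	static DEFINE_PER_CPU(struct nmi_scratch_buf, nmi_scratch);

	static void __printk_nmi_flush(struct irq_work *work)
	{
		struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
		unsigned char *scratch = this_cpu_ptr(&nmi_scratch)->buf;
		int len;

	again:
		len = seq_buf_used(&s->seq);
		if (!len)
			return;

		/* Snapshot before releasing the buffer to new writers. */
		memcpy(scratch, s->buffer, len);

		/* A racing NMI makes the cmpxchg fail; then re-snapshot. */
		if (cmpxchg_local(&s->seq.len, len, 0) != len)
			goto again;

		/* ... print scratch[0 .. len - 1] line by line as before ... */
	}

This only addresses the overwrite; the race in seq_buf_vprintf()
below is independent of it.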

> +}
> +
> +void printk_init(void)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
> +
> +		init_irq_work(&s->work, __printk_nmi_flush);
> +		seq_buf_init(&s->seq, s->buffer, sizeof(s->buffer));
> +	}
> +}
> +
> +/*
> + * It is not safe to call printk() directly from NMI handlers.
> + * It may be fine if the NMI detected a lock up and we have no choice
> + * but to do so, but doing a NMI on all other CPUs to get a back trace
> + * can be done with a sysrq-l. We don't want that to lock up, which
> + * can happen if the NMI interrupts a printk in progress.
> + *
> + * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
> + * the content into a per cpu seq_buf buffer. Then when the NMIs are
> + * all done, we can safely dump the contents of the seq_buf to a printk()
> + * from a non NMI context.
> + */
> +static int vprintk_nmi(const char *fmt, va_list args)
> +{
> +	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> +	unsigned int len = seq_buf_used(&s->seq);
> +
> +	irq_work_queue(&s->work);
> +	seq_buf_vprintf(&s->seq, fmt, args);

This is more critical. seq_buf_vprintf() has the following code:

	len = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, args);

The race might look like:

CPU0					CPU1

					__printk_nmi_flush()
					...

seq_buf_vprintf()

   s->buffer + s->len

					cmpxchg_local(&s->seq.len,
					len, 0)

   s->size - s->len

=> that seq_buf_vprintf() will start writing at the end of the seq_buf
   but it will think that the entire buffer is available, which might
   cause a buffer overflow
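
Spelled out, the dangerous case is the compiler re-reading s->len in
that single seq_buf_vprintf() statement; hypothetically expanded:

	size_t off  = s->len;		/* e.g. read as almost s->size */

	/* CPU1 resets s->seq.len to 0 here */

	size_t room = s->size - s->len;	/* re-read: the full buffer */

	vsnprintf(s->buffer + off, room, fmt, args);	/* can run past the end */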


Well, it is rather theoretical: s->len will most likely be read only
once (compiler optimization). Anyway, we must make sure that all the
seq_buf operations used here are safe against such races. It is not
obvious and might be prone to regressions.

To be honest, I am not familiar with cmpxchg_local(), but I think that
it can't make much difference here. The value has to be synced to the
other CPUs at some point anyway.
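
As I understand it, it behaves roughly like the following, just
atomically and only with respect to code on the local CPU (my rough
model, not the real implementation):

	/* returns the old value; the caller retries if it != old */
	static inline size_t cmpxchg_local_model(size_t *p, size_t old, size_t new)
	{
		size_t cur = *p;

		if (cur == old)
			*p = new;
		return cur;
	}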


Note that Steven's solution is safe because the prints are serialized via
@backtrace_flag in arch_trigger_all_cpu_backtrace().
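
Roughly (from memory, see arch/x86/kernel/apic/hw_nmi.c) it does:

	if (test_and_set_bit(0, &backtrace_flag))
		/* a backtrace is already in progress; only one at a time */
		return;

	/* ... trigger the NMIs and wait for all CPUs to finish writing ... */

	/* only now are the per-CPU buffers printed; no writers are left */
	clear_bit(0, &backtrace_flag);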


Best Regards,
Petr