linux-kernel - Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160315155849.GW10940@pathway.suse.cz>
Date:	Tue, 15 Mar 2016 16:58:49 +0100
From:	Petr Mladek <pmladek@...e.com>
To:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.com>, Tejun Heo <tj@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	linux-kernel@...r.kernel.org,
	Byungchul Park <byungchul.park@....com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Jan Kara <jack@...e.cz>
Subject: Re: [RFC][PATCH v4 1/2] printk: Make printk() completely async

On Mon 2016-03-14 23:13:39, Sergey Senozhatsky wrote:
> From: Jan Kara <jack@...e.cz>
> 
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock().  This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
> 
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
> 
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and wake_up()s a special dedicated kthread to do the
> printing to console. This has the advantage that printing always happens
> from a schedulable contex and thus we don't lockup any particular CPU or
> even interrupts. Also it has the advantage that printk() is fast and
> thus kernel booting is not slowed down by slow serial console.
> Disadvantage of this method is that in case of crash there is higher
> chance that important messages won't appear in console output (we may
> need working scheduling to print message to console). We somewhat
> mitigate this risk by switching printk to the original method of
> immediate printing to console if oops is in progress.  Also for
> debugging purposes we provide printk.synchronous kernel parameter which
> resorts to the original printk behavior.
> 
> printk() is expected to work under different conditions and in different
> scenarios, including corner cases of OOM when all of the workers are busy
> (e.g. allocating memory), thus printk() uses its own dedicated printing
> kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
> set we potentially can receive delays in printing until workqueue
> declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention,
> is that deferred printk() messages may appear before printing kthread
> created, so in the very beginning we have to print deferred messages
> the old way -- via IRQs.
> 
> Signed-off-by: Jan Kara <jack@...e.cz>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> ---
>  Documentation/kernel-parameters.txt |  10 ++
>  kernel/printk/printk.c              | 199 +++++++++++++++++++++++++-----------
>  2 files changed, 151 insertions(+), 58 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 1e58ae9..454999e 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default kernel messages are printed to console
> +			asynchronously (except during early boot or when oops
> +			is happening). That avoids kernel stalling behind slow
> +			serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However for debugging problems, printing messages to
> +			console immediately may be desirable. This option
> +			enables such behavior.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d5fd844..3c4f8b7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/utsname.h>
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
> +#include <linux/kthread.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm-generic/sections.h>
> @@ -284,6 +285,102 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously unless someone else
> + * is already printing messages.
> + *
> + * The default value on UP systems is 'true'.
> + */
> +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);

If we make it writtable, we also need to handle the situation that
it gets disabled at runtime. It means to make sure that the kthread
will be running event printk_sync was set during the boot.

> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +/* Printing kthread for async vprintk_emit() */
> +static struct task_struct *printk_thread;
> +/* Wait for printing wakeups from async vprintk_emit() */
> +static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
> +
> +static int printing_func(void *data)
> +{
> +	while (1) {
> +		DECLARE_WAITQUEUE(wait, current);
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		add_wait_queue(&printing_wait, &wait);
> +
> +		schedule();
> +		remove_wait_queue(&printing_wait, &wait);

This whole thing looks strange. But it is also possible
that I do not have enough knowledge.

First, I think that this is racy. IMHO, we should do some
check before calling schedule(). We should not sleep if
anyone woken us before we got registered into the waitqueue.

Second, I wonder what was the motivation to use the waitqueue.
One advantage is that we do not need to check if the process
exists but then we also do not catch problems. Otherwise,
it looks too complicated. IMHO, it might be much easier
to just sleep here and call wake_up_process.

What about this?

int need_flush_console;


	int work;

	while(1) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (!need_flush_console)
			schedule();
		__set_current_state(TASK_RUNNING);

		need_flush_console =  false;

> +		console_lock();
> +		console_unlock();
> +	}

Also I wonder if we need some special handling of the system freezing
but I do not thing so.

> +
> +	return 0;
> +}
> +
> +static int __init init_printk_thread(void)
> +{
> +	if (printk_sync)
> +		return 0;
> +
> +	printk_thread = kthread_run(printing_func, NULL, "printk");
> +	BUG_ON(IS_ERR(printk_thread));

I would prefer to force the synchronous mode instead.

> +	return 0;
> +}
> +late_initcall(init_printk_thread);
> +
> +/*
> + * Delayed printk version, for scheduler-internal messages:

This is not longer related to sheduler only.

BTW: I suggest to move this whole section in a separate patch.
It will be more clear what has changed for the async printk
and what stays for the deferred printk.

> + */
> +#define PRINTK_PENDING_WAKEUP	(1<<0)
> +#define PRINTK_PENDING_IRQ_OUTPUT	(1<<1)
> +#define PRINTK_PENDING_KTHREAD_OUTPUT	(1<<2)

Sigh, I am getting a bit lost in these names. Either
they deserve a comment or better renaming.

I would make it clear what is related to console, klogd,
and deferred synchronous printk.

Well, I wonder if we need to differ the defferred
printk and KTHREAD OUTPUT by the flag. We could
check the printk_sync global flag here.

What about?

#define PRINTK_PENDING_KLOGD_WAKEUP	(1<<0)
#define PRINTK_PENDING_CONSOLE_OUTPUT	(1<<2)


> +static DEFINE_PER_CPU(int, printk_pending);
> +
> +static void wake_up_klogd_work_func(struct irq_work *irq_work)
> +{
> +	int pending = __this_cpu_xchg(printk_pending, 0);
> +
> +	if (pending & PRINTK_PENDING_KTHREAD_OUTPUT) 
> +		wake_up(&printing_wait);
> +
> +	if (pending & PRINTK_PENDING_IRQ_OUTPUT) {
> +		/* If trylock fails, someone else is doing the printing */
> +		if (console_trylock())
> +			console_unlock();
> +	}
> +
> +	if (pending & PRINTK_PENDING_WAKEUP)
> +		wake_up_interruptible(&log_wait);

then we could use:

	if (pending & PRINTK_PENDING_CONSOLE_OUTPUT) {
		if (printk_sync || !printk_kthread) {
			/* If trylock fails, someone else is doing the printing */
			if (console_trylock())
				console_unlock();
		} else {
			wake_up_process(printk_kthread);
		}

	if (pending & PRINTK_PENDING_KLOGD_WAKEUP)
		wake_up_interruptible(&log_wait);


> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> +	.func = wake_up_klogd_work_func,
> +	.flags = IRQ_WORK_LAZY,
> +};
> +
> +void wake_up_klogd(void)
> +{
> +	preempt_disable();
> +	if (waitqueue_active(&log_wait)) {
> +		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	}
> +	preempt_enable();
> +}
> +
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1609,6 +1706,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  			    const char *dict, size_t dictlen,
>  			    const char *fmt, va_list args)
>  {
> +	/* cpu currently holding logbuf_lock in this function */
> +	static unsigned int logbuf_cpu = UINT_MAX;
>  	static bool recursion_bug;
>  	static char textbuf[LOG_LINE_MAX];
>  	char *text = textbuf;
> @@ -1618,12 +1717,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	int this_cpu;
>  	int printed_len = 0;
>  	bool in_sched = false;
> -	/* cpu currently holding logbuf_lock in this function */
> -	static unsigned int logbuf_cpu = UINT_MAX;
> +	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
> +	bool sync_print = printk_sync;

I would force the global printk_sync if we are in_panic

	if (in_panic)
		printk_sync = true;
 
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> +		/*
> +		 * Deferred sched messages must not be printed
> +		 * synchronously regardless the @printk_sync or @in_panic.
> +		 */
>  		in_sched = true;
> +		sync_print = false;
>  	}
>  
>  	boot_delay_msec(level);
> @@ -1752,10 +1856,43 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when synchronous printing was explicitly requested by
> +	 * kernel parameter, or when console_verbose() was called to print
> +	 * everything during panic / oops.
> +	 */
> +	if (!sync_print) {
> +		if (printk_thread && !in_panic) {
> +			/*
> +			 * This will wakeup the printing kthread and offload
> +			 * printing to a schedulable context.
> +			 */
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_KTHREAD_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +		} else if (in_sched) {
> +			/*
> +			 * @in_sched messages may come too early, when we don't
> +			 * yet have @printk_thread. We can't print deferred
> +			 * messages directly, because this may deadlock, route
> +			 * them via IRQ context.
> +			 */
> +			__this_cpu_or(printk_pending,
> +					PRINTK_PENDING_IRQ_OUTPUT);
> +			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();

I wonder if it might be much easier with If we used only the two
PRINTK_PENDING flags and force global printk_sync when in panic.

	if (in_sched || (printk_kthread && !printk_sync)) {
		__this_cpu_or(printk_pending,
			PRINTK_PENDING_CONSOLE_OUTPUT);
			irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
	} else {
>  		/*
>  		 * Try to acquire and then immediately release the console

Sigh, it would be great to rename also wake_up_klogd_work and
wake_up_klogd_work_func(). They are not only about klogd.
Well, this should be separate patch as well because it
was even before.

I still to thing about possible races. Especially, when checking
printk_kthread and printk_sync.

I hope that some of the above suggestions makes sense. vprintk_emit()
is crazy already now. I feel motivated to do not make it worse ;-)

Best Regards,
Petr