lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160331111229.GB1023@pathway.suse.cz>
Date:	Thu, 31 Mar 2016 13:12:29 +0200
From:	Petr Mladek <pmladek@...e.com>
To:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.com>, Tejun Heo <tj@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	linux-kernel@...r.kernel.org,
	Byungchul Park <byungchul.park@....com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Jan Kara <jack@...e.cz>
Subject: Re: [RFC][PATCH v8 1/2] printk: Make printk() completely async

On Fri 2016-03-25 00:43:22, Sergey Senozhatsky wrote:
> From: Jan Kara <jack@...e.cz>
> 
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock().  This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
> 
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
> 
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and wake_up()s a special dedicated kthread to do the
> printing to console. This has the advantage that printing always happens
> from a schedulable contex and thus we don't lockup any particular CPU or
> even interrupts. Also it has the advantage that printk() is fast and
> thus kernel booting is not slowed down by slow serial console.
> Disadvantage of this method is that in case of crash there is higher
> chance that important messages won't appear in console output (we may
> need working scheduling to print message to console). We somewhat
> mitigate this risk by switching printk to the original method of
> immediate printing to console if oops is in progress.  Also for
> debugging purposes we provide printk.synchronous kernel parameter which
> resorts to the original printk behavior.
> 
> printk() is expected to work under different conditions and in different
> scenarios, including corner cases of OOM when all of the workers are busy
> (e.g. allocating memory), thus printk() uses its own dedicated printing
> kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
> set we potentially can receive delays in printing until workqueue
> declares a ->mayday, as noted by Tetsuo Handa).
> 
> Signed-off-by: Jan Kara <jack@...e.cz>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> ---
>  Documentation/kernel-parameters.txt | 10 ++++
>  kernel/printk/printk.c              | 91 ++++++++++++++++++++++++++++++++++---
>  2 files changed, 94 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index ecc74fa..4745e94 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default kernel messages are printed to console
> +			asynchronously (except during early boot or when oops
> +			is happening). That avoids kernel stalling behind slow
> +			serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However for debugging problems, printing messages to
> +			console immediately may be desirable. This option
> +			enables such behavior.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index bfbf284..a984c5a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
>  #include <linux/utsname.h>
>  #include <linux/ctype.h>
>  #include <linux/uio.h>
> +#include <linux/kthread.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm-generic/sections.h>
> @@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously.
> + * The default value on UP systems is 'true'.
> + */
> +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +/* Printing kthread for async printk */
> +static struct task_struct *printk_kthread;
> +/* When `true' - printing thread has messages to print */
> +static bool need_flush_console;
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  			    const char *dict, size_t dictlen,
>  			    const char *fmt, va_list args)
>  {
> +	/* cpu currently holding logbuf_lock in this function */
> +	static unsigned int logbuf_cpu = UINT_MAX;
>  	static bool recursion_bug;
>  	static char textbuf[LOG_LINE_MAX];
>  	char *text = textbuf;
> @@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	int this_cpu;
>  	int printed_len = 0;
>  	bool in_sched = false;
> -	/* cpu currently holding logbuf_lock in this function */
> -	static unsigned int logbuf_cpu = UINT_MAX;
> +	bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
>  
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> @@ -1655,6 +1670,14 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	raw_spin_lock(&logbuf_lock);
>  	logbuf_cpu = this_cpu;
>  
> +	/*
> +	 * Set printing kthread sleep condition early, under the
> +	 * logbuf_lock, so it (if RUNNING) will go to console_lock()
> +	 * and spin on logbuf_lock.
> +	 */
> +	if (!in_panic && printk_kthread && !need_flush_console)
> +		need_flush_console = true;

I would remove the if-statement and always set it:

  + It does not matter if we set it in panic. It will not affect
    anything.

  + The check for printk_kthread is racy. It might be false here
    and it might be true later when check whether to wakeup
    the kthread or try to get console directly.

  + We might set it true even when it was true before.

I think that this was an attempt to avoid a spurious wake up.
But we solve it better way now.


>  	if (unlikely(recursion_bug)) {
>  		static const char recursion_msg[] =
>  			"BUG: recent printk recursion!";
> @@ -1757,12 +1780,28 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	if (!in_sched) {
>  		lockdep_off();
>  		/*
> -		 * Try to acquire and then immediately release the console
> -		 * semaphore.  The release will print out buffers and wake up
> -		 * /dev/kmsg and syslog() users.
> +		 * By default we print message to console asynchronously so
> +		 * that kernel doesn't get stalled due to slow serial console.
> +		 * That can lead to softlockups, lost interrupts, or userspace
> +		 * timing out under heavy printing load.
> +		 *
> +		 * However we resort to synchronous printing of messages during
> +		 * early boot, when synchronous printing was explicitly
> +		 * requested by kernel parameter, or when console_verbose() was
> +		 * called to print everything during panic / oops.
>  		 */
> -		if (console_trylock())
> -			console_unlock();
> +		if (!in_panic && printk_kthread) {
> +			/* Offload printing to a schedulable context. */
> +			wake_up_process(printk_kthread);
> +		} else {
> +			/*
> +			 * Try to acquire and then immediately release the
> +			 * console semaphore.  The release will print out
> +			 * buffers and wake up /dev/kmsg and syslog() users.
> +			 */
> +			if (console_trylock())
> +				console_unlock();
> +		}
>  		lockdep_on();
>  	}
>  
> @@ -1899,6 +1938,8 @@ static size_t cont_print_text(char *text, size_t size) { return 0; }
>  /* Still needs to be defined for users */
>  DEFINE_PER_CPU(printk_func_t, printk_func);
>  
> +static struct task_struct *printk_kthread;
> +static bool need_flush_console;
>  #endif /* CONFIG_PRINTK */
>  
>  #ifdef CONFIG_EARLY_PRINTK
> @@ -2337,6 +2378,8 @@ skip:
>  	 */
>  	raw_spin_lock(&logbuf_lock);
>  	retry = console_seq != log_next_seq;
> +	if (!retry && printk_kthread)
> +		need_flush_console = false;

Similar here. I remove the if-statement and always set it. We will
either retry or it should be false anyway.

>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  
>  	if (retry && console_trylock())
> @@ -2722,6 +2765,40 @@ static int __init printk_late_init(void)
>  late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
> +static int printk_kthread_func(void *data)
> +{
> +	while (1) {
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		if (!need_flush_console)
> +			schedule();
> +
> +		__set_current_state(TASK_RUNNING);


We still must do here:

		need_flush_console = false;

Othrerwise, we might start "busy" cycling. cosole_unlock()
sometimes returns earlly, e.g. when console_suspended is set
or !can_use_console() returns true.

Sigh, the handling of "need_flush_console" is a bit strange.
Part of the logic depends on logbuf_lock and the other part
must be lockless.

I think that this lockless assigment deserve a comment, e.g.

		/*
		 * Avoid an infinite loop when console_unlock() cannot
		 * access consoles, e.g. because of a suspend. We
		 * could get asleep here. Someone else will call
		 * consoles if conditions change.
		 */

Also another name might help. If we set it false here, the value
does describe a global state. The variable describes if this
kthread needs to flush the console. So, more precise name would be

	printk_kthread_need_flush_console


I think that we are close. I really like the current state of
the patch and how minimalistic it is.


Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ