lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151222144838.GA611@swordfish>
Date:	Tue, 22 Dec 2015 23:48:38 +0900
From:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Petr Mladek <pmladek@...e.cz>,
	KY Sri nivasan <kys@...rosoft.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	linux-kernel@...r.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [PATCH 1/7] printk: Hand over printing to console if printing
 too long

On (12/22/15 14:47), Jan Kara wrote:
[..]
> Thanks for looking into my patches and sorry for replying with a delay. As
> I wrote in my previous email [1] even the referenced patches are not quite
> enough. Over last few days I have worked on redoing the stuff as we
> discussed with Linus and Andrew at Kernel Summit and I have new patches
> which are working fine for me. I still want to test them on some machines
> having real issues with udev during boot but so far stress-testing with
> serial console slowed down to ~1000 chars/sec on other machines and VMs
> looks promising.
> 
> I'm attaching them in case you want to have a look. They are on top of
> Tejun's patch adding cond_resched() (which is essential). I'll officially
> submit the patches once the testing is finished (but I'm not sure when I
> get to the problematic HW...).
> 

Hello,

Thanks a lot! Will take a look.

	-ss

> From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@...e.cz>
> Date: Mon, 21 Dec 2015 13:10:31 +0100
> Subject: [PATCH 1/2] printk: Make printk() completely async
> 
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock().  This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
> 
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
> 
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and queues work to do the printing to console. This has
> the advantage that printing always happens from a schedulable contex and
> thus we don't lockup any particular CPU or even interrupts. Also it has
> the advantage that printk() is fast and thus kernel booting is not
> slowed down by slow serial console. Disadvantage of this method is that
> in case of crash there is higher chance that important messages won't
> appear in console output (we may need working scheduling to print
> message to console). We somewhat mitigate this risk by switching printk
> to the original method of immediate printing to console if oops is in
> progress.  Also for debugging purposes we provide printk.synchronous
> kernel parameter which resorts to the original printk behavior.
> 
> Signed-off-by: Jan Kara <jack@...e.cz>
> ---
>  Documentation/kernel-parameters.txt |  10 +++
>  kernel/printk/printk.c              | 144 +++++++++++++++++++++---------------
>  2 files changed, 95 insertions(+), 59 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 742f69d18fc8..4cf1bddeffc7 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3000,6 +3000,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default kernel messages are printed to console
> +			asynchronously (except during early boot or when oops
> +			is happening). That avoids kernel stalling behind slow
> +			serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However for debugging problems, printing messages to
> +			console immediately may be desirable. This option
> +			enables such behavior.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 299c2f0e7350..d455d1bd0d2c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -283,6 +283,73 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously unless someone else
> + * is already printing messages.
> + */
> +static bool __read_mostly printk_sync;
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +#define PRINTK_PENDING_WAKEUP	0x01
> +#define PRINTK_PENDING_OUTPUT	0x02
> +
> +static DEFINE_PER_CPU(int, printk_pending);
> +
> +static void printing_work_func(struct work_struct *work)
> +{
> +	console_lock();
> +	console_unlock();
> +}
> +
> +static DECLARE_WORK(printing_work, printing_work_func);
> +
> +static void wake_up_klogd_work_func(struct irq_work *irq_work)
> +{
> +	int pending = __this_cpu_xchg(printk_pending, 0);
> +
> +	/*
> +	 * We just schedule regular work to do the printing from irq work. We
> +	 * don't want to do printing here directly as that happens with
> +	 * interrupts disabled and thus is bad for interrupt latency. We also
> +	 * don't want to queue regular work from vprintk_emit() as that gets
> +	 * called in various difficult contexts where schedule_work() could
> +	 * deadlock.
> +	 */
> +	if (pending & PRINTK_PENDING_OUTPUT)
> +		schedule_work(&printing_work);
> +
> +	if (pending & PRINTK_PENDING_WAKEUP)
> +		wake_up_interruptible(&log_wait);
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> +	.func = wake_up_klogd_work_func,
> +	.flags = IRQ_WORK_LAZY,
> +};
> +
> +void wake_up_klogd(void)
> +{
> +	preempt_disable();
> +	if (waitqueue_active(&log_wait)) {
> +		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	}
> +	preempt_enable();
> +}
> +
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1668,15 +1735,14 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	unsigned long flags;
>  	int this_cpu;
>  	int printed_len = 0;
> -	bool in_sched = false;
> +	bool sync_print = printk_sync;
>  	/* cpu currently holding logbuf_lock in this function */
>  	static unsigned int logbuf_cpu = UINT_MAX;
>  
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> -		in_sched = true;
> +		sync_print = false;
>  	}
> -
>  	boot_delay_msec(level);
>  	printk_delay();
>  
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();
>  		/*
>  		 * Disable preemption to avoid being preempted while holding
> @@ -2688,60 +2768,6 @@ late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
>  /*
> - * Delayed printk version, for scheduler-internal messages:
> - */
> -#define PRINTK_PENDING_WAKEUP	0x01
> -#define PRINTK_PENDING_OUTPUT	0x02
> -
> -static DEFINE_PER_CPU(int, printk_pending);
> -
> -static void wake_up_klogd_work_func(struct irq_work *irq_work)
> -{
> -	int pending = __this_cpu_xchg(printk_pending, 0);
> -
> -	if (pending & PRINTK_PENDING_OUTPUT) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> -	}
> -
> -	if (pending & PRINTK_PENDING_WAKEUP)
> -		wake_up_interruptible(&log_wait);
> -}
> -
> -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> -	.func = wake_up_klogd_work_func,
> -	.flags = IRQ_WORK_LAZY,
> -};
> -
> -void wake_up_klogd(void)
> -{
> -	preempt_disable();
> -	if (waitqueue_active(&log_wait)) {
> -		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> -		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	}
> -	preempt_enable();
> -}
> -
> -int printk_deferred(const char *fmt, ...)
> -{
> -	va_list args;
> -	int r;
> -
> -	preempt_disable();
> -	va_start(args, fmt);
> -	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> -	va_end(args);
> -
> -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> -	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	preempt_enable();
> -
> -	return r;
> -}
> -
> -/*
>   * printk rate limiting, lifted from the networking subsystem.
>   *
>   * This enforces a rate limit: not more than 10 kernel messages
> -- 
> 2.6.2
> 

> From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@...e.cz>
> Date: Mon, 21 Dec 2015 14:26:13 +0100
> Subject: [PATCH 2/2] printk: Skip messages on oops
> 
> When there are too many messages in the kernel printk buffer it can take
> very long to print them to console (especially when using slow serial
> console). This is undesirable during oops so when we encounter oops and
> there are more than 100 messages to print, print just the newest 100
> messages and then the oops message.
> 
> Signed-off-by: Jan Kara <jack@...e.cz>
> ---
>  kernel/printk/printk.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d455d1bd0d2c..fc67ab70e9c7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -262,6 +262,9 @@ static u64 console_seq;
>  static u32 console_idx;
>  static enum log_flags console_prev;
>  
> +/* current record sequence when oops happened */
> +static u64 oops_start_seq;
> +
>  /* the next printk record to read after the last 'clear' command */
>  static u64 clear_seq;
>  static u32 clear_idx;
> @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  					 NULL, 0, recursion_msg,
>  					 strlen(recursion_msg));
>  	}
> +	if (oops_in_progress && !sync_print && !oops_start_seq)
> +		oops_start_seq = log_next_seq;
>  
>  	/*
>  	 * The printf needs to come first; we need the syslog
> @@ -2292,6 +2297,12 @@ out:
>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  }
>  
> +/*
> + * When oops happens and there are more messages to be printed in the printk
> + * buffer that this, skip some mesages and print only this many newest messages.
> + */
> +#define PRINT_MSGS_BEFORE_OOPS 100
> +
>  /**
>   * console_unlock - unlock the console system
>   *
> @@ -2348,7 +2359,28 @@ again:
>  			seen_seq = log_next_seq;
>  		}
>  
> -		if (console_seq < log_first_seq) {
> +		/*
> +		 * If oops happened and there are more than
> +		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
> +		 * skip them to make oops appear faster.
> +		 */
> +		if (oops_start_seq &&
> +		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
> +			len = sprintf(text,
> +				"** %u printk messages dropped due to oops ** ",
> +				(unsigned)(oops_start_seq - console_seq -
> +					   PRINT_MSGS_BEFORE_OOPS));
> +			if (console_seq < log_first_seq) {
> +				console_seq = log_first_seq;
> +				console_idx = log_first_idx;
> +			}
> +			while (console_seq <
> +			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
> +				console_idx = log_next(console_idx);
> +				console_seq++;
> +			}
> +			console_prev = 0;
> +		} else if (console_seq < log_first_seq) {
>  			len = sprintf(text, "** %u printk messages dropped ** ",
>  				      (unsigned)(log_first_seq - console_seq));
>  
> -- 
> 2.6.2
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ