lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130212122245.GB19583@quack.suse.cz>
Date:	Tue, 12 Feb 2013 13:22:45 +0100
From:	Jan Kara <jack@...e.cz>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, jslaby@...e.cz,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"kay.sievers" <kay.sievers@...y.org>
Subject: Re: [RFC][PATCH] printk: Remove separate printk_sched buffers and
 use printk buf instead

On Thu 07-02-13 00:02:20, Jan Kara wrote:
> On Tue 05-02-13 20:05:48, Steven Rostedt wrote:
> > [ I sent this in a reply to another thread, but wanted a bit more attention to it ]
> > 
> > To prevent deadlocks with doing a printk inside the scheduler,
> > printk_sched() was created. The issue is that printk has a console_sem
> > that it can grab and release. The release does a wake up if there's a
> > task pending on the sem, and this wake up grabs the rq locks that is
> > held in the scheduler. This leads to a possible deadlock if the wake up
> > uses the same rq as the one with the rq lock held already.
> > 
> > What printk_sched() does is to save the printk write in a per cpu buffer
> > and sets the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
> > set, the printk() is done against the buffer.
> > 
> > There's a couple of issues with this approach.
> > 
> > 1) If two printk_sched()s are called before the tick, the second one
> > will overwrite the first one.
> > 
> > 2) The temporary buffer is 512 bytes and is per cpu. This is a quite a
> > bit of space wasted for something that is seldom used.
> > 
> > In order to remove this, the printk_sched() can instead use the printk
> > buffer instead, and delay the console_trylock()/console_unlock() to the
> > tick.
> > 
> > Because printk_sched() would then be taking the logbuf_lock, the
> > logbuf_lock must not be held while doing anything that may call into the
> > scheduler functions, which includes wake ups. Unfortunately, printk()
> > also has a console_sem that it uses, and on release, the
> > up(&console_sem) may do a wake up of any pending waiters. This must be
> > avoided while holding the logbuf_lock.
> > 
> > Luckily, there's not many places that do the unlock, or hold the
> > logbuf_lock. By moving things around a little, the console_sem can be
> > released without ever holding the logbuf_lock, and we can safely have
> > printk_sched() use the printk buffer directly.
>   So after quite some experiments and some hair tearing I have a patch that
> uses PRINTK_PENDING_OUTPUT and makes the machine survive my heavy-printk
> test. The first patch I attach is actually a small improvement of your
> patch which I think can be folded in it. I was also wondering whether we
> still need printk_needs_cpu(). I left it in since I don't know about a
> better way of keeping at least one CPU ticking. But maybe others do?
> 
> The second patch then makes use of PRINTK_PENDING_OUTPUT to handle the
> printing when console_unlock() would take too long. If you wonder whether
> the last_printing_cpu in printk_tick() is necessary - it is... Without it
> we keep printing on one CPU and the machine complains, looses drives,
> etc... (I guess I should add this comment somewhere to the code).
> 
> Anyway, what do you guys think about this version?
  Steven, Andrew, did you have a chance to look at my patches?

								Honza

> From bda93cfa4133c41e8f10dcd371d1f025d94b85dc Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@...e.cz>
> Date: Wed, 6 Feb 2013 19:15:30 +0100
> Subject: [PATCH 2/3] printk: Remove per-cpu printk flags
> 
> There's no need to have printk flags percpu. Just make a single variable
> operated by atomic operations from it. It also has an advantage that any
> cpu can do the printing / wakeup work.
> 
> Signed-off-by: Jan Kara <jack@...e.cz>
> ---
>  kernel/printk.c |   28 ++++++++++++----------------
>  1 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/printk.c b/kernel/printk.c
> index ed630fa..2fe917d 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -1986,28 +1986,24 @@ int is_console_locked(void)
>  	return console_locked;
>  }
>  
> -/*
> - * Delayed printk version, for scheduler-internal messages:
> - */
> -#define PRINTK_BUF_SIZE		512
> +#define PRINTK_PENDING_WAKEUP	1
> +#define PRINTK_PENDING_OUTPUT	2
>  
> -#define PRINTK_PENDING_WAKEUP	0x01
> -#define PRINTK_PENDING_OUTPUT	0x02
> -
> -static DEFINE_PER_CPU(int, printk_pending);
> +static unsigned long printk_pending;
>  
>  void printk_tick(void)
>  {
> -	if (__this_cpu_read(printk_pending)) {
> -		int pending = __this_cpu_xchg(printk_pending, 0);
> -		if (pending & PRINTK_PENDING_OUTPUT) {
> +	if (printk_pending) {
> +		unsigned long pending = xchg(&printk_pending, 0);
> +
> +		if (test_bit(PRINTK_PENDING_OUTPUT, &pending)) {
>  			if (console_trylock())
>  				console_unlock();
>  			else
>  				/* Try again later */
> -				__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +				set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
>  		}
> -		if (pending & PRINTK_PENDING_WAKEUP)
> +		if (test_bit(PRINTK_PENDING_WAKEUP, &pending))
>  			wake_up_interruptible(&log_wait);
>  	}
>  }
> @@ -2016,13 +2012,13 @@ int printk_needs_cpu(int cpu)
>  {
>  	if (cpu_is_offline(cpu))
>  		printk_tick();
> -	return __this_cpu_read(printk_pending);
> +	return printk_pending;
>  }
>  
>  void wake_up_klogd(void)
>  {
>  	if (waitqueue_active(&log_wait))
> -		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
>  }
>  
>  static void console_cont_flush(char *text, size_t size)
> @@ -2497,7 +2493,7 @@ int printk_sched(const char *fmt, ...)
>  	r = vprintk_emit(0, -2, NULL, 0, fmt, args);
>  	va_end(args);
>  
> -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
>  
>  	return r;
>  }
> -- 
> 1.7.1
> 

> From 4d44b91477c73a29dd16a032bc431227d159f1bf Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@...e.cz>
> Date: Tue, 15 Jan 2013 15:15:06 +0100
> Subject: [PATCH 3/3] printk: Avoid softlockups in console_unlock()
> 
> A CPU can be caught in console_unlock() for a long time (tens of seconds are
> reported by our customers) when other CPUs are using printk heavily and serial
> console makes printing slow. Despite serial console drivers are calling
> touch_nmi_watchdog() this triggers softlockup warnings because
> interrupts are disabled for the whole time console_unlock() runs (e.g.
> vprintk() calls console_unlock() with interrupts disabled). Thus IPIs
> cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
> 
> In my artifical testing I also managed to trigger a situation when disk
> disappeared from the system apparently because commands to / from it
> could not be delivered for long enough. This is why just silencing
> watchdogs isn't a reliable solution to the problem and we simply have to
> avoid spending too long in console_unlock().
> 
> We fix the issue by limiting the time we spend in console_unlock() to
> watchdog_thresh() / 4 (unless we are in an early boot stage or oops is
> happening). The rest of the buffer will be printed either by further
> callers to printk() or during next timer tick.
> 
> Signed-off-by: Jan Kara <jack@...e.cz>
> ---
>  include/linux/nmi.h |   18 +++++++++++++
>  kernel/printk.c     |   70 +++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 75 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index db50840..ce32959 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -51,6 +51,24 @@ extern int watchdog_thresh;
>  struct ctl_table;
>  extern int proc_dowatchdog(struct ctl_table *, int ,
>  			   void __user *, size_t *, loff_t *);
> +/*
> + * Return the maximum number of nanosecond for which interrupts may be disabled
> + * on the current CPU
> + */
> +static inline u64 max_interrupt_disabled_duration(void)
> +{
> +	/*
> +	 * We give us some headroom because timers need time to fire before the
> +	 * watchdog period expires.
> +	 */
> +	return ((u64)watchdog_thresh) * NSEC_PER_SEC / 2;
> +}
> +#else
> +static inline u64 max_interrupt_disabled_duration(void)
> +{
> +	/* About the value we'd get with the default watchdog setting */
> +	return 5ULL * NSEC_PER_SEC;
> +}
>  #endif
>  
>  #endif
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 2fe917d..c4f7f6f 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -1990,17 +1990,31 @@ int is_console_locked(void)
>  #define PRINTK_PENDING_OUTPUT	2
>  
>  static unsigned long printk_pending;
> +static int last_printing_cpu = -1;
> +
> +static bool __console_unlock(void);
>  
>  void printk_tick(void)
>  {
>  	if (printk_pending) {
> -		unsigned long pending = xchg(&printk_pending, 0);
> +		unsigned long pending;
> +		int cpu = smp_processor_id();
> +
> +		/*
> +		 * Did we last print? Give other cpus a chance to do it but
> +		 * reset the variable so that we do printing in the next tick
> +		 * if noone else got to it.
> +		 */
> +		if (cmpxchg(&last_printing_cpu, cpu, -1))
> +			return;
>  
> +		pending = xchg(&printk_pending, 0);
>  		if (test_bit(PRINTK_PENDING_OUTPUT, &pending)) {
> +			bool more_work = true;
> +
>  			if (console_trylock())
> -				console_unlock();
> -			else
> -				/* Try again later */
> +				more_work = __console_unlock();
> +			if (more_work)
>  				set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
>  		}
>  		if (test_bit(PRINTK_PENDING_WAKEUP, &pending))
> @@ -2051,33 +2065,46 @@ out:
>  }
>  
>  /**
> - * console_unlock - unlock the console system
> + * __console_unlock - unlock the console system
>   *
>   * Releases the console_lock which the caller holds on the console system
>   * and the console driver list.
>   *
> - * While the console_lock was held, console output may have been buffered
> - * by printk().  If this is the case, console_unlock(); emits
> - * the output prior to releasing the lock.
> + * While the console_lock was held, console output may have been buffered by
> + * printk(). If this is the case, console_unlock() emits the output prior to
> + * releasing the lock. However we need not write all the data in the buffer if
> + * we would hog the CPU for too long. Function returns true, if there's more
> + * data that needs printing in the buffer.
>   *
>   * If there is output waiting, we wake /dev/kmsg and syslog() users.
>   *
> - * console_unlock(); may be called from any context.
> + * __console_unlock() may be called from any context.
>   */
> -void console_unlock(void)
> +static bool __console_unlock(void)
>  {
>  	static char text[LOG_LINE_MAX + PREFIX_MAX];
>  	static u64 seen_seq;
>  	unsigned long flags;
>  	bool wake_klogd = false;
>  	bool retry;
> +	u64 end_time, now;
> +	int cur_cpu;
>  
>  	if (console_suspended) {
>  		up(&console_sem);
> -		return;
> +		return false;
>  	}
>  
>  	console_may_schedule = 0;
> +	preempt_disable();
> +	cur_cpu = smp_processor_id();
> +	/*
> +	 * We give us some headroom because we check the time only after
> +	 * printing the whole message
> +	 */
> +	end_time = cpu_clock(cur_cpu) +
> +				max_interrupt_disabled_duration() / 2;
> +	preempt_enable();
>  
>  	/* flush buffered message fragment immediately to console */
>  	console_cont_flush(text, sizeof(text));
> @@ -2100,7 +2127,9 @@ again:
>  			console_prev = 0;
>  		}
>  skip:
> -		if (console_seq == log_next_seq)
> +		now = sched_clock_cpu(cur_cpu);
> +		if (console_seq == log_next_seq ||
> +		    (now > end_time && !oops_in_progress))
>  			break;
>  
>  		msg = log_from_idx(console_idx);
> @@ -2143,6 +2172,7 @@ skip:
>  
>  	raw_spin_unlock(&logbuf_lock);
>  
> +	last_printing_cpu = cur_cpu;
>  	up(&console_sem);
>  
>  	/*
> @@ -2155,11 +2185,25 @@ skip:
>  	retry = console_seq != log_next_seq;
>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  
> -	if (retry && console_trylock())
> +	if (retry && (now <= end_time || oops_in_progress) &&
> +	    console_trylock())
>  		goto again;
>  
>  	if (wake_klogd)
>  		wake_up_klogd();
> +	return retry;
> +}
> +
> +/*
> + * This is a wrapper against __console_unlock() that makes sure the rest of
> + * buffer is printed in future.
> + */
> +void console_unlock(void)
> +{
> +	if (__console_unlock()) {
> +		/* Leave the rest of printing for a timer tick */
> +		set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
> +	}
>  }
>  EXPORT_SYMBOL(console_unlock);
>  
> -- 
> 1.7.1
> 

-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ