lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 26 Jan 2008 10:10:15 +0100
From:	Pavel Machek <pavel@....cz>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Christoph Hellwig <hch@...radead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	Gregory Haskins <ghaskins@...ell.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tim Bird <tim.bird@...sony.com>,
	Sam Ravnborg <sam@...nborg.org>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	John Stultz <johnstul@...ibm.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Steven Rostedt <srostedt@...hat.com>
Subject: Re: [PATCH 01/23 -v6] printk - dont wakeup klogd with interrupts
	disabled

On Fri 2008-01-25 23:21:53, Steven Rostedt wrote:
> [ This patch is added to the series since the wakeup timings trace
>   may lockup without it. ]
> 
> I thought that one could place a printk anywhere without worrying.
> But it seems that it is not wise to place a printk where the runqueue
> lock is held.
> 
> I just spent two hours debugging why some of my code was locking up,
> to find that the lockup was caused by some debugging printk's that
> I had in the scheduler.  The printk's were only in rare paths so
> they shouldn't be too much of a problem, but after I hit the printk
> the system locked up.
> 
> Thinking that it was locking up on my code I went looking down the
> wrong path. I finally found (after examining an NMI dump) that
> the lockup happened because printk was trying to wakeup the klogd
> daemon, which caused a deadlock when the try_to_wakeup code tries
> to grab the runqueue lock.
> 
> This patch adds a runqueue_is_locked interface in sched.c for other
> files to see if the current runqueue lock is held. This is used
> in printk to determine whether it is safe or not to wakeup the klogd.
> 
> And with this patch, my code ran fine ;-)
> 
> Signed-off-by: Steven Rostedt <srostedt@...hat.com>
> ---
>  include/linux/sched.h |    2 ++
>  kernel/printk.c       |   14 ++++++++++----
>  kernel/sched.c        |   18 ++++++++++++++++++
>  3 files changed, 30 insertions(+), 4 deletions(-)
> 
> Index: linux-mcount.git/kernel/printk.c
> ===================================================================
> --- linux-mcount.git.orig/kernel/printk.c	2008-01-25 21:46:50.000000000 -0500
> +++ linux-mcount.git/kernel/printk.c	2008-01-25 21:46:55.000000000 -0500
> @@ -590,9 +590,11 @@ static int have_callable_console(void)
>   * @fmt: format string
>   *
>   * This is printk().  It can be called from any context.  We want it to work.
> - * Be aware of the fact that if oops_in_progress is not set, we might try to
> - * wake klogd up which could deadlock on runqueue lock if printk() is called
> - * from scheduler code.
> + *
> + * Note: if printk() is called with the runqueue lock held, it will not wake
> + * up the klogd. This is to avoid a deadlock from calling printk() in schedule
> + * with the runqueue lock held and having the wake_up grab the runqueue lock
> + * as well.
>   *
>   * We try to grab the console_sem.  If we succeed, it's easy - we log the output and
>   * call the console drivers.  If we fail to get the semaphore we place the output
> @@ -1003,7 +1005,11 @@ void release_console_sem(void)
>  	console_locked = 0;
>  	up(&console_sem);
>  	spin_unlock_irqrestore(&logbuf_lock, flags);
> -	if (wake_klogd)
> +	/*
> +	 * If we try to wake up klogd while printing with the runqueue lock
> +	 * held, this will deadlock.
> +	 */
> +	if (wake_klogd && !runqueue_is_locked())
>  		wake_up_klogd();
>  }

I guess you are going to kill me... but 

CPU0					CPU1
if (!runqueue_is_locked()) {
					locks runqueue
	wake_up_klogd

...and we are dead. What is needed here is
"wake_up_klogd_if_you_can()" or something, that does trylock (atomic).

...but even this version is better than status quo, I'd say.

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ