lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 7 Oct 2015 15:34:08 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Vitaly Kuznetsov <vkuznets@...hat.com>
Cc:	HATAYAMA Daisuke <d.hatayama@...fujitsu.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Jiri Kosina <jkosina@...e.cz>, Baoquan He <bhe@...hat.com>,
	Prarit Bhargava <prarit@...hat.com>,
	Xie XiuQi <xiexiuqi@...wei.com>,
	Seth Jennings <sjenning@...hat.com>,
	linux-kernel@...r.kernel.org,
	"K. Y. Srinivasan" <kys@...rosoft.com>, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH] panic: release stale console lock to always get the
 logbuf printed out

(cc Jan)

On Wed,  7 Oct 2015 19:02:22 +0200 Vitaly Kuznetsov <vkuznets@...hat.com> wrote:

> In some cases we may end up killing the CPU holding the console lock
> while still having valuable data in logbuf. E.g. I'm observing the
> following:
> - A crash is happening on one CPU and console_unlock() is being called on
>   some other.
> - console_unlock() tries to print out the buffer before releasing the lock
>   and on slow console it takes time.
> - in the meanwhile crashing CPU does lots of printk()-s with valuable data
>   (which go to the logbuf) and sends IPIs to all other CPUs.
> - console_unlock() finishes printing previous chunk and enables interrupts
>   before trying to print out the rest, the CPU catches the IPI and never
>   releases console lock.

Why doesn't the lock-owning CPU release the console lock?  Because it
was stopped by smp_send_stop() in panic()?

I don't recall why we stop CPUs in panic(), and of course we didn't
document the reason.  I guess it makes sense from the "what else can we
do" point of view, but I wonder if we can just do it later on - that
would fix this problem?

(dumb aside: why doesn't smp_send_stop() stop the calling CPU?)

> This is not the only possible case: in VT/fb subsystems we have many other
> console_lock()/console_unlock() users. Non-masked interrupts (or receiving
> NMI in case of extreme slowness) will have the same result. Getting the
> whole console buffer printed out on crash should be top priority.

Yes, this is a pretty big hole in the logic.

> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -23,6 +23,7 @@
>  #include <linux/sysrq.h>
>  #include <linux/init.h>
>  #include <linux/nmi.h>
> +#include <linux/console.h>
>  
>  #define PANIC_TIMER_STEP 100
>  #define PANIC_BLINK_SPD 18
> @@ -147,6 +148,15 @@ void panic(const char *fmt, ...)
>  
>  	bust_spinlocks(0);
>  
> +	/*
> +	 * We may have ended up killing the CPU holding the lock and still have
> +	 * some valuable data in console buffer. Try to acquire the lock and
> +	 * release it regardless of the result. The release will also print the
> +	 * buffers out.
> +	 */
> +	console_trylock();
> +	console_unlock();
> +

"killing the CPU" is a bit vague.  How's this look?

--- a/kernel/panic.c~panic-release-stale-console-lock-to-always-get-the-logbuf-printed-out-fix
+++ a/kernel/panic.c
@@ -149,10 +149,10 @@ void panic(const char *fmt, ...)
 	bust_spinlocks(0);
 
 	/*
-	 * We may have ended up killing the CPU holding the lock and still have
-	 * some valuable data in console buffer. Try to acquire the lock and
-	 * release it regardless of the result. The release will also print the
-	 * buffers out.
+	 * We may have ended up stopping the CPU holding the lock (in
+	 * smp_send_stop()) while still having some valuable data in the console
+	 * buffer.  Try to acquire the lock then release it regardless of the
+	 * result.  The release will also print the buffers out.
 	 */
 	console_trylock();
 	console_unlock();
_


Does the console_trylock() guarantee that the console lock is now held?
If the console_lock-holding CPU is still running then there's a window
where the above code could enter console_unlock() when nobody's holding
console_lock.  If smp_send_stop() always works (synchronously) then
that won't happen.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ