linux-kernel - Re: [PATCH] panic: release stale console lock to always get the logbuf printed out

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20151008090153.GB13232@quack.suse.cz>
Date:	Thu, 8 Oct 2015 11:01:53 +0200
From:	Jan Kara <jack@...e.cz>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Vitaly Kuznetsov <vkuznets@...hat.com>,
	HATAYAMA Daisuke <d.hatayama@...fujitsu.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Jiri Kosina <jkosina@...e.cz>, Baoquan He <bhe@...hat.com>,
	Prarit Bhargava <prarit@...hat.com>,
	Xie XiuQi <xiexiuqi@...wei.com>,
	Seth Jennings <sjenning@...hat.com>,
	linux-kernel@...r.kernel.org,
	"K. Y. Srinivasan" <kys@...rosoft.com>, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH] panic: release stale console lock to always get the
 logbuf printed out

On Wed 07-10-15 15:34:08, Andrew Morton wrote:
> (cc Jan)
> 
> On Wed,  7 Oct 2015 19:02:22 +0200 Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
> 
> > In some cases we may end up killing the CPU holding the console lock
> > while still having valuable data in logbuf. E.g. I'm observing the
> > following:
> > - A crash is happening on one CPU and console_unlock() is being called on
> >   some other.
> > - console_unlock() tries to print out the buffer before releasing the lock
> >   and on slow console it takes time.
> > - in the meanwhile crashing CPU does lots of printk()-s with valuable data
> >   (which go to the logbuf) and sends IPIs to all other CPUs.
> > - console_unlock() finishes printing previous chunk and enables interrupts
> >   before trying to print out the rest, the CPU catches the IPI and never
> >   releases console lock.
> 
> Why doesn't the lock-owning CPU release the console lock?  Because it
> was stopped by smp_send_stop() in panic()?
> 
> I don't recall why we stop CPUs in panic(), and of course we didn't
> document the reason.  I guess it makes sense from the "what else can we
> do" point of view, but I wonder if we can just do it later on - that
> would fix this problem?
> 
> (dumb aside: why doesn't smp_send_stop() stop the calling CPU?)
> 
> > This is not the only possible case: in VT/fb subsystems we have many other
> > console_lock()/console_unlock() users. Non-masked interrupts (or receiving
> > NMI in case of extreme slowness) will have the same result. Getting the
> > whole console buffer printed out on crash should be top priority.
> 
> Yes, this is a pretty big hole in the logic.
> 
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/sysrq.h>
> >  #include <linux/init.h>
> >  #include <linux/nmi.h>
> > +#include <linux/console.h>
> >  
> >  #define PANIC_TIMER_STEP 100
> >  #define PANIC_BLINK_SPD 18
> > @@ -147,6 +148,15 @@ void panic(const char *fmt, ...)
> >  
> >  	bust_spinlocks(0);
> >  
> > +	/*
> > +	 * We may have ended up killing the CPU holding the lock and still have
> > +	 * some valuable data in console buffer. Try to acquire the lock and
> > +	 * release it regardless of the result. The release will also print the
> > +	 * buffers out.
> > +	 */
> > +	console_trylock();
> > +	console_unlock();
> > +
> 
> "killing the CPU" is a bit vague.  How's this look?
> 
> --- a/kernel/panic.c~panic-release-stale-console-lock-to-always-get-the-logbuf-printed-out-fix
> +++ a/kernel/panic.c
> @@ -149,10 +149,10 @@ void panic(const char *fmt, ...)
>  	bust_spinlocks(0);
>  
>  	/*
> -	 * We may have ended up killing the CPU holding the lock and still have
> -	 * some valuable data in console buffer. Try to acquire the lock and
> -	 * release it regardless of the result. The release will also print the
> -	 * buffers out.
> +	 * We may have ended up stopping the CPU holding the lock (in
> +	 * smp_send_stop()) while still having some valuable data in the console
> +	 * buffer.  Try to acquire the lock then release it regardless of the
> +	 * result.  The release will also print the buffers out.
>  	 */
>  	console_trylock();
>  	console_unlock();
> _
> 
> 
> Does the console_trylock() guarantee that the console lock is now held?
> If the console_lock-holding CPU is still running then there's a window
> where the above code could enter console_unlock() when nobody's holding
> console_lock.  If smp_send_stop() always works (synchronously) then
> that won't happen.

We have this mechanism using zap_locks() in kernel/printk/printk.c when
crash happens on the CPU holding console_sem. Can't we use the same
mechanism for this case? Something like adding:

	zap_locks();
	console_lock();
	console_unlock();

to panic? If we picked up patch "kernel: Avoid softlockups in
stop_machine() during heavy printing" from my series (it's completely
independent, I've attached the latest version), the result would look less
hacky to me (attached). Thoughts?

Warning, the result is untested... I can do some testing and official
posting of the two patches if people agree we want to got down this path.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

View attachment "0003-kernel-Avoid-softlockups-in-stop_machine-during-heav.patch" of type "text/x-patch" (4074 bytes)

View attachment "0001-panic-release-stale-console-lock-to-always-get-the-l.patch" of type "text/x-patch" (3936 bytes)