linux-kernel - Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

Message-ID: <20160131123041.GA1306@swordfish>
Date:	Sun, 31 Jan 2016 21:30:41 +0900
From:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:	Peter Hurley <peter@...leysoftware.com>
Cc:	Byungchul Park <byungchul.park@....com>, akpm@...ux-foundation.org,
	mingo@...nel.org, linux-kernel@...r.kernel.org,
	akinobu.mita@...il.com, jack@...e.cz,
	"torvalds@...ux-foundation.org Sergey Senozhatsky" 
	<sergey.senozhatsky.work@...il.com>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in
 the debug code

On (01/29/16 15:37), Sergey Senozhatsky wrote:
> 
> panic()->console_panic_mode()->{for_each_console()->reset(), zap_locks()}->console_trylock()->console_unlock().

Hello,

This is not a final submission, just an RFC, so we can settle on a better
plan. The patches are not signed off and have known problems (and likely
some unknown ones). I put a summary here and will send the patches out as
replies to this email, so it'll be easier to review/comment/discuss.

patch 0001
***************
A CPU stop IPI issued from panic() on CPUA can leave console_sem locked
on CPUB if that CPU was holding console_sem at the time the IPI
arrived. console_flush_on_panic() tries to work around this by
ignoring the return status of console_trylock() and unconditionally
executing console_unlock().

However, console_unlock() depends on at least one more lock,
`logbuf_lock', which can itself be corrupted, for example, so
console_unlock() may not be able to print anything after all.

Introduce console_reset_on_panic() function to zap (re-init) printk
locks and call this function from panic().
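To make the idea concrete, here is a minimal user-space sketch of what
"zap (re-init) printk locks" means. It is not the patch itself: the
model_* types and helpers are hypothetical stand-ins for the kernel's
spinlock and semaphore, and only console_reset_on_panic(), console_sem
and logbuf_lock are names taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins for a kernel spinlock and semaphore. */
struct model_spinlock {
	bool locked;
	int owner_cpu;		/* -1 means "no owner" in this model */
};

struct model_sem {
	int count;		/* 1 == free, 0 == held */
};

/* Names from the email; the types are model types, not the kernel's. */
struct model_spinlock logbuf_lock;
struct model_sem console_sem;

static void model_spin_lock_init(struct model_spinlock *lock)
{
	lock->locked = false;
	lock->owner_cpu = -1;
}

static void model_sema_init(struct model_sem *sem, int val)
{
	sem->count = val;
}

/*
 * Zap (re-init) the printk locks no matter who was holding them.
 * This deliberately discards the previous owner, which is exactly why
 * doing it while other CPUs are still running is unsafe.
 */
void console_reset_on_panic(void)
{
	model_spin_lock_init(&logbuf_lock);
	model_sema_init(&console_sem, 1);
}

/* Model-only helper: true if the semaphore was acquired without blocking. */
bool model_sem_try_acquire(struct model_sem *sem)
{
	if (sem->count <= 0)
		return false;
	sem->count--;
	return true;
}
```

In this model, a semaphore left held by a stopped CPU makes every
try-acquire fail until console_reset_on_panic() re-initializes it,
after which the panicking CPU can take it normally.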

WARNING
=======
This must be improved. console_reset_on_panic() is called before
smp_send_stop(), so:
a) we can have several CPUs looping in console_unlock(), which is not
so critical.
b) we can re-init logbuf_lock while another CPU is holding it, which
is more serious and needs to be fixed.

The reason why console_reset_on_panic() is called this early is that
the panicking CPU does pr_emerg("Kernel panic...") and dump_stack()
before it calls smp_send_stop(). So if console_sem, logbuf_lock,
or some console device driver lock is corrupted, then panic() may
never reach smp_send_stop().

patch 0002
***************
Console driver(-s) can be in any state when the CPU stop IPI
arrives from panic() issued on another CPU, so
console_flush_on_panic()->console_unlock() can call the
con->write() callback on a locked console driver.

Introduce reset_console_drivers() that attempts to reset
every console via a console driver specific ->reset()
call.

Invoke reset_console_drivers() from console_reset_on_panic().
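A rough user-space sketch of the loop described above, for discussion
only. struct model_console, the model_console_drivers list head, the
`locked' field and the return value are assumptions made for this
example; the real struct console has more fields, and the ->reset()
callback is what this patch proposes to add.

```c
#include <stddef.h>

/* Hypothetical, simplified console descriptor. */
struct model_console {
	const char *name;
	int locked;		/* stand-in for a driver-private lock */
	void (*reset)(struct model_console *con);	/* optional callback */
	struct model_console *next;
};

/* Hypothetical head of the list of registered consoles. */
struct model_console *model_console_drivers;

/*
 * Walk all registered consoles and invoke ->reset() where a driver
 * provides one. Returns the number of consoles that were reset
 * (the return value is an assumption of this sketch).
 */
int reset_console_drivers(void)
{
	struct model_console *con;
	int nr_reset = 0;

	for (con = model_console_drivers; con != NULL; con = con->next) {
		if (!con->reset)
			continue;	/* driver has no reset support */
		con->reset(con);
		nr_reset++;
	}
	return nr_reset;
}

/* Example driver callback: drop whatever lock the driver was holding. */
void model_uart_reset(struct model_console *con)
{
	con->locked = 0;
}
```

Drivers that do not implement ->reset() are skipped, so a console that
was locked when the IPI arrived stays locked unless its driver opts in.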

WARNING
=======
console_reset_on_panic() needs to be fixed.

patch 0003 -- detect recursive spin_dump() and panic() the system
***************
spin_dump() calls printk(), which can attempt to reacquire the
'buggy' lock (one of printk's locks, a console device driver lock,
etc.), and thus spin_dump() will recurse into itself. Steal the most
significant bit of spin_lock->owner_cpu to keep a mark there
that spin_dump() is in progress for that particular spin_lock.
spin_dump() will now set the SPIN_DUMP_IN_PROGRESS bit at the
beginning and clear it at the end, so it's possible to detect
recursive spin_dump() calls by checking whether the lock's
owner_cpu already has the SPIN_DUMP_IN_PROGRESS bit set.
panic() the system when spin_dump() recursion occurs.
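The bit-stealing scheme can be sketched in user space roughly as
follows. This is an illustration under assumptions, not the patch:
struct model_lock, model_printk_hook and model_panic_count are
hypothetical, owner_cpu is assumed to be an unsigned int holding a
small non-negative CPU id, and the real code would call panic()
instead of bumping a counter.

```c
#include <limits.h>
#include <stddef.h>

/* Most significant bit of an unsigned int, reserved as the marker. */
#define SPIN_DUMP_IN_PROGRESS \
	(1U << (sizeof(unsigned int) * CHAR_BIT - 1))

/* Hypothetical stand-in for the debug fields of a spinlock. */
struct model_lock {
	unsigned int owner_cpu;
};

/* Stand-in for panic(): counts how often recursion was detected. */
int model_panic_count;

/* Stand-in for printk(); a "buggy" hook may re-enter spin_dump(). */
void (*model_printk_hook)(struct model_lock *lock);

void spin_dump(struct model_lock *lock)
{
	if (lock->owner_cpu & SPIN_DUMP_IN_PROGRESS) {
		/* Already dumping this lock: the kernel would panic() here. */
		model_panic_count++;
		return;
	}

	lock->owner_cpu |= SPIN_DUMP_IN_PROGRESS;	/* mark at the beginning */
	if (model_printk_hook)
		model_printk_hook(lock);		/* may recurse */
	lock->owner_cpu &= ~SPIN_DUMP_IN_PROGRESS;	/* clear at the end */
}

/* Readers of owner_cpu must mask the stolen bit out. */
unsigned int spin_owner_cpu(const struct model_lock *lock)
{
	return lock->owner_cpu & ~SPIN_DUMP_IN_PROGRESS;
}

/* Models printk() hitting the same bad lock again. */
void model_buggy_printk(struct model_lock *lock)
{
	spin_dump(lock);
}
```

The marker is per lock, so a nested spin_dump() on a different lock is
not treated as recursion; only re-entry on the same lock trips the
check.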

	-ss