[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160131123041.GA1306@swordfish>
Date: Sun, 31 Jan 2016 21:30:41 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Peter Hurley <peter@...leysoftware.com>
Cc: Byungchul Park <byungchul.park@....com>, akpm@...ux-foundation.org,
mingo@...nel.org, linux-kernel@...r.kernel.org,
akinobu.mita@...il.com, jack@...e.cz,
"torvalds@...ux-foundation.org Sergey Senozhatsky"
<sergey.senozhatsky.work@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in
the debug code
On (01/29/16 15:37), Sergey Senozhatsky wrote:
>
> panic()->console_panic_mode()->{for_each_console()->reset(), zap_locks()}->console_trelock()->console_unlock().
Hello,
This is not a final submission, just a RFC, so we can settle a better
plan. the patches are not signed off, have known problems (and likely
some unknown). I put a summary in here and send them out as a reply to
this email, so it'll be easier to review/comment/discuss.
patch 0001
***************
CPU stop IPI issued from panic() on CPUA, can leave console_sem locked
on CPUB if that cpu was holding the console_sem lock at the time when
IPI arrived. console_flush_on_panic() is trying to workaround it by
ignoring the return status of console_trylock() and unconditionally
executing console_unlock().
console_unlock() has a dependency on at least one more
lock - `logbuf_lock', which can be corrupted, for example, thus
console_unlock() may not be able to print anything afterall.
Introduce console_reset_on_panic() function to zap (re-init) printk
locks and call this function from panic().
WARNING
=======
This must be improved. console_reset_on_panic() is called before
smp_send_stop(), so:
a) we can have several CPU looping in console_unlock(), which is not
so critical.
b) we can re-init logbuf_lock while other CPU is holding it. Which
is more serious and needs to fixed.
The reason why console_reset_on_panic() is called this early is that
panicing CPU does pr_emerg("Kernel panic...") and dump_stack()
before it sends out smp_send_stop(). So if console_sem or logbug_lock,
or some console device driver lock is/are corrupted then panic() may
never smp_send_stop().
patch 0002
***************
Console driver(-s) can be in any state when CPU stop IPI
arrives from panic() issued on another CPU, so
console_flush_on_panic()->console_unlock() can call
con->write() callback on a locked console driver.
Introduce reset_console_drivers() that attempts to reset()
every console in via a console driver specific ->reset()
call.
Invoke reset_console_drivers() from console_reset_on_panic().
WARNING
=======
console_reset_on_panic() needs to be fixed.
patch 0003 -- detect recursive spin_dump() and panic() the system
***************
spin_dump() calls printk() which can attempt to reacquire the
'buggy' lock (one of printk's lock, or console device driver lock,
etc.) and thus spin_dump() will recursive into itself. Steal most
significant bit of spin_lock->owner_cpu to keep there a mark
that spin_dump() is in progress for that particular spin_lock.
spin_dump() will now set SPIN_DUMP_IN_PROGRESS bit at the
beginning of spin_dump() and clear it at the end, so it's
possible to detect recursive spin_dump() calls by checking if
lock's owner_cpu already has SPIN_DUMP_IN_PROGRESS bit already
set. panic() the system when spin_dump() recursion occurs.
-ss
Powered by blists - more mailing lists