[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160129040500.GC4820@swordfish>
Date: Fri, 29 Jan 2016 13:05:00 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Byungchul Park <byungchul.park@....com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Peter Hurley <peter@...leysoftware.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
akpm@...ux-foundation.org, mingo@...nel.org,
linux-kernel@...r.kernel.org, akinobu.mita@...il.com, jack@...e.cz,
torvalds@...ux-foundation.org
Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in
the debug code
On (01/29/16 12:00), Byungchul Park wrote:
[..]
> > it took a while to even find out that you are reporting this issues
> > not against a real H/W, but a qemu. I suppose qemu-arm running on
> > x86_64 box.
>
> No matter what kind of box I used because I only kept talking about the
> possiblity. It does not depend on a box at all.
well, qemu completely invalidates all of the H/W theories - powered off,
etc. so in a way it's important.
> > on very spin_dump recursive call it waits for the spin_lock and when
> > it eventually grabs it, it does the job that it wanted to do under
> > that spin lock, unlock it and return back. and the only case when it
> > never "return back" is when it never "eventually grabs it".
>
> Right. I missed it.
hm... we also can hit problems in spin_unlock() path. AND there are chances
that spin_unlock() can explode WITH OUT any memory corruption on sight, but
due to a coding error... a theoretical one:
we do unlock logbuf_lock, and debug_spin_unlock() is performed on a
locked logbuf_lock spin_lock
static inline void debug_spin_unlock(raw_spinlock_t *lock)
{
SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
SPIN_BUG_ON(!raw_spin_is_locked(lock), lock, "already unlocked");
SPIN_BUG_ON(lock->owner != current, lock, "wrong owner");
SPIN_BUG_ON(lock->owner_cpu != raw_smp_processor_id(),
lock, "wrong CPU");
lock->owner = SPINLOCK_OWNER_INIT;
lock->owner_cpu = -1;
}
void do_raw_spin_unlock(raw_spinlock_t *lock)
{
debug_spin_unlock(lock);
arch_spin_unlock(&lock->raw_lock);
}
so if there was a coding error (schedule while atomic, or unlock from another
CPU) which resulted in faulty
lock->owner_cpu != raw_smp_processor_id()
OR
lock->owner != current
then this will explode:
printk
spin_lock
>> coding error <<
spin_unlock
printk
spin_lock
printk
spin_lock
printk
spin_lock
... boom
vprintk_emit() recursion detection code will not work for logbuf_lock here.
because the only criteria how vprintk_emit() can detect a recursion is via
static `logbuf_cpu' which is set to UINT_MAX right before it
raw_spin_unlock(&logbuf_lock). so from vprintk_emit() POV the logbuf_lock is
already unlocked. which is not true.
in case of memory corruption I don't think we must care, 'coding error case'
is _probably/may be_ something that can be improved, but I'm not really 100%
sure... and this still doesn't explain your console_sem.lock case.
-ss
Powered by blists - more mailing lists