linux-kernel - RE: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <000301d15985$7f416690$7dc433b0$@lge.com>
Date:	Thu, 28 Jan 2016 13:36:46 +0900
From:	"byungchul.park" <byungchul.park@....com>
To:	"'Sergey Senozhatsky'" <sergey.senozhatsky.work@...il.com>
Cc:	<akpm@...ux-foundation.org>, <mingo@...nel.org>,
	<linux-kernel@...r.kernel.org>, <akinobu.mita@...il.com>,
	<jack@...e.cz>, <torvalds@...ux-foundation.org>,
	<peter@...leysoftware.com>, <sergey.senozhatsky@...il.com>
Subject: RE: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

> From: Sergey Senozhatsky [mailto:sergey.senozhatsky.work@...il.com]
> Sent: Thursday, January 28, 2016 11:38 AM
> To: Byungchul Park
> Cc: akpm@...ux-foundation.org; mingo@...nel.org; linux-
> kernel@...r.kernel.org; akinobu.mita@...il.com; jack@...e.cz;
> torvalds@...ux-foundation.org; peter@...leysoftware.com;
> sergey.senozhatsky.work@...il.com; sergey.senozhatsky@...il.com
> Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in
> the debug code
> 
> ok, I'll repeat the questions.
> 
> under what circumstances you hit this problem? ...memory corruption, cpu
> core has been powered off, while it owned the spin_lock... your irqsave
> didn't work?

I think these are not the my case.

> 
> the thing is, it's really-really hard to lockup in console_trylock()...
> 
> int down_trylock(struct semaphore *sem)
> {
>         unsigned long flags;
>         int count;
> 
>         raw_spin_lock_irqsave(&sem->lock, flags);   <<<<<<  um...

I also think it's hard, but a backtrace said the lockup happened here.

>         count = sem->count - 1;
>         if (likely(count >= 0))
>                 sem->count = count;
>         raw_spin_unlock_irqrestore(&sem->lock, flags);
> 
>         return (count < 0);
> }
> 
> was not able to dereference sem->count? `*sem' was corrupted? or because
> sem->lock was corrupted? in any of those cases you solve the problem from
> the wrong side. if you have a memory corruption then it can corrupt

What I solved here is to make it possible to print what the problem is, by
the spinlock debug code instead of system lockup while printing a debug
message. I think it's not wrong.

> anything,
> including, for example, console driver spin_lock.
> 
> 
> suppose, that you hit do_raw_spin_lock()->spin_dump(), which means that
> the
> spin_lock was not corrupted, it passed debug_spin_lock_before() after all.
> and that spin_lock was taken for longer than `loops_per_jiffy * HZ', while
> other CPU was doing
> 
>         count = sem->count - 1;
>         if (likely(count >= 0))
>                 sem->count = count;
> 
> ???
> 
> was the CPU that owned the lock alive? (h/w fault, perhaps?).

I am just curious.. Is it impossible but by h/w fault? e.g. timing to
allocate
virtual cpus to a guest machine when using virtual machine and so on.

> 
> 
> dunno... yes, this
> 	printk()->console_trylock()->do_raw_spin_lock()->spin_dump()-
> >printk()
> is possible, but it's possible only when your system is screwed up badly,
> so
> badly that this spin_dump() loop is not really a problem, IMHO.

IMHO, even though a system is screwed up badly, the spinlock debug code have
to print the information about the problem without lockup.

> 
> if I'm missing something terribly obvious here, then please give more
> details.

There are already codes to prevent the recursive cycle in the bug case,
but not for the just suspected case. I just made it possible for the
latter case. That's all this patch is doing, now. :)

thanks,
byungchul

> 
> 	-ss
> 
> > > Signed-off-by: Byungchul Park <byungchul.park@....com>
> > > ---
> > >  include/linux/debug_locks.h     |  4 ++++
> > >  kernel/locking/spinlock_debug.c | 14 +++++++++++---
> > >  2 files changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
> > > index 822c135..b0f977e 100644
> > > --- a/include/linux/debug_locks.h
> > > +++ b/include/linux/debug_locks.h
> > > @@ -10,6 +10,10 @@ struct task_struct;
> > >  extern int debug_locks;
> > >  extern int debug_locks_silent;
> > >
> > > +static inline void __debug_locks_on(void)
> > > +{
> > > +	debug_locks = 1;
> > > +}
> > >
> > >  static inline int __debug_locks_off(void)
> > >  {
> > > diff --git a/kernel/locking/spinlock_debug.c
> b/kernel/locking/spinlock_debug.c
> > > index 0374a59..65177ba 100644
> > > --- a/kernel/locking/spinlock_debug.c
> > > +++ b/kernel/locking/spinlock_debug.c
> > > @@ -113,11 +113,19 @@ static void __spin_lock_debug(raw_spinlock_t
> *lock)
> > >  			return;
> > >  		__delay(1);
> > >  	}
> > > -	/* lockup suspected: */
> > > -	spin_dump(lock, "lockup suspected");
> > > +
> > > +	/*
> > > +	 * We should prevent calling printk() further, since it would cause
> > > +	 * an infinite recursive cycle if it's called from printk()!
> > > +	 */
> > > +	if (__debug_locks_off()) {
> > > +		/* lockup suspected: */
> > > +		spin_dump(lock, "lockup suspected");
> > >  #ifdef CONFIG_SMP
> > > -	trigger_all_cpu_backtrace();
> > > +		trigger_all_cpu_backtrace();
> > >  #endif
> > > +		__debug_locks_on();
> > > +	}
> > >
> > >  	/*
> > >  	 * The trylock above was causing a livelock.  Give the lower level
> arch
> > > --
> > > 1.9.1
> >