linux-kernel - Re: Hard lockups using 3.10.0

Message-ID: <20130711175015.GZ16780@linux.vnet.ibm.com>
Date:	Thu, 11 Jul 2013 10:50:15 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Borislav Petkov <bp@...en8.de>,
	Rolf Eike Beer <eike-kernel@...tec.de>,
	linux-kernel@...r.kernel.org, dhowells@...hat.com
Subject: Re: Hard lockups using 3.10.0

On Thu, Jul 11, 2013 at 12:52:07PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 11, 2013 at 12:07:21PM +0200, Borislav Petkov wrote:
> > On Thu, Jul 11, 2013 at 11:38:37AM +0200, Rolf Eike Beer wrote:
> > > Hi,
> > > 
> > > I'm running 3.10.0 (from openSUSE packages) on an "Intel(R) Core(TM) i7-2600 
> > > CPU @ 3.40GHz". I got a hard lockup on one of my CPUs twice, once with 
> > > backtrace (see attached image). Graphics is the builtin Intel, used with X 7.6 
> > > and KDE 4.10beta2 (basically current openSUSE 12.3+KDE).
> > > 
> > > I'm not aware that I had done anything special, just "normal" desktop and 
> > > development usage, but no heavy compile work at the moment the lockups 
> > > happened.
> > 
> > Hmm, I can see commit_creds() doing some rcu pointers assignment and rcu
> > calling into the scheduler which screams about a cpu runqueue of the
> > task we're about to reschedule not being locked. Let's add some more
> > people who should know better.
> 
> Ok, for the other people too lazy to bother finding the picture:
> 
>   http://marc.info/?l=linux-kernel&m=137353587012001&q=p3
> 
> So we bug at:
> 
> kernel/sched/core.c:519 assert_raw_spin_locked(&task_rq(p)->lock);
> 
> and get there through:
> 
>   resched_task()
>   check_preempt_wakeup()
>   check_preempt_curr()
>   try_to_wake_up()
>   autoremove_wake_function()
>   __call_rcu_nocb_enqueue()
>   __call_rcu()
>   commit_creds()
>   ____call_usermodehelper()
>   ret_from_fork()
> 
> That don't make much sense though. Since:
> 
>   try_to_wake_up()
>     ttwu_queue()
>       raw_spin_lock(&rq->lock)
>       ttwu_do_activate()
>         ttwu_do_wakeup()
>           check_preempt_curr()
>             check_preempt_wakeup()
>               resched_task(rq->curr)
>                 assert_raw_spin_locked(&task_rq(p)->lock)
> 
> It would somehow mean that 'task_rq(rq->curr) != rq', that's completely
> bonkers, we do after all have rq->lock locked.
> 
> I must also say that I've _never_ seen this bug before.
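
To make the contradiction in the quoted call chain concrete, here is a minimal user-space sketch of the invariant. It is not kernel code: `struct rq`, `struct task`, `resched_task_ok()` and `wakeup_path_ok()` are hypothetical stand-ins, and a plain boolean replaces the raw spinlock. It only illustrates that the assertion can fail on this path if the task's notion of its runqueue disagrees with the runqueue that was locked.

```c
#include <assert.h>   /* handy when calling these helpers from a test */
#include <stdbool.h>
#include <stddef.h>

struct task;

/* Toy runqueue: 'locked' models rq->lock being held. */
struct rq {
	bool locked;
	struct task *curr;	/* task currently running on this runqueue */
};

/* Toy task: 'rq' models what task_rq(p) would return. */
struct task {
	struct rq *rq;
};

/*
 * Models the check in resched_task(): true if the runqueue that the
 * task believes it is on is locked, i.e. the assertion would pass.
 */
bool resched_task_ok(const struct task *p)
{
	return p->rq->locked;
}

/*
 * Models the wakeup path: take the lock of 'rq', then "reschedule"
 * rq->curr. Returns whether the lock assertion would have passed.
 */
bool wakeup_path_ok(struct rq *rq)
{
	bool ok;

	rq->locked = true;			/* raw_spin_lock(&rq->lock) */
	ok = resched_task_ok(rq->curr);		/* resched_task(rq->curr) */
	rq->locked = false;			/* raw_spin_unlock(&rq->lock) */
	return ok;
}
```

With a consistent setup (`t.rq == &rq0` and `rq0.curr == &t`), `wakeup_path_ok(&rq0)` returns true. It returns false only if `t.rq` is made to point at a different runqueue while `rq0.curr` still points at `t`, which is the "task_rq(rq->curr) != rq" state described above.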

New one on me as well.  Is this reproducible?  If so, does it still
happen with CONFIG_RCU_NOCB_CPU=n?  (Given the call to
__call_rcu_nocb_enqueue() in the backtrace, I expect that you built with
CONFIG_RCU_NOCB_CPU=y.)  Can't say that I see how
__call_rcu_nocb_enqueue() would have caused this, but...

Well, I suppose that if RCU's callback lists got corrupted, this
(and much else besides) could in fact happen.  Does your build have
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y?  If not, could you please try it?
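
For reference, the two experiments above might correspond to .config fragments along the following lines. This is a sketch, not a tested configuration: CONFIG_DEBUG_OBJECTS is listed on the assumption that CONFIG_DEBUG_OBJECTS_RCU_HEAD depends on it, and other dependencies may apply in 3.10, so please check lib/Kconfig.debug for your tree.

```
# Experiment 1: rule out the no-CBs callback offloading path.
# CONFIG_RCU_NOCB_CPU is not set

# Experiment 2: have debug-objects check rcu_head usage.
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
```

Running the two experiments separately should make it easier to tell whether any change in behavior comes from disabling the no-CBs path or from the extra checking.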

							Thanx, Paul
