Message-ID: <20090410153229.GB6719@linux.vnet.ibm.com>
Date: Fri, 10 Apr 2009 08:32:29 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Al Viro <viro@...IV.linux.org.uk>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
linux-kernel@...r.kernel.org, hugh@...itas.com, jmorris@...ei.org,
akpm@...ux-foundation.org
Subject: Re: [2.6.30-rc1] RCU detected CPU 1 stall
On Fri, Apr 10, 2009 at 04:03:53PM +0100, Al Viro wrote:
> On Fri, Apr 10, 2009 at 07:22:03AM -0700, Paul E. McKenney wrote:
>
> > Hmmmm... This indicates that CPU 1 was spinning in the kernel for
> > a long time. At 250 HZ, 32,565 jiffies is 130 seconds, or just over
> > two -minutes-. Ouch!!!
> >
> > The interrupt happened on the stalled CPU, so we know that interrupts
> > were enabled. Because we have CONFIG_PREEMPT_NONE=y, there is no
> > preemption, so preemption need not be disabled. This could be due
> > to lock contention, or even a simple infinite loop.
> >
> > The timer interrupt (apic_timer_interrupt) occurred in either
> > __bprm_mm_init(), __get_user_4(), count(), or do_execve(). There
> > have been some recent changes around check_unsafe_exec() -- any
> > possibility that these introduced excessive lock contention or
> > an infinite loop? Ditto for the recent security fixes?
>
> Oh, joy... the loop in there is this:
> 	for (t = next_thread(p); t != p; t = next_thread(t)) {
> 		if (t->fs == p->fs)
> 			n_fs++;
> 	}
> I find it hard to believe that it can take two minutes, though.

Tetsuo, how many tasks did you have on this machine?
Though I too find it hard to believe that there were enough to chew up
two minutes.  Maybe the list got corrupted so that it has a loop?

							Thanx, Paul
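
The corrupted-list hypothesis above can be illustrated with a small
user-space sketch. It is not kernel code and not a proposed patch: the
struct task type, its next field (a stand-in for next_thread()), and the
function count_shared_fs_checked() are all hypothetical names invented for
this example. It assumes every next pointer is non-NULL. The idea is that a
walk of the form "t != p" can only spin forever if the links lead into a
cycle that no longer contains p, and a second pointer moving two links per
step (Floyd's cycle detection) will catch that case in bounded time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; only the fields the loop touches. */
struct task {
	struct task *next;	/* plays the role of next_thread(t) */
	const void *fs;		/* plays the role of t->fs */
};

/*
 * Walk the ring starting after p, counting the other entries that share
 * p->fs.  Returns true and stores the count in *n_fs if the walk comes
 * back to p.  Returns false, leaving *n_fs untouched, if the links lead
 * into a cycle that does not contain p.
 */
bool count_shared_fs_checked(const struct task *p, int *n_fs)
{
	const struct task *slow, *fast;
	int n = 0;
	int i;

	assert(p != NULL && n_fs != NULL);

	slow = p->next;
	fast = p->next;

	while (slow != p) {
		if (slow->fs == p->fs)
			n++;
		slow = slow->next;

		/* Advance fast by up to two links, parking it at p. */
		for (i = 0; i < 2 && fast != p; i++)
			fast = fast->next;

		/* fast caught slow without ever seeing p: p is unreachable. */
		if (fast != p && fast == slow)
			return false;
	}

	*n_fs = n;
	return true;
}
```

On an intact ring the fast pointer reaches p first and stays parked there,
so the function behaves like the original loop. If some entry's next
pointer has been redirected to an earlier entry other than p, the two
pointers meet inside that cycle and the function returns false instead of
spinning.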