Message-ID: <20131210211220.GE27373@redhat.com>
Date: Tue, 10 Dec 2013 16:12:20 -0500
From: Dave Jones <davej@...hat.com>
To: Darren Hart <dvhart@...ux.intel.com>
Cc: Oleg Nesterov <oleg@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrea Arcangeli <aarcange@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mgorman@...e.de>
Subject: Re: process 'stuck' at exit.
On Tue, Dec 10, 2013 at 01:06:23PM -0800, Darren Hart wrote:
> On Tue, 2013-12-10 at 15:49 -0500, Dave Jones wrote:
> > On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote:
> > > Dave, I must have missed something, help.
> > >
> > > I am looking at the first message and I can't understand who is stuck
> > > "at exit".
> > >
> > > The trace shows that the task with pid=10818 called sys_futex()?
> > >
> > > Perhaps "exit" means the userspace paths?
> >
> > pid 1131 is wait()'ing for 10818 to exit
> >
> > pid 1130 is periodically sending SIGKILL to 10818 because it's gotten
> > tired of waiting. 10818 is ignoring these because it's stuck in a loop
> > somewhere in the kernel.
> >
> > I tried attaching to 10818 with gdb, and it just hangs
> > (possibly because of its weird stack situation [see 1st post]).
> >
> > By inspecting the shared mapping that all the processes have (by gdb'ing 1130),
> > I can see that 10818 completed its full run without incident, and that the
> > fuzzer's "exit child" flag had been set.
> >
> > The last 'random syscall' the fuzzer made was sys_accept4, so the futex call
> > must have come from somewhere in libc, maybe?
>
> If that is the case, then Linus' requeue_pi path is highly unlikely as
> FUTEX_CMP_REQUEUE_PI is not used by glibc (yet). That gives me hope as
> that way there be dragons. Knowing exactly what syscall was made would
> be very useful, but I don't know if that information is even available
> anymore.
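
If the futex call did come from libc, it would have been issued as a raw system call: glibc exposes no futex() wrapper, and its threading primitives invoke the syscall internally. The following is a minimal sketch, assuming Linux with <linux/futex.h>, of the two basic operations (private FUTEX_WAIT and FUTEX_WAKE) made via syscall(2). It does not touch the FUTEX_CMP_REQUEUE_PI path mentioned above, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no futex() wrapper, so call it directly. */
static long futex_op(uint32_t *uaddr, int op, uint32_t val,
                     const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}

/* FUTEX_WAIT only sleeps if *uaddr still equals val. Here it does not
 * (1 != 0), so the call fails immediately; return the resulting errno. */
int futex_wait_mismatch_errno(void)
{
    uint32_t word = 1;
    if (futex_op(&word, FUTEX_WAIT_PRIVATE, 0, NULL) == -1)
        return errno;
    return 0;
}

/* FUTEX_WAKE returns how many waiters it woke; nobody waits on this word. */
long futex_wake_nobody(void)
{
    uint32_t word = 0;
    return futex_op(&word, FUTEX_WAKE_PRIVATE, 1, NULL);
}
```

The compare-before-sleep in FUTEX_WAIT is what lets userspace lock code avoid lost wakeups: if the value changed after the userspace check, the kernel refuses to sleep and returns EAGAIN.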

So the last syscall _that_ 'stuck' thread made was accept4, but I found
another pid that had made a futex call just before it exited
(see other mail).
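
Reading one child's "last syscall" and "exit child" state from gdb attached to a different process works because the fuzzer's processes share a mapping. The following is a minimal sketch of that idea, assuming an anonymous MAP_SHARED mapping created before fork(). The struct child_state layout and its exit_requested and last_syscall fields are hypothetical and are not trinity's actual data structures.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-child record; trinity's real layout differs. */
struct child_state {
    atomic_int exit_requested;  /* hypothetical "exit child" flag */
    atomic_long last_syscall;   /* hypothetical: last syscall number recorded */
};

/* Anonymous MAP_SHARED memory stays shared across fork(), so writes made
 * by the child are visible to the parent (and to a debugger on the parent). */
static struct child_state *map_child_state(void)
{
    void *p = mmap(NULL, sizeof(struct child_state), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct child_state *s = p;
    atomic_init(&s->exit_requested, 0);
    atomic_init(&s->last_syscall, -1);
    return s;
}

/* The child records a syscall number (SYS_accept4, without calling it) and
 * sets the exit flag. The parent reaps it and reads both fields back.
 * Returns the flag (1 on success) and stores the number in *last_out,
 * or returns -1 on error. */
int run_child_and_read_state(long *last_out)
{
    struct child_state *s = map_child_state();
    if (s == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(s, sizeof *s);
        return -1;
    }
    if (pid == 0) {
        atomic_store(&s->last_syscall, (long)SYS_accept4);
        atomic_store(&s->exit_requested, 1);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(s, sizeof *s);
        return -1;
    }
    int flag = atomic_load(&s->exit_requested);
    *last_out = atomic_load(&s->last_syscall);
    munmap(s, sizeof *s);
    return flag;
}
```

A record like this only shows what the child wrote before its last update, which is why it can disagree with a live kernel stack trace.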

What I don't understand is how the running child has futex as part of
its stack trace, when the internal log that trinity keeps has no record
of that particular pid having called it.

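For contrast, here is what normally happens in the setup described earlier in the thread, where one process wait()s for a child and another sends it SIGKILL. This is a minimal sketch, assuming POSIX fork(), kill() and waitpid(). For brevity it folds both roles into one parent, and the function name is hypothetical. SIGKILL cannot be caught or blocked, but it only takes effect once the target reaches a point where the kernel checks for pending signals. A task spinning inside the kernel can therefore keep the waiter blocked indefinitely, which is consistent with the hang described here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that sleeps, SIGKILL it, and reap it.
 * Returns the signal that terminated the child, or -1 on error. */
int kill_and_reap_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: sleep waiting for signals; it cannot survive SIGKILL. */
        for (;;)
            pause();
    }

    /* Watchdog role (pid 1130 in the thread). */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    /* Waiter role (pid 1131 in the thread): blocks until the child exits. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Here the child sleeps in pause(), an interruptible wait, so the signal is acted on almost immediately and waitpid() returns with WTERMSIG equal to SIGKILL.
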
Dave