Date:	Tue, 10 Dec 2013 16:12:20 -0500
From:	Dave Jones <davej@...hat.com>
To:	Darren Hart <dvhart@...ux.intel.com>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mgorman@...e.de>
Subject: Re: process 'stuck' at exit.

On Tue, Dec 10, 2013 at 01:06:23PM -0800, Darren Hart wrote:
 > On Tue, 2013-12-10 at 15:49 -0500, Dave Jones wrote:
 > > On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote:
 > >  > Dave, I must have missed something, help.
 > >  > 
 > >  > I am looking at the first message and I can't understand who stuck
 > >  > "at exit".
 > >  > 
 > >  > The trace shows that the task with pid=10818 called sys_futex() ?
 > >  >
 > >  > Perhaps "exit" means the userspace paths?
 > > 
 > > pid 1131 is wait()'ing for 10818 to exit
 > > 
 > > pid 1130 is periodically sending SIGKILL to 10818 because it's gotten
 > > tired of waiting. 10818 is ignoring these because it's stuck in a loop
 > > somewhere in the kernel.
 > > 
 > > I tried attaching to 10818 with gdb, and it just hangs.
 > > (possibly because its weird stack situation [see 1st post])
 > > 
 > > by inspecting the shared mapping that all processes have (by gdb'ing 1130)
 > > I can see that 10818 completed its full run without incident, and the
 > > "exit child" flag in the fuzzer had been set.
 > > 
 > > The last 'random syscall' the fuzzer did was to sys_accept4, so the futex call
 > > must come from somewhere in libc maybe ?
 > 
 > If that is the case, then Linus' requeue_pi path is highly unlikely as
 > FUTEX_CMP_REQUEUE_PI is not used by glibc (yet). That gives me hope as
 > that way there be dragons. Knowing exactly what syscall was made would
 > be very useful, but I don't know if that information is even available
 > anymore.

So the last syscall _that_ 'stuck' thread did was accept4, but I found
another pid that had done a futex call just before it exited.
(see other mail)

What I don't understand is how the running child has futex as part of
its stack trace, when the internal log that trinity keeps has no record
of that particular pid having called it.

	Dave
