Message-ID: <20131210210956.GD27373@redhat.com>
Date: Tue, 10 Dec 2013 16:09:56 -0500
From: Dave Jones <davej@...hat.com>
To: Darren Hart <dvhart@...ux.intel.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrea Arcangeli <aarcange@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: process 'stuck' at exit.
On Tue, Dec 10, 2013 at 12:57:57PM -0800, Darren Hart wrote:
> > > Call Trace:
> > > [<ffffffff817587a0>] ? retint_restore_args+0xe/0xe
> > > [<ffffffff8132af0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > > [<ffffffff8100b184>] ? native_sched_clock+0x24/0x80
> > > [<ffffffff8109624f>] ? local_clock+0xf/0x50
> > > [<ffffffff810aa27e>] ? put_lock_stats.isra.28+0xe/0x30
> > > [<ffffffff8103edd0>] ? gup_pud_range+0x170/0x190
> > > [<ffffffff8103f0d5>] ? get_user_pages_fast+0x1a5/0x1c0
> > > [<ffffffff810ad1f5>] ? trace_hardirqs_on_caller+0x115/0x1e0
> > > [<ffffffff810a8a2f>] ? up_read+0x1f/0x40
> > > [<ffffffff8103f0d5>] ? get_user_pages_fast+0x1a5/0x1c0
> > > [<ffffffff8115f76c>] ? put_page+0x3c/0x50
> > > [<ffffffff810dd525>] ? get_futex_key+0xd5/0x2c0
> > > [<ffffffff810df18a>] ? futex_requeue+0xfa/0x9c0
> > > [<ffffffff810e019e>] ? do_futex+0xae/0xc80
> > > [<ffffffff810aa27e>] ? put_lock_stats.isra.28+0xe/0x30
> > > [<ffffffff810aa7de>] ? lock_release_holdtime.part.29+0xee/0x170
> > > [<ffffffff8114f16e>] ? context_tracking_user_exit+0x4e/0x190
> > > [<ffffffff810ad1f5>] ? trace_hardirqs_on_caller+0x115/0x1e0
> > > [<ffffffff810e0de1>] ? SyS_futex+0x71/0x150
> > > [<ffffffff81010a45>] ? syscall_trace_enter+0x145/0x2a0
> > > [<ffffffff81760be4>] ? tracesys+0xdd/0xe2
> > >
>
> Can you get us an idea of the arguments trinity is tossing into
> SYS_futex?
>
> Op code? Would help to know if this was requeue_pi for example.
> Type of memory being used for the uaddr?
As is always the case, the interesting bugs only seem to happen
when I have logging disabled. So, beyond what I can glean from what's
left in the shm, I have no idea.
One of the other child processes (which has already exited) did call sys_futex.
The parameters it passed were:
1cb5000, -1, c57, 1cb5004, ffffffffffd8f420, 90000000091a6311
The syscall returned -1.
> I see futex_requeue in the stack, which means the opcode is one of:
>
> FUTEX_REQUEUE
> FUTEX_CMP_REQUEUE
> FUTEX_CMP_REQUEUE_PI
>
> FUTEX_REQUEUE has a known issue and was replaced with FUTEX_CMP_REQUEUE;
> for details, test cases, and an analysis, see the historic tree:
>
> commit 9b91d73bde9d68800f9e5c338c0cf9d0fe3bc862
> Author: Andrew Morton <akpm@...l.org>
> Date: 2004-05-31
>
> [PATCH] Add FUTEX_CMP_REQUEUE futex op
>
> Specifically:
> http://listman.redhat.com/archives/phil-list/2004-May/msg00023.html
>
>
> Trinity is going to trigger hangs in futexes just by its very nature,
> but I believe you have watchdogs in place to kill such malformed tests
> after a timeout?
It should. That pid, though, is happily ignoring the SIGKILLs the watchdog
keeps sending, apparently because it never gets around to processing
its signals.
Dave