Message-ID: <4B4C3E4F.9060001@memeplex.com>
Date: Tue, 12 Jan 2010 04:18:07 -0500
From: Andrew Athan <linux_kernel_aathan@...eplex.com>
To: linux-kernel@...r.kernel.org
Subject: Futex hang/lockup problem in 2.6.30+ on AMD64
After some investigation I believe I am experiencing a problem similar
to the one described in this posting:
http://sourceware.org/ml/libc-help/2009-10/msg00026.html, in that the
poster suspects a problem in the futex implementation in 2.6.30 and
above kernels. In my case, the problem is not a soft lockup in the
kernel, but it does result in an application lockup because all threads
end up waiting on futexes.
For me this problem began to appear once I upgraded my Debian
squeeze/testing x86_64 installation (AMD) to a new kernel. I'm not
sure what the prior kernel version was. The same software running on
different machines with earlier kernels (lenny) does not seem to
experience the problem.
I'm really not sure if this is a libc or kernel problem, but due to
the stack trace, which shows what appears to be a hang on the internal
__lock of the condition variable, it appears likely this is not an
application bug. Memory does not appear to be corrupt (I store
sentinels around the mutexes, and they have retained their values).
It appears that the cond var's __lock indicates there are waiters
even though there are/should-be none (assuming I'm interpreting the
__lock value of 2 correctly). Since the __lock in question is a futex
primitive, and it must be held regardless of other libc/nptl state
variables, I don't believe this is a libc problem.
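To make the reading of the __lock value concrete, here is a minimal sketch of a three-state futex lock of the kind the nptl low-level lock appears to implement. It is an illustration only, not the glibc source: the names ll_lock, ll_unlock, ll_futex and ll_demo are made up, and the meaning of the values (0 = free, 1 = held, 2 = held with possible waiters) is the same assumption made above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical names. The values follow the interpretation assumed above:
   0 = free, 1 = held with no waiters, 2 = held with possible waiters. */
enum { LL_FREE = 0, LL_HELD = 1, LL_CONTENDED = 2 };

static long ll_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void ll_lock(atomic_int *lock)
{
    int expected = LL_FREE;
    /* Fast path: try 0 -> 1 with a compare-and-swap. */
    if (atomic_compare_exchange_strong(lock, &expected, LL_HELD))
        return;
    /* Slow path: store 2, and sleep in the kernel while the word is still 2. */
    while (atomic_exchange(lock, LL_CONTENDED) != LL_FREE)
        ll_futex(lock, FUTEX_WAIT_PRIVATE, LL_CONTENDED);
}

static void ll_unlock(atomic_int *lock)
{
    /* Only a lock that was marked 2 needs a FUTEX_WAKE. */
    if (atomic_exchange(lock, LL_FREE) == LL_CONTENDED)
        ll_futex(lock, FUTEX_WAKE_PRIVATE, 1);
}

/* Single-threaded walk through the states; returns the final lock value. */
static int ll_demo(void)
{
    atomic_int lock = LL_FREE;
    ll_lock(&lock);
    assert(atomic_load(&lock) == LL_HELD);
    ll_unlock(&lock);
    atomic_store(&lock, LL_CONTENDED);  /* pretend a waiter had queued up */
    ll_unlock(&lock);                   /* issues a wake; nobody is waiting */
    return atomic_load(&lock);
}
```

In this scheme a thread that finds the word non-zero stores 2 and sleeps until it reads back 0, so a word stuck at 2 with no holder would block every later locker, which is what the trace seems to show.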
The problem occurs rarely but inevitably, sometimes only after
several hours of normal program operation. I have not yet managed to
create a reduced test program that reliably reproduces the hang in a
short timeframe.
The application contains a thread pool whose threads perform many
operations between pthread calls, but their behavior can be summarized
as one of the three cases below. Due to the design of the thread pool,
work is assigned to threads round-robin, or at least randomly (in
contrast to having one constant broadcast thread).
case 1: while(1){ *A* pthread_mutex_lock(); pthread_mutex_unlock(); }
case 2: pthread_mutex_lock(); pthread_cond_wait(); pthread_mutex_unlock();
case 3: pthread_mutex_lock(); *B* pthread_cond_broadcast(); pthread_mutex_unlock();
The application becomes hung with all threads but one stuck at *A*,
and one thread at *B*.
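For readers who want something compilable, the three cases can be sketched roughly as below. This is not the application code (which goes through ACE wrappers) and it does not reproduce the hang. It assumes the lock and unlock calls in the summary are pthread_mutex_lock() and pthread_mutex_unlock() on a single shared mutex. The names mtx, cv, ready, passes and run_cases are hypothetical, and the endless loop of case 1 is bounded so that the sketch terminates.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; the real application uses ACE wrappers. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool ready;
static long passes;

/* case 1: repeatedly take and drop the mutex (point *A*). */
static void *case1_locker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {  /* bounded here; while(1) in the summary */
        pthread_mutex_lock(&mtx);
        passes++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

/* case 2: wait on the condition variable while holding the mutex. */
static void *case2_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (!ready)  /* predicate loop guards against spurious wakeups */
        pthread_cond_wait(&cv, &mtx);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* case 3: broadcast while holding the mutex (point *B*). */
static void case3_broadcast(void)
{
    pthread_mutex_lock(&mtx);
    ready = true;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
}

/* Runs two case-1 threads and two case-2 threads, broadcasts once from
   the calling thread, and returns the number of case-1 passes. */
static long run_cases(void)
{
    pthread_t t[4];
    ready = false;
    passes = 0;
    pthread_create(&t[0], NULL, case1_locker, NULL);
    pthread_create(&t[1], NULL, case1_locker, NULL);
    pthread_create(&t[2], NULL, case2_waiter, NULL);
    pthread_create(&t[3], NULL, case2_waiter, NULL);
    case3_broadcast();
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return passes;
}
```

In the hung process the picture is the same, except that the thread in case 3 never returns from pthread_cond_broadcast(), so it never releases the mutex that the case 1 threads are queued on.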
The stack trace and other details appear below. I've saved the core
file in case I can provide additional information.
$ uname -a
Linux UK22 2.6.30-2-amd64 #1 SMP Fri Sep 25 22:16:56 UTC 2009 x86_64 GNU/Linux
I rebuilt Debian's eglibc-2.10.2 from source with -g flag to get a
better trace. Here is ldd on the application:
linux-vdso.so.1 => (0x00007fff149ff000)
libboost_python.so.1.40.0 => ./libboost_python.so.1.40.0 (0x00007f1f2c55a000)
libpython2.5.so.1.0 => /usr/lib/libpython2.5.so.1.0 (0x00007f1f2c1e1000)
libACEXML_Parser.so.5.4.0 => /var/ACE/libACEXML_Parser.so.5.4.0 (0x00007f1f2bfbf000)
libACEXML.so.5.4.0 => /var/ACE/libACEXML.so.5.4.0 (0x00007f1f2bd77000)
libACE.so.5.4.0 => /var/ACE/libACE.so.5.4.0 (0x00007f1f2acc3000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f1f2aabf000)
libpthread.so.0 => /home/root/eglibc-2.10.2/build-tree/amd64-libc/nptl/libpthread.so.0 (0x00007f1f2a8a2000)
librt.so.1 => /lib/librt.so.1 (0x00007f1f2a69a000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1f2a38a000)
libm.so.6 => /lib/libm.so.6 (0x00007f1f2a107000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1f29ef1000)
libc.so.6 => /lib/libc.so.6 (0x00007f1f29b9d000)
libutil.so.1 => /lib/libutil.so.1 (0x00007f1f29999000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1f2c7b1000)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GDB BACKTRACE
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
See below for source of last couple of stack frames.
All threads except thread 4 are waiting for a lock on the "external"
mutex being used in conjunction with the condition variable. The
owner of that lock is 25521 which sure enough is thread 4. However,
thread 4 appears to be waiting on the internal __lock of the condition
variable. Since that variable appears to have no waiters and the
other threads' traces are not inside any pthread calls associated with
that __lock, it seems reasonable that there is either a pthread or
futex problem.
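The owner check used above (matching __owner against a thread's kernel id) can be illustrated with the small sketch below. It assumes glibc on Linux: __data.__owner is an internal field, not a public API, and the helper names owned_by_me and owner_demo are hypothetical. It is meant for debugging only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the mutex records the calling thread as its owner.
   Assumption: glibc layout, where __data.__owner holds the kernel
   thread id (the number gdb prints as "Thread NNNNN"). */
static int owned_by_me(pthread_mutex_t *m)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);
    return m->__data.__owner == tid;
}

/* Encodes the owner check before, during and after holding the lock
   as a three-digit number. */
static int owner_demo(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int before = owned_by_me(&m);
    pthread_mutex_lock(&m);
    int during = owned_by_me(&m);
    pthread_mutex_unlock(&m);
    int after = owned_by_me(&m);
    return before * 100 + during * 10 + after;
}
```

With a default (non-elided) mutex, only the check made while the lock is held should match. In the core file, __owner = 25521 corresponds to the LWP id gdb shows for thread 4.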
Thread 7 (Thread 25524):
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f9c9b282e79 in _L_lock_949 () from
/home/root/eglibc-2.10.2/build-tree/amd64-libc/nptl/libpthread.so.0
#2 0x00007f9c9b282c9b in __pthread_mutex_lock (mutex=0x1dc3960) at
pthread_mutex_lock.c:61
#3 0x00007f9c9c545021 in ACE_OS::mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:1296
#4 0x00007f9c9c545061 in ACE_OS::thread_mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:4443
#5 0x00007f9c9c54508f in ACE_Thread_Mutex::acquire (this=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Thread_Mutex.inl:57
#6 0x00007f9c9c5410e2 in ACE_Guard<ACE_Thread_Mutex>::acquire
(this=0x7f9c7f7f5e90)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:9
#7 0x00007f9c9c541123 in ACE_Guard (this=0x7f9c7f7f5e90, l=...)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:35
#8 0x00007f9c9c2e1da6 in TTWork::GeneratorSelect::reselect
(this=0x1dc38f0, wi=0x7f9c80af9660) at TTWork.cpp:873
#9 0x00007f9c9c2e1e92 in TTWork::WorkItemHandle::clearReadyMask
(this=0x7f9c80af9660, mask=1, resel=true)
at TTWork.cpp:1061
#10 0x00007f9c9c2eaea2 in TTWork::NetServiceTCP::doTheWork
(this=0x7f9c80af9660, workEV=...)
at TTWorkNetServiceTCP.cpp:278
#11 0x00007f9c9c2eb354 in TTWork::NetServiceTCP::doWork
(this=0x7f9c80af9660, workEV=...)
at TTWorkNetServiceTCP.cpp:351
#12 0x00007f9c9c2dfccb in TTWork::Dispatcher::dispatch (this=0x13b5c60)
at TTWork.cpp:234
#13 0x00007f9c9c2e3a4f in TTWork::Dispatcher::dispatchGenerate
(this=0x13b5c60, maxWait=0x0, min=0x7f9c7f7f6260)
at TTWork.cpp:324
#14 0x00007f9c9c2e44fd in TTWork::DispatcherTask::runTask
(this=0x13b6ec0) at TTWork.cpp:1580
#15 0x00007f9c9c2e4fee in TTWork::Task::svc (this=0x13b6ec0) at
TTWork.cpp:50
#16 0x00007f9c9b865344 in ACE_Task_Base::svc_run (args=0x13b6ee8) at
Task.cpp:210
#17 0x00007f9c9b7dcb0f in ACE_Thread_Adapter::invoke_i
(this=0x7f9c80000bc0) at Thread_Adapter.cpp:150
#18 0x00007f9c9b7dcbb9 in ACE_Thread_Adapter::invoke
(this=0x7f9c80000bc0) at Thread_Adapter.cpp:93
#19 0x00007f9c9b78c0e3 in ace_thread_adapter (args=0x7f9c80000bc0) at
Base_Thread_Adapter.cpp:131
#20 0x00007f9c9b28073a in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#21 0x00007f9c9a64169d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()
Thread 6 (Thread 25523):
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f9c9b282e79 in _L_lock_949 () from
/home/root/eglibc-2.10.2/build-tree/amd64-libc/nptl/libpthread.so.0
#2 0x00007f9c9b282c9b in __pthread_mutex_lock (mutex=0x1dc3960) at
pthread_mutex_lock.c:61
#3 0x00007f9c9c545021 in ACE_OS::mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:1296
#4 0x00007f9c9c545061 in ACE_OS::thread_mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:4443
#5 0x00007f9c9c54508f in ACE_Thread_Mutex::acquire (this=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Thread_Mutex.inl:57
#6 0x00007f9c9c5410e2 in ACE_Guard<ACE_Thread_Mutex>::acquire
(this=0x7f9c7fff6e90)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:9
#7 0x00007f9c9c541123 in ACE_Guard (this=0x7f9c7fff6e90, l=...)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:35
#8 0x00007f9c9c2e1da6 in TTWork::GeneratorSelect::reselect
(this=0x1dc38f0, wi=0x7f9c80ab8e40) at TTWork.cpp:873
#9 0x00007f9c9c2e1e92 in TTWork::WorkItemHandle::clearReadyMask
(this=0x7f9c80ab8e40, mask=1, resel=true)
at TTWork.cpp:1061
#10 0x00007f9c9c2eaea2 in TTWork::NetServiceTCP::doTheWork
(this=0x7f9c80ab8e40, workEV=...)
at TTWorkNetServiceTCP.cpp:278
#11 0x00007f9c9c2eb354 in TTWork::NetServiceTCP::doWork
(this=0x7f9c80ab8e40, workEV=...)
at TTWorkNetServiceTCP.cpp:351
#12 0x00007f9c9c2dfccb in TTWork::Dispatcher::dispatch (this=0x13b5c60)
at TTWork.cpp:234
#13 0x00007f9c9c2e3a4f in TTWork::Dispatcher::dispatchGenerate
(this=0x13b5c60, maxWait=0x0, min=0x7f9c7fff7260)
at TTWork.cpp:324
#14 0x00007f9c9c2e44fd in TTWork::DispatcherTask::runTask
(this=0x13b6ec0) at TTWork.cpp:1580
#15 0x00007f9c9c2e4fee in TTWork::Task::svc (this=0x13b6ec0) at
TTWork.cpp:50
#16 0x00007f9c9b865344 in ACE_Task_Base::svc_run (args=0x13b6ee8) at
Task.cpp:210
#17 0x00007f9c9b7dcb0f in ACE_Thread_Adapter::invoke_i
(this=0x7f9c80000970) at Thread_Adapter.cpp:150
#18 0x00007f9c9b7dcbb9 in ACE_Thread_Adapter::invoke
(this=0x7f9c80000970) at Thread_Adapter.cpp:93
#19 0x00007f9c9b78c0e3 in ace_thread_adapter (args=0x7f9c80000970) at
Base_Thread_Adapter.cpp:131
#20 0x00007f9c9b28073a in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#21 0x00007f9c9a64169d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()
Thread 5 (Thread 25522):
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f9c9b282e79 in _L_lock_949 () from
/home/root/eglibc-2.10.2/build-tree/amd64-libc/nptl/libpthread.so.0
#2 0x00007f9c9b282c9b in __pthread_mutex_lock (mutex=0x1dc3960) at
pthread_mutex_lock.c:61
#3 0x00007f9c9c545021 in ACE_OS::mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:1296
#4 0x00007f9c9c545061 in ACE_OS::thread_mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:4443
#5 0x00007f9c9c54508f in ACE_Thread_Mutex::acquire (this=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Thread_Mutex.inl:57
#6 0x00007f9c9c5410e2 in ACE_Guard<ACE_Thread_Mutex>::acquire
(this=0x7f9c84e14e90)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:9
#7 0x00007f9c9c541123 in ACE_Guard (this=0x7f9c84e14e90, l=...)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:35
#8 0x00007f9c9c2e1da6 in TTWork::GeneratorSelect::reselect
(this=0x1dc38f0, wi=0x7f9c80407020) at TTWork.cpp:873
#9 0x00007f9c9c2e1e92 in TTWork::WorkItemHandle::clearReadyMask
(this=0x7f9c80407020, mask=1, resel=true)
at TTWork.cpp:1061
#10 0x00007f9c9c2eaea2 in TTWork::NetServiceTCP::doTheWork
(this=0x7f9c80407020, workEV=...)
at TTWorkNetServiceTCP.cpp:278
#11 0x00007f9c9c2eb354 in TTWork::NetServiceTCP::doWork
(this=0x7f9c80407020, workEV=...)
at TTWorkNetServiceTCP.cpp:351
#12 0x00007f9c9c2dfccb in TTWork::Dispatcher::dispatch (this=0x13b5c60)
at TTWork.cpp:234
#13 0x00007f9c9c2e3a4f in TTWork::Dispatcher::dispatchGenerate
(this=0x13b5c60, maxWait=0x0, min=0x7f9c84e15260)
at TTWork.cpp:324
#14 0x00007f9c9c2e44fd in TTWork::DispatcherTask::runTask
(this=0x13b6ec0) at TTWork.cpp:1580
#15 0x00007f9c9c2e4fee in TTWork::Task::svc (this=0x13b6ec0) at
TTWork.cpp:50
#16 0x00007f9c9b865344 in ACE_Task_Base::svc_run (args=0x13b6ee8) at
Task.cpp:210
#17 0x00007f9c9b7dcb0f in ACE_Thread_Adapter::invoke_i
(this=0x7f9c80000bc0) at Thread_Adapter.cpp:150
#18 0x00007f9c9b7dcbb9 in ACE_Thread_Adapter::invoke
(this=0x7f9c80000bc0) at Thread_Adapter.cpp:93
#19 0x00007f9c9b78c0e3 in ace_thread_adapter (args=0x7f9c80000bc0) at
Base_Thread_Adapter.cpp:131
#20 0x00007f9c9b28073a in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#21 0x00007f9c9a64169d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()
Thread 4 (Thread 25521):
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f9c9b2854d0 in pthread_cond_broadcast@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S:118
#2 0x00007f9c9c2b87c7 in ACE_OS::cond_broadcast (cv=0x1dc4500)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6/ace/OS_NS_Thread.inl:294
#3 0x00007f9c9c2b5325 in ACE_Condition<ACE_Thread_Mutex>::broadcast
(this=0x1dc4500)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6/ace/Condition_T.inl:81
#4 0x00007f9c9c2e229e in TTWork::GeneratorSelect::generate
(this=0x1dc38f0, nextGenTime=...,
maxWait=0x7f9c856161c0) at TTWork.cpp:814
#5 0x00007f9c9c2e38f2 in TTWork::Dispatcher::generate (this=0x13b5c60,
maxWait=0x7f9c85616220, min=0x7f9c85616260)
at TTWork.cpp:300
#6 0x00007f9c9c2e3a9b in TTWork::Dispatcher::dispatchGenerate
(this=0x13b5c60, maxWait=0x0, min=0x7f9c85616260)
at TTWork.cpp:331
#7 0x00007f9c9c2e44fd in TTWork::DispatcherTask::runTask
(this=0x13b6ec0) at TTWork.cpp:1580
#8 0x00007f9c9c2e4fee in TTWork::Task::svc (this=0x13b6ec0) at
TTWork.cpp:50
#9 0x00007f9c9b865344 in ACE_Task_Base::svc_run (args=0x13b6ee8) at
Task.cpp:210
#10 0x00007f9c9b7dcb0f in ACE_Thread_Adapter::invoke_i
(this=0x7f9c80000970) at Thread_Adapter.cpp:150
#11 0x00007f9c9b7dcbb9 in ACE_Thread_Adapter::invoke
(this=0x7f9c80000970) at Thread_Adapter.cpp:93
#12 0x00007f9c9b78c0e3 in ace_thread_adapter (args=0x7f9c80000970) at
Base_Thread_Adapter.cpp:131
#13 0x00007f9c9b28073a in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#14 0x00007f9c9a64169d in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()
Thread 3 (Thread 25520):
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f9c9b282e79 in _L_lock_949 () from
/home/root/eglibc-2.10.2/build-tree/amd64-libc/nptl/libpthread.so.0
#2 0x00007f9c9b282c9b in __pthread_mutex_lock (mutex=0x1dc3960) at
pthread_mutex_lock.c:61
#3 0x00007f9c9c545021 in ACE_OS::mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:1296
#4 0x00007f9c9c545061 in ACE_OS::thread_mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:4443
#5 0x00007f9c9c54508f in ACE_Thread_Mutex::acquire (this=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Thread_Mutex.inl:57
#6 0x00007f9c9c5410e2 in ACE_Guard<ACE_Thread_Mutex>::acquire
(this=0x7f9c85e16e90)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:9
#7 0x00007f9c9c541123 in ACE_Guard (this=0x7f9c85e16e90, l=...)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:35
#8 0x00007f9c9c2e1da6 in TTWork::GeneratorSelect::reselect
(this=0x1dc38f0, wi=0x7f9c78177200) at TTWork.cpp:873
#9 0x00007f9c9c2e1e92 in TTWork::WorkItemHandle::clearReadyMask
(this=0x7f9c78177200, mask=1, resel=true)
at TTWork.cpp:1061
#10 0x00007f9c9c2eaea2 in TTWork::NetServiceTCP::doTheWork
(this=0x7f9c78177200, workEV=...)
at TTWorkNetServiceTCP.cpp:278
#11 0x00007f9c9c2eb354 in TTWork::NetServiceTCP::doWork
(this=0x7f9c78177200, workEV=...)
at TTWorkNetServiceTCP.cpp:351
#12 0x00007f9c9c2dfccb in TTWork::Dispatcher::dispatch (this=0x13b5c60)
at TTWork.cpp:234
#13 0x00007f9c9c2e3a4f in TTWork::Dispatcher::dispatchGenerate
(this=0x13b5c60, maxWait=0x0, min=0x7f9c85e17260)
at TTWork.cpp:324
#14 0x00007f9c9c2e44fd in TTWork::DispatcherTask::runTask
(this=0x13b6ec0) at TTWork.cpp:1580
#15 0x00007f9c9c2e4fee in TTWork::Task::svc (this=0x13b6ec0) at
TTWork.cpp:50
#16 0x00007f9c9b865344 in ACE_Task_Base::svc_run (args=0x13b6ee8) at
Task.cpp:210
#17 0x00007f9c9b7dcb0f in ACE_Thread_Adapter::invoke_i (this=0x13b5b20)
at Thread_Adapter.cpp:150
#18 0x00007f9c9b7dcbb9 in ACE_Thread_Adapter::invoke (this=0x13b5b20) at
Thread_Adapter.cpp:93
#19 0x00007f9c9b78c0e3 in ace_thread_adapter (args=0x13b5b20) at
Base_Thread_Adapter.cpp:131
#20 0x00007f9c9b28073a in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#21 0x00007f9c9a64169d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()
Thread 2 (Thread 25519):
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f9c9b282e79 in _L_lock_949 () from
/home/root/eglibc-2.10.2/build-tree/amd64-libc/nptl/libpthread.so.0
#2 0x00007f9c9b282c9b in __pthread_mutex_lock (mutex=0x1dc3960) at
pthread_mutex_lock.c:61
#3 0x00007f9c9c545021 in ACE_OS::mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:1296
#4 0x00007f9c9c545061 in ACE_OS::thread_mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:4443
#5 0x00007f9c9c54508f in ACE_Thread_Mutex::acquire (this=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Thread_Mutex.inl:57
#6 0x00007f9c9c5410e2 in ACE_Guard<ACE_Thread_Mutex>::acquire
(this=0x7f9c86617e90)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:9
#7 0x00007f9c9c541123 in ACE_Guard (this=0x7f9c86617e90, l=...)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:35
#8 0x00007f9c9c2e1da6 in TTWork::GeneratorSelect::reselect
(this=0x1dc38f0, wi=0x2ee6240) at TTWork.cpp:873
#9 0x00007f9c9c2e1e92 in TTWork::WorkItemHandle::clearReadyMask
(this=0x2ee6240, mask=1, resel=true)
at TTWork.cpp:1061
#10 0x00007f9c9c2eaea2 in TTWork::NetServiceTCP::doTheWork
(this=0x2ee6240, workEV=...)
at TTWorkNetServiceTCP.cpp:278
#11 0x00007f9c9c2eb354 in TTWork::NetServiceTCP::doWork (this=0x2ee6240,
workEV=...) at TTWorkNetServiceTCP.cpp:351
#12 0x00007f9c9c2dfccb in TTWork::Dispatcher::dispatch (this=0x13b5c60)
at TTWork.cpp:234
#13 0x00007f9c9c2e3a4f in TTWork::Dispatcher::dispatchGenerate
(this=0x13b5c60, maxWait=0x0, min=0x7f9c86618260)
at TTWork.cpp:324
#14 0x00007f9c9c2e44fd in TTWork::DispatcherTask::runTask
(this=0x13b6ec0) at TTWork.cpp:1580
#15 0x00007f9c9c2e4fee in TTWork::Task::svc (this=0x13b6ec0) at
TTWork.cpp:50
#16 0x00007f9c9b865344 in ACE_Task_Base::svc_run (args=0x13b6ee8) at
Task.cpp:210
#17 0x00007f9c9b7dcb0f in ACE_Thread_Adapter::invoke_i (this=0x1dc2cb0)
at Thread_Adapter.cpp:150
#18 0x00007f9c9b7dcbb9 in ACE_Thread_Adapter::invoke (this=0x1dc2cb0) at
Thread_Adapter.cpp:93
#19 0x00007f9c9b78c0e3 in ace_thread_adapter (args=0x1dc2cb0) at
Base_Thread_Adapter.cpp:131
#20 0x00007f9c9b28073a in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#21 0x00007f9c9a64169d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()
Thread 1 (Thread 25518):
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f9c9b282e79 in _L_lock_949 () from
/home/root/eglibc-2.10.2/build-tree/amd64-libc/nptl/libpthread.so.0
#2 0x00007f9c9b282c9b in __pthread_mutex_lock (mutex=0x1dc3960) at
pthread_mutex_lock.c:61
#3 0x00007f9c9c545021 in ACE_OS::mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:1296
#4 0x00007f9c9c545061 in ACE_OS::thread_mutex_lock (m=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/OS_NS_Thread.inl:4443
#5 0x00007f9c9c54508f in ACE_Thread_Mutex::acquire (this=0x1dc3960)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Thread_Mutex.inl:57
#6 0x00007f9c9c5410e2 in ACE_Guard<ACE_Thread_Mutex>::acquire
(this=0x7f9c86e18e90)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:9
#7 0x00007f9c9c541123 in ACE_Guard (this=0x7f9c86e18e90, l=...)
at /opt/ttdev/ACE/v5.4/x86_64.linux2.6-testing/ace/Guard_T.inl:35
#8 0x00007f9c9c2e1da6 in TTWork::GeneratorSelect::reselect
(this=0x1dc38f0, wi=0x7f9c78463100) at TTWork.cpp:873
#9 0x00007f9c9c2e1e92 in TTWork::WorkItemHandle::clearReadyMask
(this=0x7f9c78463100, mask=1, resel=true)
at TTWork.cpp:1061
#10 0x00007f9c9c2eaea2 in TTWork::NetServiceTCP::doTheWork
(this=0x7f9c78463100, workEV=...)
at TTWorkNetServiceTCP.cpp:278
#11 0x00007f9c9c2eb354 in TTWork::NetServiceTCP::doWork
(this=0x7f9c78463100, workEV=...)
at TTWorkNetServiceTCP.cpp:351
#12 0x00007f9c9c2dfccb in TTWork::Dispatcher::dispatch (this=0x13b5c60)
at TTWork.cpp:234
#13 0x00007f9c9c2e3a4f in TTWork::Dispatcher::dispatchGenerate
(this=0x13b5c60, maxWait=0x0, min=0x7f9c86e19260)
at TTWork.cpp:324
#14 0x00007f9c9c2e44fd in TTWork::DispatcherTask::runTask
(this=0x13b6ec0) at TTWork.cpp:1580
#15 0x00007f9c9c2e4fee in TTWork::Task::svc (this=0x13b6ec0) at
TTWork.cpp:50
#16 0x00007f9c9b865344 in ACE_Task_Base::svc_run (args=0x13b6ee8) at
Task.cpp:210
#17 0x00007f9c9b7dcb0f in ACE_Thread_Adapter::invoke_i (this=0x1dc2a60)
at Thread_Adapter.cpp:150
#18 0x00007f9c9b7dcbb9 in ACE_Thread_Adapter::invoke (this=0x1dc2a60) at
Thread_Adapter.cpp:93
#19 0x00007f9c9b78c0e3 in ace_thread_adapter (args=0x1dc2a60) at
Base_Thread_Adapter.cpp:131
#20 0x00007f9c9b28073a in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#21 0x00007f9c9a64169d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DETAILS
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Note: the => markers in the listings below show the PC location.
THREAD 4 -- hung in futex call getting internal __lock while holding
external mutex
--------------------------------------------------
Caller's view of the condition variable...
(gdb) p cv
$4 = (ACE_cond_t *) 0x1dc4500
(gdb) p *cv
$5 = {__data = {__lock = 2, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
__nwaiters = 0, __broadcast_seq = 0}, __size = "\002", '\000' <repeats 46 times>, __align = 2}
C code from glibc/nptl:
int
__pthread_cond_broadcast (cond)
pthread_cond_t *cond;
{
int pshared = (cond->__data.__mutex == (void *) ~0l)
? LLL_SHARED : LLL_PRIVATE;
/* Make sure we are alone. */
lll_lock (cond->__data.__lock, pshared);
/* Are there any waiters to be woken? */
if (cond->__data.__total_seq > cond->__data.__wakeup_seq)
{
/* Yes. Mark them all as woken. */
cond->__data.__wakeup_seq = cond->__data.__total_seq;
cond->__data.__woken_seq = cond->__data.__total_seq;
Lowest stack frame from gdb (I guess what was actually compiled is a
hand-coded assembly version of the above):
.globl __pthread_cond_broadcast
.type __pthread_cond_broadcast, @function
.align 16
__pthread_cond_broadcast:
/* Get internal lock. */
movl $1, %esi
xorl %eax, %eax
LOCK
#if cond_lock == 0
cmpxchgl %esi, (%rdi)
#else
cmpxchgl %esi, cond_lock(%rdi)
#endif
jnz 1f
2: addq $cond_futex, %rdi
movq total_seq-cond_futex(%rdi), %r9
cmpq wakeup_seq-cond_futex(%rdi), %r9
jna 4f
/* Cause all currently waiting threads to recognize they are
woken up. */
movq %r9, wakeup_seq-cond_futex(%rdi)
movq %r9, woken_seq-cond_futex(%rdi)
addq %r9, %r9
movl %r9d, (%rdi)
incl broadcast_seq-cond_futex(%rdi)
/* Get the address of the mutex used. */
movq dep_mutex-cond_futex(%rdi), %r8
/* Unlock. */
LOCK
decl cond_lock-cond_futex(%rdi)
jne 7f
8: cmpq $-1, %r8
je 9f
/* XXX: The kernel so far doesn't support requeue to PI futex. */
/* XXX: The kernel only supports FUTEX_CMP_REQUEUE to the same
type of futex (private resp. shared). */
testl $(PI_BIT | PS_BIT), MUTEX_KIND(%r8)
jne 9f
/* Wake up all threads. */
#ifdef __ASSUME_PRIVATE_FUTEX
movl $(FUTEX_CMP_REQUEUE|FUTEX_PRIVATE_FLAG), %esi
#else
movl %fs:PRIVATE_FUTEX, %esi
orl $FUTEX_CMP_REQUEUE, %esi
#endif
movl $SYS_futex, %eax
movl $1, %edx
movl $0x7fffffff, %r10d
syscall
/* For any kind of error, which mainly is EAGAIN, we try again
with WAKE. The general test also covers running on old
kernels. */
cmpq $-4095, %rax
jae 9f
10: xorl %eax, %eax
retq
.align 16
/* Unlock. */
4: LOCK
decl cond_lock-cond_futex(%rdi)
jne 5f
6: xorl %eax, %eax
retq
/* Initial locking failed. */
1:
#if cond_lock != 0
addq $cond_lock, %rdi
#endif
cmpq $-1, dep_mutex-cond_lock(%rdi)
movl $LLL_PRIVATE, %eax
movl $LLL_SHARED, %esi
cmovne %eax, %esi
=> callq __lll_lock_wait
#if cond_lock != 0
subq $cond_lock, %rdi
#endif
jmp 2b
..................................................
next stack down
..................................................
#ifdef NOT_IN_libc
.globl __lll_lock_wait
.type __lll_lock_wait,@function
.hidden __lll_lock_wait
.align 16
__lll_lock_wait:
cfi_startproc
pushq %r10
cfi_adjust_cfa_offset(8)
pushq %rdx
cfi_adjust_cfa_offset(8)
cfi_offset(%r10, -16)
cfi_offset(%rdx, -24)
xorq %r10, %r10 /* No timeout. */
movl $2, %edx
LOAD_FUTEX_WAIT (%esi)
cmpl %edx, %eax /* NB: %edx == 2 */
jne 2f
1: movl $SYS_futex, %eax
syscall
=> movl %edx, %eax
xchgl %eax, (%rdi) /* NB: lock is implied */
testl %eax, %eax
jnz 1b
OTHER THREADS -- waiting to get the external mutex
--------------------------------------------------
Caller's view of the mutex
(gdb) p m
$2 = (ACE_thread_mutex_t *) 0x1dc3960
(gdb) p *m
$3 = {__data = {__lock = 2, __count = 0, __owner = 25521, __nusers = 1,
__kind = 0, __spins = 0, __list = {
__prev = 0x0, __next = 0x0}},
Lower stack levels:
int
__pthread_mutex_lock (mutex)
pthread_mutex_t *mutex;
{
assert (sizeof (mutex->__size) >= sizeof (mutex->__data));
unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
if (__builtin_expect (type & ~PTHREAD_MUTEX_KIND_MASK_NP, 0))
return __pthread_mutex_lock_full (mutex);
pid_t id = THREAD_GETMEM (THREAD_SELF, tid);
if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP)
== PTHREAD_MUTEX_TIMED_NP)
{
simple:
/* Normal mutex. */
=> LLL_MUTEX_LOCK (mutex);
assert (mutex->__data.__owner == 0);
..................................................
next stack down
..................................................
#ifdef NOT_IN_libc
.globl __lll_lock_wait
.type __lll_lock_wait,@function
.hidden __lll_lock_wait
.align 16
__lll_lock_wait:
cfi_startproc
pushq %r10
cfi_adjust_cfa_offset(8)
pushq %rdx
cfi_adjust_cfa_offset(8)
cfi_offset(%r10, -16)
cfi_offset(%rdx, -24)
xorq %r10, %r10 /* No timeout. */
movl $2, %edx
LOAD_FUTEX_WAIT (%esi)
cmpl %edx, %eax /* NB: %edx == 2 */
jne 2f
1: movl $SYS_futex, %eax
syscall
=> movl %edx, %eax
xchgl %eax, (%rdi) /* NB: lock is implied */