Message-ID: <8ef39357-06c9-0959-71da-fba80e1fa934@yeslogic.com>
Date:   Sun, 23 Jul 2017 12:14:04 +1000
From:   Michael Day <mikeday@...logic.com>
To:     linux-kernel@...r.kernel.org
Subject: signal not interrupting futex

We have hit an apparent kernel bug where a signal is not interrupting a 
futex, leading to a deadlock in our code. Here is the relevant strace 
output just before it blocks (complete strace log is attached):

14069 set_robust_list(0x7f7b3e7ee9e0, 24 <unfinished ...>
14061 futex(0x7f7b46721fd8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
14069 <... set_robust_list resumed> )   = 0
14069 futex(0x7f7b46721fd8, FUTEX_WAKE_PRIVATE, 1) = 1
14061 <... futex resumed> )             = 0
14061 futex(0x1585ea0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
14069 tgkill(14061, 14061, SIGPWR)      = 0
14069 futex(0x1586280, FUTEX_WAIT_PRIVATE, 0, NULL

Thread 14069 sends SIGPWR to thread 14061, which is blocked in 
FUTEX_WAIT_PRIVATE, but the signal is never delivered and we have not 
been able to figure out why.
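
One check that might narrow this down is whether SIGPWR is blocked or 
still pending on the waiting thread while the process is hung. The sketch 
below is only an illustration, not something taken from our code: it 
decodes the signal mask lines (SigPnd, ShdPnd, SigBlk, SigIgn, SigCgt) of 
/proc/<pid>/task/<tid>/status. The helper names are hypothetical, and 
the value 30 for SIGPWR assumes x86_64 Linux.

```python
# Assumption: SIGPWR is signal 30 on x86_64 Linux.
SIGPWR = 30

MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def parse_signal_masks(status_text):
    """Map each Sig* field of a /proc status file to a set of signal numbers."""
    masks = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in MASK_FIELDS:
            bits = int(value.strip(), 16)
            # Bit (n - 1) of the hexadecimal mask corresponds to signal n.
            masks[name] = {n for n in range(1, 65) if bits >> (n - 1) & 1}
    return masks


def describe_signal(status_text, signo=SIGPWR):
    """Report, per mask field, whether signo is set in that mask."""
    masks = parse_signal_masks(status_text)
    return {name: signo in signals for name, signals in masks.items()}


def read_thread_status(pid, tid):
    """Read the per-thread status file (hypothetical helper, Linux only)."""
    with open(f"/proc/{pid}/task/{tid}/status") as f:
        return f.read()
```

For the hang above, this could be run as 
describe_signal(read_thread_status(14061, 14061)). If SIGPWR shows up in 
both SigPnd and SigBlk, the signal was queued for the thread but is 
blocked there, which would point at the signal mask rather than at the 
futex itself.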

Background information: this deadlock is experienced by our customer 
running Prince on CentOS 7. The bug happens every time on their system, 
but we have not been able to reproduce it on ours yet. They have tried 
two different kernel versions:

3.10.0-327.28.2.el7.x86_64
3.10.0-514.26.2.el7.x86_64

Over the past two years we have heard reports of similar deadlocks from 
other customers, always on CentOS and typically involving PHP, although 
these are of course very popular systems.

This issue appears to be unrelated to the earlier futex bug affecting 
Haswell processors, but could there be another bug along these lines 
affecting futexes or signal delivery?

What can we do to help debug this issue?

Best regards,

Michael

-- 
Prince: Print with CSS!
http://www.princexml.com

View attachment "prince.strace" of type "text/plain" (52230 bytes)
