lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 28 Dec 2011 19:55:55 +0100
From:	Denys Vlasenko <vda.linux@...glemail.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Denys Vlasenko <dvlasenk@...hat.com>,
	Oleg Nesterov <oleg@...hat.com>, linux-kernel@...r.kernel.org,
	Łukasz Michalik <lmi@....uni.wroc.pl>,
	"Dmitry V. Levin" <ldv@...linux.org>
Subject: Possible bug introduced in commit 9b84cca

Hi Tejun, Oleg,

Apologies if you are already informed about this bug
by people who originally discovered it.

Looks like after commit 9b84cca, waitpid under strace
sometimes returns bogus ECHILD while child does exist.

I did not yet confirm that the bug appeared exactly
at this commit - Łukasz says that.

I confirmed that bug exists on kernels 3.1.6 (in Fedora)
and 3.1.0-rc4 (vanilla).

We have a testcase which spawns N threads, each of them
performs an infinite loop "fork, exit in child, waitpid
in parent for the child". When straced, sometimes waitpid
returns ECHILD. In fact, there is no need to run many threads -
I just saw it happening with single thread on 4-CPU machine
when I ran "strace -otestcase1.LOG -f ./testcase1 1".
This machine uses 3.1.0-rc4.

Please find testcase attached.

Also please find testcase1.LOG attached.
The key part is here:

931   clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf763dbd8) = 1048
1048  exit_group(42)                    = ?
931   waitpid(1048,  <unfinished ...>
1048  +++ exited with 42 +++
931   <... waitpid resumed> 0xf763d3a0, 0) = -1 ECHILD (No child processes)

To complicate matters, this is observed only under development
version of strace. Old (released) versions of strace do not
let ptraced processes to die - they detach from them when
they think they are going to die (such as when they enter _exit()
or receive a "deadly" signal). Which is a aesthetically horrible and
logically buggy (racy) hack, so we are removing it from strace.
Łukasz says that old strace versions (ones which still use the hack)
don't trigger the bug.

For testing, I will send you strace source tree and pre-compiled
strace binary in a separate email. Alternatively, pull latest
strace git and "autoreconf -fvi && ./configure && make" it.

-- 
vda

View attachment "testcase1.c" of type "text/x-csrc" (948 bytes)

Download attachment "testcase1.LOG.bz2" of type "application/x-bzip2" (3172 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ