linux-kernel - waitid() problem?

Message-ID: <fb6b09f20702050413o6123b27eof07e7501e8d4a7e5@mail.gmail.com>
Date:	Mon, 5 Feb 2007 07:13:26 -0500
From:	"Sue Alfano" <smalfano@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: waitid() problem?

I didn't get any responses on this.  I thought maybe the subject line
was misleading, so I'm trying a different one.  If I'm using the wrong
mailing list, or if there's another site other than kernel.org that I
should be using to search through archives and FAQs, please let me
know.

I'm accessing the waitid function in the kernel using the _syscall4
macro, since it's not supported in uClibc.  waitid worked great, but
then I started having problems when more than one child needed to be
reaped.
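
For reference, a direct call of this kind can be sketched with the generic
syscall() function instead of the _syscall4 macro.  This is only a sketch, not
the code from my product: raw_waitid is a hypothetical name, and it assumes the
C library provides syscall() and SYS_waitid.  One detail worth checking: per
the Linux waitid(2) man page, the raw system call takes a fifth argument, a
struct rusage pointer, so the sketch passes NULL for it explicitly.

```c
#define _GNU_SOURCE           /* for syscall() on glibc-style headers */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: invokes the raw waitid system call directly. */
static int raw_waitid(idtype_t idtype, id_t id, siginfo_t *info, int options)
{
    /* Clear the struct first so si_pid == 0 reliably means "nothing reported". */
    memset(info, 0, sizeof(*info));

    /* The fifth argument is the kernel's struct rusage pointer; NULL means
     * resource usage is not requested. */
    return (int)syscall(SYS_waitid, idtype, id, info, options, NULL);
}
```

A call such as raw_waitid(P_ALL, 0, &info, WEXITED | WNOHANG) would then poll
for any exited child without blocking; those flags are an assumption chosen for
illustration.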

Basically, the parent process registers a SIGCHLD signal handler via sigaction()
with the SA_RESTART option and then sits on a timed select().  All the signal
handler does is set a flag - the waitid processing takes place when the parent
drops out of the select because of the signal or a timeout.  When the waitid
succeeds, the pid is used to do some processing and is then passed to waitpid
to clean up the zombie.  This is all in a while loop, so when waitid is called
again, it returns 0 with si_pid = 0, indicating that there are no more children
to reap.  That's all great.

The problem occurs when there are two or more children to reap.  In that case,
the second waitid call returns -1 with si_pid = 0.  The waitid continues to
fail in this manner until the parent receives another SIGCHLD signal; then it
works for the next zombie child in the waiting line, but only one.  I can send
SIGCHLD kills to the parent from the console and eventually get everything
cleaned up.  Sending the kill from the parent itself isn't enough - I suppose
it wants to be blocking on the select.

I tried registering the signal handler without SA_RESTART and also via signal(),
since there was a posting about SA_RESTART, but that made no difference.  I
also tried not having a signal handler at all, but then I couldn't reap any
children on the timeout.

So, are there known issues with waitid when there are multiple children piled
up waiting to be reaped?  If so, is there a patch or a workaround?

The product that I'm working on is using Linux version 2.6.12-2.0.0-258

Thanks for any help,
Sue