[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1199480498017@dmwebmail.japan.chezphil.org>
Date: Fri, 04 Jan 2008 21:01:38 +0000
From: "Phil Endecott" <phil_wueww_endecott@...zphil.org>
To: <linux-kernel@...r.kernel.org>
Subject: strace, accept(), ERESTARTSYS and EINTR
Dear Experts,
I have some code like this:
struct sockaddr_in client_addr;
socklen_t client_size=sizeof(client_addr);
int connfd = accept(fd,(struct sockaddr*)(&client_addr),&client_size);
if (connfd==-1) {
// [1]
.....report error and terminate......
}
int rc = fcntl(connfd,F_SETFD,FD_CLOEXEC);
I believe that I should be checking for errno==EINTR at [1] and
retrying the accept(); currently I'm not doing so.
When I strace -f this application - which is multi-threaded - I see this:
[pid 11079] accept(3, <unfinished ...>
[pid 11093] restart_syscall(<... resuming interrupted call ...>
<unfinished ...>
[pid 8799] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 11079] <... accept resumed> 0xbfdaa73c, [16]) = ? ERESTARTSYS (To
be restarted)
[pid 8799] read(6, <unfinished ...>
[pid 11079] fcntl64(-512, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
This shows accept() "returning" ERESTARTSYS; as I understand it this is
an artefact of how strace works, and my code will not have seen accept
return at all at that point. However, the strace output does not show
any other return from the call to accept() before reporting that
thread's call to fcntl(). And the first parameter to fcntl, -512, is
the return value from accept() which should be -1 or >0. What is going
on here???
Google found a couple of related reports:
http://lkml.org/lkml/2001/11/22/65 - Phil Howard reports getting
ERESTARTSYS returned from accept(), not only in the strace output, and
fixed his problem by treating it like EINTR. He looked at errno if
accept() returned <0, not ==-1.
http://lkml.org/lkml/2005/9/20/135 - Peter Duellings reports seeing
accept() return -512 with errno==0.
Some more background: I started stracing the program because it was
already misbehaving. It had created a larger than usual number of
threads (30?). It's quite possible that some process resource limit
had been reached; could this have confused the glibc syscall wrapper,
causing it to return the mysterious -512? Could it have confused the
kernel into returng ERESTARTSYS instead of e.g. E-too-many-sockets-open?
It's also possible that some random memory corruption had occurred.
valgrind is on my to-do list. But even if this had occurred, I would
expect to see strace reporting a legitimate return from accept() before
the call to fcntl(); there was no sign of any global resource limit
that might affect strace occurring.
This is a Debian system running kernel 2.6.21-1 i686, glibc 2.7, and
strace 4.5.14.
Many thanks for any suggestions.
Phil.
(You're welcome to Cc: me in any replies.)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists