linux-kernel - Re: recv() hangs until SIGCHLD ?

Message-ID: <20081010211700.58e953a2@speedy>
Date:	Fri, 10 Oct 2008 21:17:00 +0200
From:	Stephen Hemminger <shemminger@...tta.com>
To:	Nicolas Cannasse <ncannasse@...ion-twin.com>
Cc:	linux-net@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: recv() hangs until SIGCHLD ?

On Fri, 10 Oct 2008 15:30:01 +0200
Nicolas Cannasse <ncannasse@...ion-twin.com> wrote:

> Hi,
> 
> We've been tracking a bug in our server application for some time now,
> and now that we have isolated it we're stuck without a meaningful
> explanation. We hope someone will be able to give us some answers.
> 
> We run a multithreaded application that uses pthreads and sockets. One
> thread calls accept() and then dispatches each socket to one of the
> worker threads, which processes it. A socket is therefore never used by
> several threads simultaneously.
> 
> In some rare cases, one (or several) threads hang in recv(). Both
> lsof and ls /proc/<pid>/fd show that the socket is in the ESTABLISHED
> state, but when we check the host it is connected to (a MySQL DB)
> we can't find the corresponding client socket (it has already been
> closed on the other side).
> 
> We are using the Boehm GC, which uses the signals SIGXCPU and SIGPWR to
> pause and restart the threads when running a GC cycle. We correctly
> handle EINTR in send() and recv() by restarting the call when it is
> interrupted this way.
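
A minimal sketch of the retry pattern described here, assuming a blocking socket; recv_retry_eintr() is a hypothetical name. Only a return of -1 with errno == EINTR means the call was interrupted before any data was consumed. A positive return, even a short one, is data that has already left the socket's receive queue and must be kept, otherwise it is lost for good.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: recv() that restarts only when it was
   interrupted by a signal before transferring any data. */
static ssize_t recv_retry_eintr(int fd, void *buf, size_t len, int flags)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, flags);

        if (n >= 0)
            return n;   /* bytes received (possibly short), or 0 on EOF */
        if (errno != EINTR)
            return -1;  /* genuine error; errno is left for the caller */
        /* EINTR: nothing was consumed from the queue, so retry */
    }
}
```

In the setup described above, the flags argument would be MSG_NOSIGNAL.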
> 
> However, when we attach GDB to the hung thread, it seems that even when
> the GC runs, recv() does not return (the breakpoint after it is not
> reached). If we send SIGCHLD to the hanging thread with GDB, recv() does
> return and the thread is correctly unblocked. If we don't, it hangs
> forever.
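
Whether a signal makes a blocked recv() return depends on how its handler was installed. Per signal(7), a recv() blocked on a socket is transparently restarted after a handler installed with SA_RESTART returns (unless a receive timeout is set on the socket), and fails with EINTR when the handler was installed without SA_RESTART. The sketch below reproduces the GDB experiment with SIGUSR1 standing in for the real signals; interrupt_blocked_recv() and the other names are hypothetical, and the sketch says nothing about which flags the GC's own handlers actually use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: it exists only so the signal interrupts the syscall. */
static void wake_handler(int sig) { (void)sig; }

struct blocked_recv {
    int fd;
    atomic_int done;   /* set once recv() has returned */
    int err;           /* errno observed if recv() failed */
};

static void *recv_worker(void *arg)
{
    struct blocked_recv *br = arg;
    char buf[64];

    /* Blocks: nothing is written to the peer while signals are sent. */
    if (recv(br->fd, buf, sizeof buf, MSG_NOSIGNAL) < 0)
        br->err = errno;
    atomic_store(&br->done, 1);
    return NULL;
}

/* Hypothetical experiment: signal a thread blocked in recv() with
   SIGUSR1 for about one second. Returns the errno recv() failed with
   (EINTR expected without SA_RESTART), or 0 if recv() stayed blocked. */
static int interrupt_blocked_recv(int use_sa_restart)
{
    int sv[2];
    struct sigaction sa;
    pthread_t t;
    int finished;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    sigaction(SIGUSR1, &sa, NULL);

    struct blocked_recv br = { .fd = sv[0], .err = 0 };  /* done == 0 */
    pthread_create(&t, NULL, recv_worker, &br);

    /* Keep signalling: an early signal may land before recv() starts. */
    for (int i = 0; i < 100 && !atomic_load(&br.done); i++) {
        pthread_kill(t, SIGUSR1);
        usleep(10000);
    }
    finished = atomic_load(&br.done);
    if (!finished) {
        /* recv() kept restarting: hand it one byte so it can be joined */
        ssize_t w = write(sv[1], "x", 1);
        (void)w;
    }
    pthread_join(t, NULL);
    close(sv[0]);
    close(sv[1]);
    return finished ? br.err : 0;
}
```

Build with -pthread. With use_sa_restart set to 0 the function should return EINTR; with 1, recv() keeps being restarted and the function returns 0 after roughly a second.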
> 
> Additional details: recv() is called with MSG_NOSIGNAL, and we have enabled
> TCP_NODELAY on the socket using setsockopt(). Some other
> non-multithreaded apps use the same databases, and this behavior
> does not occur for them.
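
For completeness, a sketch of the socket option mentioned here; enable_tcp_nodelay() is a hypothetical wrapper around the standard setsockopt() call. Note that the send(2) man page documents MSG_NOSIGNAL (it suppresses SIGPIPE when the peer has closed a stream socket), while the recv(2) man page does not list it, so it should not be expected to change how recv() blocks.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical wrapper: disable Nagle's algorithm on a TCP socket.
   Returns 0 on success, -1 with errno set on failure. */
static int enable_tcp_nodelay(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}
```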
> 
> Any idea how we can stop this from happening, or what additional things
> we can check to get more information on what's occurring?
> 
> Thanks a lot,
> Nicolas

Look at the receive queue length (Recv-Q) of the hung thread's socket with ss or
netstat. It will show whether there is anything that thread could read.
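
From inside the process, roughly the same number can be read with the FIONREAD ioctl (on Linux, SIOCINQ is defined as the same request), for example from a watchdog thread that notices a worker has been stuck in recv() for too long. recv_queue_len() and the watchdog idea are hypothetical; ss or netstat remain the simpler tools from outside the process.

```c
#include <sys/ioctl.h>

/* Hypothetical helper: number of unread bytes queued on a connected
   socket (what ss/netstat show as Recv-Q), or -1 on error. */
static int recv_queue_len(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

A nonzero result for a thread that is still blocked in recv() would point at the libc/kernel case; zero points at the sender or at data consumed earlier.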

If there is data and the thread didn't wake up, then that is a libc or kernel problem.
If there is no data, then look for cases where an earlier interrupted I/O call had
already consumed the data, or blame the sending process, not the receiver.
Also, are the sockets blocking or non-blocking?
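
One way to answer that last question from inside the program is to query the file status flags with fcntl(). On Linux the socket returned by accept() does not inherit O_NONBLOCK from the listening socket (accept4() with SOCK_NONBLOCK sets it explicitly), so accepted sockets are blocking unless the code changes that. is_nonblocking() is a hypothetical helper.

```c
#include <fcntl.h>

/* Hypothetical helper: 1 if fd has O_NONBLOCK set, 0 if it is
   blocking, -1 if fcntl() fails. */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}
```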