[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100412080633.GA27418@bombadil.infradead.org>
Date: Mon, 12 Apr 2010 04:06:33 -0400
From: Thomas Graf <tgraf@...radead.org>
To: netdev@...r.kernel.org
Subject: [RFC] random SYN drops causing connect() delays
Hello,
I have been tracking down an issue commonly referred to as the 3-sec
connect() delay. It exists since recent 2.6.x kernels and has never
been fixed even though it disappeared in recent releases unless
sched_child_runs_first is set to 1 again.
What happens is that if a client attemps to open many connections to
a socket with only minimal delay inbetween attemps some SYNs are
randomly dropped on the server side causing the client to resend after
the 3 sec TCP timeout and thus causing connect()s to be randomly delayed.
Steps to reproduce:
1. Compile reproducer attached below
2. run ./test_delay 127.0.0.1 22 10000 0 > log
3. awk -F: '{if ($2>2990) print $1 $2;}' log
4. all listed connection attemps will have been delayed for >3s
Facts:
- Issue can be reproduced over loopback or real networks.
- Enabling SO_LINGER on the client side will make the issue disappear!!
- While the issue is appearing, the acceptq seems to be overflowing. Both
LISTENOVERFLOWS and LISTENDROPS are increasing although not by the exact
number of delay occurences. inetdiag reports sk_max_ack_backlog to be 0
therefore one possibility that comes to mind is that sk_ack_backlog
underflows due to a race.
- The issue disappeared in recent kernels, I bisected it down to the following
commit:
commit 2bba22c50b06abe9fd0d23933b1e64d35b419262
Author: Mike Galbraith <efault@....de>
Date: Wed Sep 9 15:41:37 2009 +0200
sched: Turn off child_runs_first
Set child_runs_first default to off.
Setting kernel.sched_child_runs_first=1 makes the isssue reappear in recent
kernels. This hardens the theory of a race condition.
- It looks like that the issue can only be reproduced if the server
socket sends out data immediately after the connection has been established
but I cannot proof this theory.
I will continue to look into the sk_ack_backlog underflow theory but would
appreciate any comments or theories.
Thanks,
Reproducer:
#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
#include <sys/time.h>
int main(int argc, char *argv[])
{
int sock,i;
struct timeval tim;
double start,end;
struct hostent *host;
struct sockaddr_in server_addr, local;
socklen_t len = sizeof(local);
char* hostname;
int port, count, delay;
if( argc < 3 ){
printf("Usage:\n\t%s host port [count=1000] [delay=0]\n",argv[0]);
return 1;
}
hostname = argv[1];
port = atoi(argv[2]);
if( argc > 3 )
count = atoi(argv[3]);
else
count = 1000;
if( argc > 4 )
delay = atoi(argv[4]);
else
delay = 0;
host = gethostbyname(hostname);
server_addr.sin_family = AF_INET;
server_addr.sin_port = htons(port);
server_addr.sin_addr = *((struct in_addr *)host->h_addr);
bzero(&(server_addr.sin_zero),8);
for(i=0; i< count; i=i+1){
gettimeofday(&tim, NULL);
start=tim.tv_sec*1000+(tim.tv_usec/1000);
if ((sock = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
perror("Socket");
exit(1);
}
if (connect(sock, (struct sockaddr *)&server_addr,
sizeof(struct sockaddr)) == -1)
{
perror("Connect");
exit(1);
}
getsockname(sock, (struct sockaddr *) &local, &len);
close(sock);
gettimeofday(&tim, NULL);
end=tim.tv_sec*1000+(tim.tv_usec/1000);
printf("[%d] %u-> Time to open socket (clock): %d\n",
i, ntohs(local.sin_port), (int)(end - start));
usleep(delay*1000);
}
/*
printf("Time to open socket (ms): %d\n", ((end - start)*1000)/CLOCKS_PER_SEC);
*/
return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists