Date:	Sun, 26 Sep 2010 23:01:11 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Willy Tarreau <w@....eu>
Cc:	netdev@...r.kernel.org
Subject: Re: TCP: orphans broken by RFC 2525 #2.17

On Sunday, 26 September 2010 at 20:49 +0200, Willy Tarreau wrote:
> On Sun, Sep 26, 2010 at 08:35:15PM +0200, Eric Dumazet wrote:
> > I was referring to this code. It works well for me.
> > 
> > shutdown(fd, SHUT_RDWR);
> > while (recv(fd, buff, sizeof(buff), 0) > 0)
> > 	;
> > close(fd);
> 
> Ah, this one yes, but it's overkill. We're actively pumping data from the
> other side just to drop it on the floor until it finally closes, while we
> only need to know when it has ACKed the FIN. In practice, doing that on a
> POST request which causes a 302 or 401 will result in the whole amount of
> data being transferred twice. Not only is this bad for bandwidth, it is
> also bad for the user, as we're causing him to experience a complete upload
> twice just to be sure it has received the FIN, when it's pretty obvious
> that it's not necessary in 99.9% of cases.
> 

I don't understand how recv() could transfer data twice.

Once you have issued shutdown(SHUT_RDWR), the socket receive buffer is
frozen and no more incoming data should be accepted or queued.

You only read from the socket receive queue into a throwaway buffer, and in
most cases a single recv() call will be enough to drain the queue.
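
Spelled out as a self-contained function, the pattern quoted above would
look roughly like this (the name lingering_close() and the 4 KB buffer are
only illustrative choices of mine):

#include <sys/socket.h>
#include <unistd.h>

static void lingering_close(int fd)
{
	char buf[4096];

	/* Shut down both directions: our FIN goes out now. */
	shutdown(fd, SHUT_RDWR);

	/* Drain whatever was already queued, so that close() does not
	 * find unread data and turn into a reset. */
	while (recv(fd, buf, sizeof(buf), 0) > 0)
		;

	close(fd);
}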


> Since this method is the least efficient one and clearly not acceptable
> for practical cases, I wanted to dig at the root, where the information
> is known. And the TCP recv code is precisely the place where we know
> exactly when it's safe to reset.
> 

And it's safe to reset exactly when the application just does close(), if
the unread data was not drained. Not only is it safe, it is required. A new
RFC might be needed?
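
For illustration, the case I mean looks roughly like this on the server
side (an HTTP-like server that answers a POST without ever reading the
request body; the variable names are made up):

/* Send the response without reading the request body. */
write(fd, reply, reply_len);

/* Unread request data is still sitting in the receive queue, so this
 * close() makes the kernel send an RST instead of a plain FIN
 * (the RFC 2525 section 2.17 behaviour). */
close(fd);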


> Also there's another issue in doing this. It requires polling of the
> receive side for all requests, which adds one epoll_ctl() syscall and
> one recv() call, and these have a very noticeable negative performance
> impact at high rates (at 100000 connections per second, every syscall
> counts). For now I could very well consider doing this only for POST
> requests, which currently are the ones exhibiting the issue the most,
> but since HTTP browsers will try to enable pipelining again soon, the
> problem will generalize to all types of requests. Hence my attempts to
> do it in the optimal way.

This might be overkill, but it is a portable way of doing it, on all
versions of Linux.

Sure, you could patch the kernel to add a new system call

close_proper(fd);

As shutdown() only uses two bits, you could add another bit to flush the
receive queue as well (to avoid having to copy it).
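
Purely as an illustration of that idea (this flag does not exist today; the
name and value are made up):

#define SHUT_FLUSH_RX	4	/* hypothetical: drop queued rx data */

shutdown(fd, SHUT_RDWR | SHUT_FLUSH_RX);
close(fd);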

Another question is: why does the server close the socket, if you believe
more pipelining is coming soon?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
