[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <478E5E27.3050307@psc.edu>
Date: Wed, 16 Jan 2008 14:42:31 -0500
From: John Heffner <jheffner@....edu>
To: Ritesh Kumar <ritesh@...unc.edu>
CC: Bill Fink <billfink@...dspring.com>, netdev@...r.kernel.org
Subject: Re: SO_RCVBUF doesn't change receiver advertised window
Ritesh Kumar wrote:
> On 1/16/08, Bill Fink <billfink@...dspring.com> wrote:
>> On Tue, 15 Jan 2008, Ritesh Kumar wrote:
>>
>>> Hi,
>>> I am using linux 2.6.20 and am trying to limit the receiver window
>>> size for a TCP connection. However, it seems that auto tuning is not
>>> turning itself off even after I use the syscall
>>>
>>> rwin=65536
>>> setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rwin, sizeof(rwin));
>>>
>>> and verify using
>>>
>>> getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rwin, &rwin_size);
>>>
>>> that RCVBUF indeed is getting set (the value returned from getsockopt
>>> is double that, 131072).
>> Linux doubles what you requested, and then uses (by default) 1/4
>> of the socket space for overhead, so you effectively get 1.5 times
>> what you requested as an actual advertised receiver window, which
>> means since you specified 64 KB, you actually get 96 KB.
>>
>>> The above calls are made before connect() on the client side and
>>> before bind(), accept() on the server side. Bulk data is being sent
>>> from the client to the server. The client and the server machines also
>>> have tcp_moderate_rcvbuf set to 0 (though I don't think that's really
>>> needed; setting a value to SO_RCVBUF should automatically turnoff auto
>>> tuning.).
>>>
>>> However the tcp trace shows the SYN, SYN/ACK and the first few packets as:
>>> 14:34:18.831703 IP 192.168.1.153.45038 > 192.168.2.204.9999: S
>>> 3947298186:3947298186(0) win 5840 <mss 1460,sackOK,timestamp 2842625
>>> 0,nop,wscale 5>
>>> 14:34:18.836000 IP 192.168.2.204.9999 > 192.168.1.153.45038: S
>>> 3955381015:3955381015(0) ack 3947298187 win 5792 <mss
>>> 1460,sackOK,timestamp 2843649 2842625,nop,wscale 2>
>>> 14:34:18.837654 IP 192.168.1.153.45038 > 192.168.2.204.9999: . ack 1
>>> win 183 <nop,nop,timestamp 2842634 2843649>
>>> 14:34:18.837849 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
>>> 1:1449(1448) ack 1 win 183 <nop,nop,timestamp 2842634 2843649>
>>> 14:34:18.837851 IP 192.168.1.153.45038 > 192.168.2.204.9999: P
>>> 1449:1461(12) ack 1 win 183 <nop,nop,timestamp 2842634 2843649>
>>> 14:34:18.839001 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
>>> 1449 win 2172 <nop,nop,timestamp 2843652 2842634>
>>> 14:34:18.839011 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
>>> 1461 win 2172 <nop,nop,timestamp 2843652 2842634>
>>> 14:34:18.840875 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
>>> 1461:2909(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
>>> 14:34:18.840997 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
>>> 2909:4357(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
>>> 14:34:18.841120 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
>>> 4357:5805(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
>>> 14:34:18.841244 IP 192.168.1.153.45038 > 192.168.2.204.9999: .
>>> 5805:7253(1448) ack 1 win 183 <nop,nop,timestamp 2842637 2843652>
>>> 14:34:18.841388 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
>>> 2909 win 2896 <nop,nop,timestamp 2843655 2842637>
>>> 14:34:18.841399 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
>>> 4357 win 3620 <nop,nop,timestamp 2843655 2842637>
>>> 14:34:18.841413 IP 192.168.2.204.9999 > 192.168.1.153.45038: . ack
>>> 5805 win 4344 <nop,nop,timestamp 2843655 2842637>
>>>
>>> As you can see, the syn and syn ack show rcv windows to be 5840 and
>>> 5792 and it automatically increases for the receiver to values 2172
>>> till 4344 and more in the later part of the trace till 24214.
>> Since the window scale was 2, the final advertised receiver window
>> you indicate of 24214 gives 2^2*24214 or right around 96 KB, which
>> is what is expected given the way Linux works.
>>
>> -Bill
>
> Thanks for the explanation Bill. That surely clears part of my doubt.
> However, why doesn't linux advertise 24214 in the SYN packets? I was
> hoping that the moment I setup a RCVBUF, linux would pre-allocate
> buffers and drop any autotuning. Doesn't the above behavior count as
> autotuning?
Linux also starts all connections with a small advertised window. It
only grows the window after observing the ratio of data to overhead in
received packets. If it receives only small packets from the sender
with a high overhead ratio, it will only open the window just far enough
that it doesn't overflow the receive buffer. This algorithm (look for
rcv_ssthresh in the code) controls the advertised window given a receive
buffer size. This is separate from autotuning, which adjusts the buffer
size. You're correct that autotuning is disabled when SO_RCVBUF is set,
but the "receive slow-start" is always used.
-John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists