Message-Id: <20080708020235.388a7bd5.billfink@mindspring.com>
Date: Tue, 8 Jul 2008 02:02:35 -0400
From: Bill Fink <billfink@...dspring.com>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: David Miller <davem@...emloft.net>, aglo@...i.umich.edu,
shemminger@...tta.com, netdev@...r.kernel.org, rees@...ch.edu,
bfields@...ldses.org
Subject: Re: setsockopt()
On Tue, 8 Jul 2008, Evgeniy Polyakov wrote:
> On Mon, Jul 07, 2008 at 02:49:12PM -0700, David Miller (davem@...emloft.net) wrote:
> > There is no reason these days to ever explicitly set the socket
> > buffer sizes on TCP sockets under Linux.
> >
> > If something is going wrong it's a bug and we should fix it.
>
> Just for reference: autosizing is (was?) not always working correctly
> for some workloads, at least a couple of years ago.
> For example, I worked with fairly small embedded systems with 16-32 MB
> of RAM where the socket buffer size never grew beyond about 200 KB
> (100 Mbit network), but the workload was very bursty, so if the remote
> system froze for several milliseconds (and sometimes up to a couple of
> seconds), the socket buffer was completely filled by the next burst of
> data, and sending either started to sleep or returned EAGAIN, which
> resulted in semi-realtime data being dropped.
>
> Setting the buffer size explicitly to a large enough value like 8 MB
> fixed these burst issues. Another fix was to allocate a buffer each time
> data became ready and copy a portion into it, but allocation was quite
> slow, which led to unneeded latencies, which again could lead to data loss.
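
For reference, the explicit sizing described above is just a setsockopt()
call made before the connection is set up.  A minimal sketch follows; the
8 MB value mirrors the workaround above and the function name is purely
illustrative:

/* Minimal sketch: explicitly size the socket buffers before connecting.
 * Note: on Linux the kernel doubles the requested value and caps it at
 * net.core.wmem_max / net.core.rmem_max, so those sysctls may also need
 * to be raised for large requests to take full effect, and setting these
 * options disables autotuning for the corresponding direction.
 */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int make_sized_socket(void)
{
    int sndbuf = 8 * 1024 * 1024;   /* 8 MB, as in the workaround above */
    int rcvbuf = 8 * 1024 * 1024;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        perror("socket");
        return -1;
    }
    /* Set before connect()/listen() so the receive window scale
     * negotiated on the SYN can accommodate the larger window. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
        perror("setsockopt SO_SNDBUF");
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        perror("setsockopt SO_RCVBUF");
    return fd;
}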
I admittedly haven't tested on the latest and greatest kernel versions,
but in the testing I have done on large-RTT 10-GigE networks, I still
need to explicitly set the socket buffer sizes to get the ultimate TCP
performance, although I give kudos to the autotuning, which does
remarkably well.  Here's a comparison across an ~72 ms RTT 10-GigE path
(the sender is running 2.6.20.7 and the receiver 2.6.22.9).
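
(For reference, the bandwidth-delay product of this path works out to
roughly 10 Gbps * 0.072 s = 720 Mbit, or about 90 MB, which is in line
with the 100 MB socket buffer used in the second run below.)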
Autotuning (30-second TCP test with 1-second interval reports):
# nuttcp -T30 -i1 192.168.21.82
nuttcp-6.0.1: Using beta version: retrans interface/output subject to change
(to suppress this message use "-f-beta")
7.2500 MB / 1.01 sec = 60.4251 Mbps 0 retrans
43.6875 MB / 1.00 sec = 366.4509 Mbps 0 retrans
169.4375 MB / 1.00 sec = 1421.2296 Mbps 0 retrans
475.3125 MB / 1.00 sec = 3986.8873 Mbps 0 retrans
827.6250 MB / 1.00 sec = 6942.0247 Mbps 0 retrans
877.6250 MB / 1.00 sec = 7361.2792 Mbps 0 retrans
878.1250 MB / 1.00 sec = 7365.7750 Mbps 0 retrans
878.4375 MB / 1.00 sec = 7368.2710 Mbps 0 retrans
878.3750 MB / 1.00 sec = 7367.7173 Mbps 0 retrans
878.7500 MB / 1.00 sec = 7370.6932 Mbps 0 retrans
878.8125 MB / 1.00 sec = 7371.6818 Mbps 0 retrans
879.1875 MB / 1.00 sec = 7374.5546 Mbps 0 retrans
878.6875 MB / 1.00 sec = 7370.3754 Mbps 0 retrans
878.2500 MB / 1.00 sec = 7366.3742 Mbps 0 retrans
878.6875 MB / 1.00 sec = 7370.6407 Mbps 0 retrans
878.8125 MB / 1.00 sec = 7371.4239 Mbps 0 retrans
878.5000 MB / 1.00 sec = 7368.8174 Mbps 0 retrans
879.0625 MB / 1.00 sec = 7373.4766 Mbps 0 retrans
878.8125 MB / 1.00 sec = 7371.4386 Mbps 0 retrans
878.3125 MB / 1.00 sec = 7367.2152 Mbps 0 retrans
878.8125 MB / 1.00 sec = 7371.3723 Mbps 0 retrans
878.6250 MB / 1.00 sec = 7369.8585 Mbps 0 retrans
878.8125 MB / 1.00 sec = 7371.4460 Mbps 0 retrans
875.5000 MB / 1.00 sec = 7373.0401 Mbps 0 retrans
878.8125 MB / 1.00 sec = 7371.5123 Mbps 0 retrans
878.3750 MB / 1.00 sec = 7367.5037 Mbps 0 retrans
878.5000 MB / 1.00 sec = 7368.9647 Mbps 0 retrans
879.4375 MB / 1.00 sec = 7376.6073 Mbps 0 retrans
878.8750 MB / 1.00 sec = 7371.8891 Mbps 0 retrans
878.4375 MB / 1.00 sec = 7368.3521 Mbps 0 retrans
23488.6875 MB / 30.10 sec = 6547.0228 Mbps 81 %TX 49 %RX 0 retrans
Same test but with explicitly specified 100 MB socket buffer:
# nuttcp -T30 -i1 -w100m 192.168.21.82
nuttcp-6.0.1: Using beta version: retrans interface/output subject to change
(to suppress this message use "-f-beta")
7.1250 MB / 1.01 sec = 59.4601 Mbps 0 retrans
120.3750 MB / 1.00 sec = 1009.7464 Mbps 0 retrans
859.4375 MB / 1.00 sec = 7208.5832 Mbps 0 retrans
939.3125 MB / 1.00 sec = 7878.9965 Mbps 0 retrans
935.5000 MB / 1.00 sec = 7847.0249 Mbps 0 retrans
934.8125 MB / 1.00 sec = 7841.1248 Mbps 0 retrans
933.8125 MB / 1.00 sec = 7832.7291 Mbps 0 retrans
933.1875 MB / 1.00 sec = 7827.5727 Mbps 0 retrans
932.1875 MB / 1.00 sec = 7819.1300 Mbps 0 retrans
933.1250 MB / 1.00 sec = 7826.8059 Mbps 0 retrans
933.3125 MB / 1.00 sec = 7828.6760 Mbps 0 retrans
933.0000 MB / 1.00 sec = 7825.9608 Mbps 0 retrans
932.6875 MB / 1.00 sec = 7823.1753 Mbps 0 retrans
932.0625 MB / 1.00 sec = 7818.0268 Mbps 0 retrans
931.7500 MB / 1.00 sec = 7815.6088 Mbps 0 retrans
931.0625 MB / 1.00 sec = 7809.7717 Mbps 0 retrans
931.5000 MB / 1.00 sec = 7813.3711 Mbps 0 retrans
931.8750 MB / 1.00 sec = 7816.4931 Mbps 0 retrans
932.0625 MB / 1.00 sec = 7817.8157 Mbps 0 retrans
931.5000 MB / 1.00 sec = 7813.4180 Mbps 0 retrans
931.6250 MB / 1.00 sec = 7814.5134 Mbps 0 retrans
931.6250 MB / 1.00 sec = 7814.4821 Mbps 0 retrans
931.3125 MB / 1.00 sec = 7811.7124 Mbps 0 retrans
930.8750 MB / 1.00 sec = 7808.0818 Mbps 0 retrans
931.0625 MB / 1.00 sec = 7809.6233 Mbps 0 retrans
930.6875 MB / 1.00 sec = 7806.6964 Mbps 0 retrans
931.2500 MB / 1.00 sec = 7811.0164 Mbps 0 retrans
931.3125 MB / 1.00 sec = 7811.9077 Mbps 0 retrans
931.3750 MB / 1.00 sec = 7812.3617 Mbps 0 retrans
931.4375 MB / 1.00 sec = 7812.6750 Mbps 0 retrans
26162.6875 MB / 30.15 sec = 7279.7648 Mbps 93 %TX 54 %RX 0 retrans
As you can see, the autotuned case maxed out at about 7.37 Gbps, whereas
explicitly specifying a 100 MB socket buffer made it possible to achieve
a somewhat higher rate of about 7.81 Gbps.  Admittedly the autotuning did
great, with a difference of only about 6%, but if you want to squeeze the
last drop of performance out of your network, explicitly setting the
socket buffer sizes can still be helpful in certain situations (perhaps
newer kernels have reduced the gap even more).
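
(One related knob: the window that autotuning can grow to is bounded by
the maximum values in the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem sysctls,
so on a high-BDP path like this those need to be generous before autotuning
even has a chance; I haven't re-verified whether that was the limiting
factor in the autotuned run above.)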
But I would definitely agree with the general recommendation to
just take advantage of the excellent Linux TCP autotuning for most
common scenarios.
-Bill