Date:	Tue, 8 Jul 2008 02:02:35 -0400
From:	Bill Fink <billfink@...dspring.com>
To:	Evgeniy Polyakov <johnpol@....mipt.ru>
Cc:	David Miller <davem@...emloft.net>, aglo@...i.umich.edu,
	shemminger@...tta.com, netdev@...r.kernel.org, rees@...ch.edu,
	bfields@...ldses.org
Subject: Re: setsockopt()

On Tue, 8 Jul 2008, Evgeniy Polyakov wrote:

> On Mon, Jul 07, 2008 at 02:49:12PM -0700, David Miller (davem@...emloft.net) wrote:
> > There is no reason these days to ever explicitly set the socket
> > buffer sizes on TCP sockets under Linux.
> > 
> > If something is going wrong it's a bug and we should fix it.
> 
> Just for reference: autosizing did not (does not?) always work
> correctly for some workloads, at least as of a couple of years ago.
> For example, I worked with fairly small embedded systems with 16-32 MB
> of RAM where the socket buffer size never grew beyond about 200 KB
> (100 Mbit network), but the workload was very bursty, so if the remote
> system froze for several milliseconds (and sometimes up to a couple of
> seconds), the socket buffer was completely filled by the next burst of
> data and sending either started to sleep or returned EAGAIN, which
> resulted in semi-realtime data being dropped.
> 
> Setting the buffer size explicitly to a large enough value, such as
> 8 MB, fixed these burst issues. Another fix was to allocate a buffer
> each time data became ready and copy a portion into it, but the
> allocation was quite slow, which led to unneeded latencies, which
> again could lead to data loss.
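
For reference, that explicit-sizing workaround comes down to a couple
of setsockopt() calls made before connect().  Here's a minimal sketch;
the connect_with_fixed_buffers() helper and the 8 MB figure are just
illustrative, and the requested sizes are still clamped by the
net.core.wmem_max/rmem_max sysctls:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int connect_with_fixed_buffers(const char *host, unsigned short port)
{
    struct sockaddr_in sa;
    int sz = 8 * 1024 * 1024;   /* 8 MB, as described above */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /*
     * Set before connect() so the window scale negotiated in the
     * handshake reflects the larger buffers.  Explicitly setting
     * SO_SNDBUF/SO_RCVBUF also disables autotuning on this socket.
     */
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz));

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &sa.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}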

I admittedly haven't tested on the latest and greatest kernel versions,
but in the testing I have done on large-RTT 10-GigE networks, I still
need to explicitly set the socket buffer sizes to get the ultimate TCP
performance, although I give kudos to the autotuning, which does
remarkably well.  Here's a comparison across an ~72 ms RTT 10-GigE
path (the sender is running 2.6.20.7 and the receiver 2.6.22.9).

Autotuning (30-second TCP test with 1-second interval reports):

# nuttcp -T30 -i1 192.168.21.82
nuttcp-6.0.1: Using beta version: retrans interface/output subject to change
              (to suppress this message use "-f-beta")

    7.2500 MB /   1.01 sec =   60.4251 Mbps     0 retrans
   43.6875 MB /   1.00 sec =  366.4509 Mbps     0 retrans
  169.4375 MB /   1.00 sec = 1421.2296 Mbps     0 retrans
  475.3125 MB /   1.00 sec = 3986.8873 Mbps     0 retrans
  827.6250 MB /   1.00 sec = 6942.0247 Mbps     0 retrans
  877.6250 MB /   1.00 sec = 7361.2792 Mbps     0 retrans
  878.1250 MB /   1.00 sec = 7365.7750 Mbps     0 retrans
  878.4375 MB /   1.00 sec = 7368.2710 Mbps     0 retrans
  878.3750 MB /   1.00 sec = 7367.7173 Mbps     0 retrans
  878.7500 MB /   1.00 sec = 7370.6932 Mbps     0 retrans
  878.8125 MB /   1.00 sec = 7371.6818 Mbps     0 retrans
  879.1875 MB /   1.00 sec = 7374.5546 Mbps     0 retrans
  878.6875 MB /   1.00 sec = 7370.3754 Mbps     0 retrans
  878.2500 MB /   1.00 sec = 7366.3742 Mbps     0 retrans
  878.6875 MB /   1.00 sec = 7370.6407 Mbps     0 retrans
  878.8125 MB /   1.00 sec = 7371.4239 Mbps     0 retrans
  878.5000 MB /   1.00 sec = 7368.8174 Mbps     0 retrans
  879.0625 MB /   1.00 sec = 7373.4766 Mbps     0 retrans
  878.8125 MB /   1.00 sec = 7371.4386 Mbps     0 retrans
  878.3125 MB /   1.00 sec = 7367.2152 Mbps     0 retrans
  878.8125 MB /   1.00 sec = 7371.3723 Mbps     0 retrans
  878.6250 MB /   1.00 sec = 7369.8585 Mbps     0 retrans
  878.8125 MB /   1.00 sec = 7371.4460 Mbps     0 retrans
  875.5000 MB /   1.00 sec = 7373.0401 Mbps     0 retrans
  878.8125 MB /   1.00 sec = 7371.5123 Mbps     0 retrans
  878.3750 MB /   1.00 sec = 7367.5037 Mbps     0 retrans
  878.5000 MB /   1.00 sec = 7368.9647 Mbps     0 retrans
  879.4375 MB /   1.00 sec = 7376.6073 Mbps     0 retrans
  878.8750 MB /   1.00 sec = 7371.8891 Mbps     0 retrans
  878.4375 MB /   1.00 sec = 7368.3521 Mbps     0 retrans

23488.6875 MB /  30.10 sec = 6547.0228 Mbps 81 %TX 49 %RX 0 retrans

Same test, but with an explicitly specified 100 MB socket buffer:

# nuttcp -T30 -i1 -w100m 192.168.21.82
nuttcp-6.0.1: Using beta version: retrans interface/output subject to change
              (to suppress this message use "-f-beta")

    7.1250 MB /   1.01 sec =   59.4601 Mbps     0 retrans
  120.3750 MB /   1.00 sec = 1009.7464 Mbps     0 retrans
  859.4375 MB /   1.00 sec = 7208.5832 Mbps     0 retrans
  939.3125 MB /   1.00 sec = 7878.9965 Mbps     0 retrans
  935.5000 MB /   1.00 sec = 7847.0249 Mbps     0 retrans
  934.8125 MB /   1.00 sec = 7841.1248 Mbps     0 retrans
  933.8125 MB /   1.00 sec = 7832.7291 Mbps     0 retrans
  933.1875 MB /   1.00 sec = 7827.5727 Mbps     0 retrans
  932.1875 MB /   1.00 sec = 7819.1300 Mbps     0 retrans
  933.1250 MB /   1.00 sec = 7826.8059 Mbps     0 retrans
  933.3125 MB /   1.00 sec = 7828.6760 Mbps     0 retrans
  933.0000 MB /   1.00 sec = 7825.9608 Mbps     0 retrans
  932.6875 MB /   1.00 sec = 7823.1753 Mbps     0 retrans
  932.0625 MB /   1.00 sec = 7818.0268 Mbps     0 retrans
  931.7500 MB /   1.00 sec = 7815.6088 Mbps     0 retrans
  931.0625 MB /   1.00 sec = 7809.7717 Mbps     0 retrans
  931.5000 MB /   1.00 sec = 7813.3711 Mbps     0 retrans
  931.8750 MB /   1.00 sec = 7816.4931 Mbps     0 retrans
  932.0625 MB /   1.00 sec = 7817.8157 Mbps     0 retrans
  931.5000 MB /   1.00 sec = 7813.4180 Mbps     0 retrans
  931.6250 MB /   1.00 sec = 7814.5134 Mbps     0 retrans
  931.6250 MB /   1.00 sec = 7814.4821 Mbps     0 retrans
  931.3125 MB /   1.00 sec = 7811.7124 Mbps     0 retrans
  930.8750 MB /   1.00 sec = 7808.0818 Mbps     0 retrans
  931.0625 MB /   1.00 sec = 7809.6233 Mbps     0 retrans
  930.6875 MB /   1.00 sec = 7806.6964 Mbps     0 retrans
  931.2500 MB /   1.00 sec = 7811.0164 Mbps     0 retrans
  931.3125 MB /   1.00 sec = 7811.9077 Mbps     0 retrans
  931.3750 MB /   1.00 sec = 7812.3617 Mbps     0 retrans
  931.4375 MB /   1.00 sec = 7812.6750 Mbps     0 retrans

26162.6875 MB /  30.15 sec = 7279.7648 Mbps 93 %TX 54 %RX 0 retrans

As you can see, the autotuned case maxed out at about 7.37 Gbps,
whereas explicitly specifying a 100 MB socket buffer made it possible
to achieve a somewhat higher rate of about 7.81 Gbps.  Admittedly the
autotuning did great, with a difference of only about 6%, but if you
want to squeeze the last drop of performance out of your network,
explicitly setting the socket buffer sizes can still be helpful in
certain situations (perhaps newer kernels have reduced the gap even
more).
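
For what it's worth, 100 MB is also roughly the bandwidth-delay product
of this path, i.e. about the smallest fixed buffer that can keep one
full window in flight.  A back-of-the-envelope sketch, using only the
~72 ms RTT and the 10-GigE line rate from above:

#include <stdio.h>

int main(void)
{
    double rate_bps = 10e9;     /* 10-GigE line rate */
    double rtt_sec  = 0.072;    /* ~72 ms RTT on this path */

    /* bandwidth-delay product: bytes needed in flight to fill the pipe */
    double bdp_bytes = rate_bps * rtt_sec / 8.0;

    printf("BDP ~= %.0f MB\n", bdp_bytes / 1e6);    /* ~90 MB */
    return 0;
}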

But I would definitely agree with the general recommendation to
just take advantage of the excellent Linux TCP autotuning for most
common scenarios.

						-Bill
