Date:	Fri, 11 Jul 2008 17:01:09 -0400
From:	Bill Fink <billfink@...dspring.com>
To:	Rick Jones <rick.jones2@...com>
Cc:	Jim Rees <rees@...ch.edu>, netdev@...r.kernel.org
Subject: Re: Autotuning and send buffer size

On Fri, 11 Jul 2008, Rick Jones wrote:

> > I don't understand how a "too big" sender buffer can hurt performance.  I
> > have not measured what size the sender's buffer is in the autotuning case.
> 
> In broad handwaving terms, TCP will have no more data outstanding at one 
> time than the lesser of:
> 
> *) what the application has sent
> *) the current value of the computed congestion window
> *) the receiver's advertised window
> *) the quantity of data TCP can hold in its retransmission queue
> 
> That last one is, IIRC, directly related to SO_SNDBUF.
> 
> That leads to a hypothesis that all of those are (or grow) large enough 
> to overflow a queue somewhere - for example an interface's transmit 
> queue - causing retransmissions.  Ostensibly, one could check that in 
> ifconfig and/or netstat statistics.
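
(As an aside, the SO_SNDBUF limit mentioned above is just the per-socket
send buffer an application requests via setsockopt() - presumably what a
"-w512k" ends up asking for.  A minimal sketch, not nuttcp's actual code:)

    /* Minimal sketch, not nuttcp's code: request an explicit send buffer.
     * The kernel clamps the request to net.core.wmem_max and doubles it
     * internally for bookkeeping overhead; setting it explicitly also
     * disables send-side autotuning for this socket. */
    #include <sys/socket.h>

    static int request_sndbuf(int sock, int bytes)
    {
            return setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                              &bytes, sizeof(bytes));
    }

(Something like request_sndbuf(sock, 512 * 1024) would correspond to the
"-w512k" runs below.)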

The latest 6.0.1-beta version of nuttcp, available at:

	http://lcp.nrl.navy.mil/nuttcp/beta/nuttcp-6.0.1.c

will report TCP retransmission info.
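
(For anyone wondering where the retrans numbers come from: on Linux a
per-connection retransmission counter is exposed through the TCP_INFO
socket option.  A minimal sketch of reading it - not necessarily how
nuttcp itself does it:)

    /* Minimal sketch, not necessarily nuttcp's method: read the Linux
     * per-connection retransmission counter via TCP_INFO. */
    #include <stdio.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static void print_retrans(int sock)
    {
            struct tcp_info ti;
            socklen_t len = sizeof(ti);

            if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
                    printf("%u retrans\n", (unsigned)ti.tcpi_total_retrans);
    }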

I did some tests on 10-GigE and TCP retransmissions weren't an issue,
but specifying too large a socket buffer size did have a performance
penalty (tests run on 2.6.20.7 kernel).

First, using a 512 KB socket buffer:

[root@...nce8 ~]# repeat 10 taskset 1 nuttcp -f-beta -M1460 -w512k 192.168.88.13 | ./mam 7
 5620.7500 MB /  10.01 sec = 4709.4941 Mbps 99 %TX 66 %RX 0 retrans
 5465.5000 MB /  10.01 sec = 4579.4129 Mbps 100 %TX 63 %RX 0 retrans
 5704.0625 MB /  10.01 sec = 4781.2377 Mbps 100 %TX 71 %RX 0 retrans
 5398.5000 MB /  10.01 sec = 4525.1052 Mbps 99 %TX 62 %RX 0 retrans
 5691.6250 MB /  10.01 sec = 4770.8076 Mbps 99 %TX 71 %RX 0 retrans
 5404.1875 MB /  10.01 sec = 4529.8749 Mbps 99 %TX 64 %RX 0 retrans
 5698.3125 MB /  10.01 sec = 4776.3878 Mbps 100 %TX 70 %RX 0 retrans
 5400.6250 MB /  10.01 sec = 4526.8575 Mbps 100 %TX 65 %RX 0 retrans
 5694.7500 MB /  10.01 sec = 4773.3970 Mbps 100 %TX 71 %RX 0 retrans
 5440.9375 MB /  10.01 sec = 4558.8289 Mbps 100 %TX 64 %RX 0 retrans

min/avg/max = 4525.1052/4653.1404/4781.2377

I specified a TCP MSS of 1460 to force use of standard 1500-byte
Ethernet IP MTU since my default mode is to use 9000-byte jumbo
frames (I also have TSO disabled).
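
(The per-connection MSS cap itself is the TCP_MAXSEG socket option; below
is a minimal sketch of one way a "-M1460" could be implemented, with TSO
toggled separately on the NIC via something like
"ethtool -K <interface> tso off", where the interface name is site-specific.)

    /* Minimal sketch: cap this connection's MSS at 1460 bytes before
     * connect(), so standard 1500-byte IP packets are sent even on a
     * 9000-byte jumbo-frame interface.  TSO is a separate NIC setting. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static int cap_mss(int sock)
    {
            int mss = 1460;

            return setsockopt(sock, IPPROTO_TCP, TCP_MAXSEG,
                              &mss, sizeof(mss));
    }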

Then, using a 10 MB socket buffer:

[root@...nce8 ~]# repeat 10 taskset 1 nuttcp -f-beta -M1460 -w10m 192.168.88.13 | ./mam 7
 5675.8750 MB /  10.01 sec = 4757.6071 Mbps 100 %TX 66 %RX 0 retrans
 5717.6250 MB /  10.01 sec = 4792.6069 Mbps 100 %TX 72 %RX 0 retrans
 5679.0000 MB /  10.01 sec = 4760.2204 Mbps 100 %TX 70 %RX 0 retrans
 5444.3125 MB /  10.01 sec = 4563.4777 Mbps 99 %TX 63 %RX 0 retrans
 5689.0625 MB /  10.01 sec = 4768.6363 Mbps 100 %TX 72 %RX 0 retrans
 5583.1875 MB /  10.01 sec = 4679.8851 Mbps 100 %TX 67 %RX 0 retrans
 5647.1250 MB /  10.01 sec = 4731.5889 Mbps 100 %TX 68 %RX 0 retrans
 5605.2500 MB /  10.01 sec = 4696.5324 Mbps 100 %TX 68 %RX 0 retrans
 5609.2500 MB /  10.01 sec = 4701.7601 Mbps 100 %TX 66 %RX 0 retrans
 5633.0000 MB /  10.01 sec = 4721.6696 Mbps 100 %TX 65 %RX 0 retrans

min/avg/max = 4563.4777/4717.3984/4792.6069

Not much difference (about a 1.38 % increase).

But then switching to a 100 MB socket buffer:

[root@...nce8 ~]# repeat 10 taskset 1 nuttcp -f-beta -M1460 -w100m 192.168.88.13 | ./mam 7
 4887.6250 MB /  10.01 sec = 4095.2239 Mbps 99 %TX 68 %RX 0 retrans
 4956.0625 MB /  10.01 sec = 4152.5652 Mbps 100 %TX 68 %RX 0 retrans
 4935.3750 MB /  10.01 sec = 4136.9084 Mbps 99 %TX 69 %RX 0 retrans
 4962.5000 MB /  10.01 sec = 4159.6409 Mbps 100 %TX 69 %RX 0 retrans
 4919.9375 MB /  10.01 sec = 4123.9685 Mbps 100 %TX 68 %RX 0 retrans
 4947.0625 MB /  10.01 sec = 4146.7009 Mbps 100 %TX 69 %RX 0 retrans
 5071.0625 MB /  10.01 sec = 4250.6175 Mbps 100 %TX 75 %RX 0 retrans
 4958.3125 MB /  10.01 sec = 4156.1080 Mbps 100 %TX 71 %RX 0 retrans
 5078.3750 MB /  10.01 sec = 4256.7461 Mbps 100 %TX 74 %RX 0 retrans
 4955.1875 MB /  10.01 sec = 4151.8279 Mbps 100 %TX 67 %RX 0 retrans

min/avg/max = 4095.2239/4163.0307/4256.7461

This did take about an 8.95 % performance hit.

And using TCP autotuning:

[root@...nce8 ~]# repeat 10 taskset 1 nuttcp -f-beta -M1460 192.168.88.13 | ./mam 7
 5673.6875 MB /  10.01 sec = 4755.7692 Mbps 100 %TX 66 %RX 0 retrans
 5659.3125 MB /  10.01 sec = 4743.6986 Mbps 99 %TX 67 %RX 0 retrans
 5835.5000 MB /  10.01 sec = 4891.3760 Mbps 99 %TX 70 %RX 0 retrans
 4985.5625 MB /  10.01 sec = 4177.2838 Mbps 99 %TX 68 %RX 0 retrans
 5753.0000 MB /  10.01 sec = 4820.2951 Mbps 100 %TX 67 %RX 0 retrans
 5536.8750 MB /  10.01 sec = 4641.0910 Mbps 100 %TX 63 %RX 0 retrans
 5610.5625 MB /  10.01 sec = 4702.8626 Mbps 100 %TX 62 %RX 0 retrans
 5576.5625 MB /  10.01 sec = 4674.3628 Mbps 100 %TX 66 %RX 0 retrans
 5573.5625 MB /  10.01 sec = 4671.8411 Mbps 100 %TX 64 %RX 0 retrans
 5550.0000 MB /  10.01 sec = 4652.0684 Mbps 100 %TX 65 %RX 0 retrans

min/avg/max = 4177.2838/4673.0649/4891.3760

For the 10-GigE testing there was no performance penalty from TCP
autotuning, which got basically the same performance as the "-w512k"
test case.  Perhaps this is because the autotuned send socket buffer
never grows to the 100 MB level where it becomes an issue for 10-GigE
(GigE may hit the issue at lower thresholds).
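
(Jim's earlier question about what size the autotuned send buffer actually
reaches could be answered by sampling SO_SNDBUF during a transfer; a
minimal sketch, assuming a connected socket:)

    /* Minimal sketch: sample what the send buffer has grown to when no
     * explicit SO_SNDBUF was set, i.e. what autotuning (bounded by the
     * net.ipv4.tcp_wmem sysctl) has chosen so far for this socket. */
    #include <stdio.h>
    #include <sys/socket.h>

    static void report_sndbuf(int sock)
    {
            int sndbuf = 0;
            socklen_t len = sizeof(sndbuf);

            if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
                    printf("current send buffer: %d bytes\n", sndbuf);
    }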

While I was at it, I decided to also check the CPU affinity issue,
since these tests are CPU limited, and re-ran the "-w512k" test
case on CPU 1 (using "taskset 2"):

[root@...nce8 ~]# repeat 10 taskset 2 nuttcp -f-beta -M1460 -w512k 192.168.88.13 | ./mam 7
 4942.0625 MB /  10.01 sec = 4142.5086 Mbps 100 %TX 56 %RX 0 retrans
 4833.4375 MB /  10.01 sec = 4051.4628 Mbps 100 %TX 52 %RX 0 retrans
 5291.0000 MB /  10.01 sec = 4434.9701 Mbps 99 %TX 63 %RX 0 retrans
 5287.7500 MB /  10.01 sec = 4432.2468 Mbps 100 %TX 62 %RX 0 retrans
 5011.7500 MB /  10.01 sec = 4200.9007 Mbps 99 %TX 56 %RX 0 retrans
 5198.5625 MB /  10.01 sec = 4355.7784 Mbps 100 %TX 62 %RX 0 retrans
 4981.0000 MB /  10.01 sec = 4173.4818 Mbps 100 %TX 54 %RX 0 retrans
 4991.1250 MB /  10.01 sec = 4183.6394 Mbps 100 %TX 55 %RX 0 retrans
 5234.7500 MB /  10.01 sec = 4387.8510 Mbps 99 %TX 60 %RX 0 retrans
 4994.3125 MB /  10.01 sec = 4186.3108 Mbps 100 %TX 57 %RX 0 retrans

min/avg/max = 4051.4628/4254.9150/4434.9701

This took about an 8.56 % performance hit relative to running the
same test on CPU 0, which is also the CPU that handles the 10-GigE
NIC interrupts.  Note that the test systems are dual-CPU but single-core
(dual 2.8 GHz AMD Opterons).
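
(taskset pins the whole process from the command line; the same thing can
be done from inside a program with sched_setaffinity(), and /proc/interrupts
shows which CPU is servicing the NIC's interrupts.  A minimal sketch:)

    /* Minimal sketch: pin the calling process to CPU 0, equivalent to
     * launching it under "taskset 1" as in the runs above. */
    #define _GNU_SOURCE
    #include <sched.h>

    static int pin_to_cpu0(void)
    {
            cpu_set_t set;

            CPU_ZERO(&set);
            CPU_SET(0, &set);
            return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = self */
    }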

						-Bill