Message-ID: <4B50B032.2060609@hp.com>
Date: Fri, 15 Jan 2010 10:13:06 -0800
From: Rick Jones <rick.jones2@...com>
To: Krishna Kumar <krkumar2@...ibm.com>
CC: davem@...emloft.net, ilpo.jarvinen@...sinki.fi, netdev@...r.kernel.org,
	eric.dumazet@...il.com
Subject: Re: [RFC] [PATCH] Optimize TCP sendmsg in favour of fast devices?

Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@...ibm.com>
>
> Remove inline skb data in tcp_sendmsg(). For the few devices that
> don't support NETIF_F_SG, dev_queue_xmit will call skb_linearize,
> and pass the penalty to those slow devices (the following drivers
> do not support NETIF_F_SG: 8139cp.c, amd8111e.c, dl2k.c, dm9000.c,
> dnet.c, ethoc.c, ibmveth.c, ioc3-eth.c, macb.c, ps3_gelic_net.c,
> r8169.c, rionet.c, spider_net.c, tsi108_eth.c, veth.c,
> via-velocity.c, atlx/atl2.c, bonding/bond_main.c, can/dev.c,
> cris/eth_v10.c).
>
> This patch does not affect devices that support SG but turn it off
> via ethtool after register_netdev.
>
> I ran the following test cases with iperf - #threads: 1 4 8 16 32
> 64 128 192 256, I/O sizes: 256 4K 16K 64K, each test case runs for
> 1 minute, repeated for 5 iterations. Total test run time is 6 hours.
> System is a 4-proc Opteron with a Chelsio 10gbps NIC. Results (BW
> figures are the aggregate across 5 iterations, in mbps):
>
> -------------------------------------------------------
> #Process   I/O Size   Org-BW    New-BW    %-change
> -------------------------------------------------------
> 1          256        2098      2147      2.33
> 1          4K         14057     14269     1.50
> 1          16K        25984     27317     5.13
> 1          64K        25920     27539     6.24
> ...
> 256        256        1947      1955      0.41
> 256        4K         9828      12265     24.79
> 256        16K        25087     24977     -0.43
> 256        64K        26715     27997     4.79
> -------------------------------------------------------
> Total:     -          600071    634906    5.80
> -------------------------------------------------------

Does bandwidth alone convey the magnitude of the change?  I would think
that would only be the case if the CPU(s) were 100% utilized, and perhaps
not even completely then.  At the risk of a shameless plug, it's not for
nothing that netperf reports service demand :)  I would think that the
change in service demand (CPU per unit of work) would be something one
wants to see.

Also, the world does not run on bandwidth alone, so small-packet
performance and any delta there would be good to have.

Multiple-process tests may not be as easy in netperf as they are in
iperf, but under:

ftp://ftp.netperf.org/netperf/misc

I have a single-stream test script I use called runemomni.sh and an
example of its output, as well as an aggregate script I use called
runemomniagg2.sh - I'll post an example of its output there as soon as I
finish some runs.  The script presumes one has ./configure'd netperf:

./configure --enable-burst --enable-omni ...

The netperf omni tests still ass-u-me that the CPU util each measures is
all its own, which means the service demands from aggregate tests require
some post-processing fixup:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance

happy benchmarking,

rick jones

FWIW, service demand and pps performance may be even more important for
non-SG devices because they may be slow 1 Gig devices that still hit
link-rate on a bulk throughput test even with a non-trivial increase in
CPU util.  However, a non-trivial hit in CPU util may very well change
the pps performance.

PPS - there is a *lot* of output in those omni test results - best viewed
with a spreadsheet program.
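
To make the "CPU per unit of work" idea concrete, here is a
back-of-the-envelope sketch of how a service-demand figure can be derived
from CPU utilization, CPU count, run time and bytes moved.  All the input
numbers are assumed for illustration, and this is not netperf's exact
accounting, which may be normalized differently.

/*
 * Back-of-the-envelope illustration of "service demand" as CPU cost per
 * unit of work.  Assumed example numbers; not netperf's exact formula.
 */
#include <stdio.h>

int main(void)
{
	double util  = 0.35;	/* assumed CPU utilization, 0..1 */
	int    ncpus = 4;	/* e.g. a 4-proc box as in the quoted test */
	double secs  = 60.0;	/* one-minute run */
	double mbps  = 10000.0;	/* assumed per-run throughput, Mbit/s */

	double cpu_sec = util * ncpus * secs;			/* CPU-seconds burned */
	double kbytes  = mbps * 1e6 / 8.0 * secs / 1024.0;	/* KB moved */

	printf("service demand ~= %.3f usec of CPU per KB\n",
	       cpu_sec * 1e6 / kbytes);
	return 0;
}

Two runs with the same bandwidth but different utilization then show up
as different service demands, which is the regression (or improvement) a
bandwidth-only table can hide.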
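For readers less familiar with the NETIF_F_SG distinction the quoted
patch leans on, the following userspace sketch models the idea: a
fragmented send buffer goes to SG-capable hardware as-is, while a non-SG
device pays a one-time flattening copy.  It is purely illustrative; the
struct and function names are hypothetical stand-ins, not the actual skb
API.

/*
 * Illustrative userspace sketch only -- not the kernel code.  It models
 * the trade-off in the patch: keep the payload in scattered fragments on
 * the send path, and pay a one-time copy ("linearize") only when the
 * outgoing device cannot do scatter/gather.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NFRAGS  4
#define FRAG_SZ 1024

struct fake_skb {
	char   *frags[NFRAGS];		/* scattered payload fragments */
	size_t  frag_len[NFRAGS];
	char   *linear;			/* contiguous copy, built on demand */
	size_t  len;
};

/* Stand-in for skb_linearize(): copy all fragments into one buffer. */
static int fake_linearize(struct fake_skb *skb)
{
	size_t off = 0;
	int i;

	skb->linear = malloc(skb->len);
	if (!skb->linear)
		return -1;
	for (i = 0; i < NFRAGS; i++) {
		memcpy(skb->linear + off, skb->frags[i], skb->frag_len[i]);
		off += skb->frag_len[i];
	}
	return 0;
}

static void fake_xmit(struct fake_skb *skb, int dev_has_sg)
{
	if (dev_has_sg)
		printf("SG device: %d fragments handed straight to the driver\n",
		       NFRAGS);
	else if (fake_linearize(skb) == 0)
		printf("non-SG device: copied %zu bytes into one buffer first\n",
		       skb->len);
}

int main(void)
{
	struct fake_skb skb = { .len = NFRAGS * FRAG_SZ };
	int i;

	for (i = 0; i < NFRAGS; i++) {
		skb.frags[i] = calloc(1, FRAG_SZ);
		skb.frag_len[i] = FRAG_SZ;
	}
	fake_xmit(&skb, 1);	/* fast NIC: no extra copy */
	fake_xmit(&skb, 0);	/* e.g. 8139cp, dm9000, veth: pay the copy */
	free(skb.linear);
	return 0;
}

The copy is exactly the per-packet CPU cost the PS above is worried
about: on a slow non-SG NIC it may not dent the bulk-throughput number,
but it should be visible in service demand and pps.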