Message-ID: <51C8BB3E.8090701@hp.com>
Date: Mon, 24 Jun 2013 14:33:50 -0700
From: Rick Jones <rick.jones2@...com>
To: Ricardo Landim <ricardolan@...il.com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
Ben Hutchings <bhutchings@...arflare.com>,
netdev@...r.kernel.org
Subject: Re: UDP splice
On 06/24/2013 11:08 AM, Ricardo Landim wrote:
> It would help with zero copy and reduce the cost of syscalls.
>
> On my Intel Xeon (3.3 GHz), reading from a UDP socket and writing to a
> UDP socket (a proxy) takes ~40000 cycles (~12 us).
Are you quite certain your Xeon was actually running at 3.3 GHz at the
time? I just did a quick netperf UDP_RR test from an old Centrino-based
laptop (HP 8510w) pegged at 1.6 GHz (via cpufreq-set), and it reported a
service demand of 12.2 microseconds per transaction, which is,
basically, a send and recv pair plus the stack:
root@...-8510w:~# netperf -t UDP_RR -c -i 30,3 -H tardy.usa.hp.com -- -r 140,1
MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 () port 0 AF_INET to tardy.usa.hp.com () port 0 AF_INET : +/-2.500% @ 99% conf. : demo : first burst 0
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 1.120%
!!! Local CPU util : 6.527%
!!! Remote CPU util : 0.000%
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.    CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate      local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec   % S    % U    us/Tr   us/Tr

180224 180224 140     1      10.00   12985.58  7.93   -1.00  12.221  -1.000
212992 212992
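(As a rough sanity check on that S.dem figure: service demand is CPU
utilization times the number of CPUs divided by the transaction rate,
so, assuming the 8510w's two cores, 0.0793 * 2 / 12985.58 per second
comes to ~12.2 microseconds per transaction, matching the 12.221 above.)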
(Don't fret too much about the confidence-interval warning; it almost
made it.)
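(For reference, the -i 30,3 option asks netperf to repeat the
measurement between 3 and 30 times while trying to bring the results
within its default confidence target of +/- 2.5% at 99% confidence;
that target is what the warning is graded against.)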
Also, my 1400-byte test didn't show all that different a service demand:
root@...-8510w:~# netperf -t UDP_RR -c -i 30,3 -H tardy.usa.hp.com -- -r 1400,1
MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 () port 0 AF_INET to tardy.usa.hp.com () port 0 AF_INET : +/-2.500% @ 99% conf. : demo : first burst 0
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 1.123%
!!! Local CPU util : 6.991%
!!! Remote CPU util : 0.000%
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.    CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate      local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec   % S    % U    us/Tr   us/Tr

180224 180224 1400    1      10.00   10055.33  6.27   -1.00  12.469  -1.000
212992 212992
Of course, I didn't try very hard to force cache misses (e.g. by using a
big send/recv ring), and there may have been other things happening on
the system causing a change between the two tests (they were separated
by an hour or so). I also didn't make sure that interrupts stayed
assigned to a specific CPU, nor that netperf did. The kernel:
root@...-8510w:~# uname -a
Linux raj-8510w 3.8.0-25-generic #37-Ubuntu SMP Thu Jun 6 20:47:30 UTC
2013 i686 i686 i686 GNU/Linux
In general, I suppose if you want to quantify the overhead of the
copies, you can try something like the two tests above, but with longer
run times and more intermediate data points as you walk the request or
response size up, watching how the service demand changes as you go
(see the sketch below). So long as you stay at or below 1472 bytes of
payload (assuming IPv4 over a "standard" 1500-byte MTU Ethernet) you
won't generate fragments, and so will still have the same number of
packets per transaction.
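Something like this rough sketch would do the walk; it assumes netperf
is installed, a netserver is running on the target, and it reuses
tardy.usa.hp.com from the runs above, so substitute your own host:

#!/bin/sh
# Walk the request size up and watch the local service demand (us/Tr).
# -c enables local CPU measurement, -l 60 runs each point for 60
# seconds, and -r ${req},1 sets the request/response sizes. Staying
# at or below 1472 bytes avoids fragmentation on a 1500-byte MTU.
for req in 64 128 256 512 1024 1400 1472; do
    netperf -t UDP_RR -c -l 60 -H tardy.usa.hp.com -- -r ${req},1
done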
Or you could "perf" profile and look for copy routines.
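For example (a sketch; the exact symbol names of the copy routines vary
by kernel version and CPU):

# Sample all CPUs, with call graphs, for 10 seconds while the
# proxy is under load:
perf record -a -g -- sleep 10
# Then look for copy routines (e.g. something like
# copy_user_generic_string or __copy_user) near the top:
perf report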
happy benchmarking,
rick jones