Message-ID: <CA+mtBx_RH3v92Pmd_TYM=i0Anx-iG63762NPRSeCc1k98-Q6UQ@mail.gmail.com>
Date: Mon, 6 Jan 2014 12:59:59 -0800
From: Tom Herbert <therbert@...gle.com>
To: Benjamin Poirier <bpoirier@...e.de>
Cc: Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: tx-nocache-copy performance
On Mon, Jan 6, 2014 at 12:27 PM, Benjamin Poirier <bpoirier@...e.de> wrote:
> Hi Tom,
>
> In commit "c6e1a0d net: Allow no-cache copy from user on transmit
> (v3.0-rc1)" you introduced the tx-nocache-copy performance optimization
> and set it to on by default. I've tried to reproduce your testcase, as
> well as a few more, but I did not find any performance improvement from
> turning on tx-nocache-copy. Do you think tx-nocache-copy is still a
> worthwhile optimization and it should remain on by default? In which
> situations does it help?
>
Unfortunately, I think this is probably not a worthwhile optimization
at this point. The benefits should manifest themselves under high
networking load and high CPU load, where we are getting a lot of
pressure on the cache; the non-temporal copy should alleviate that
case. In reality, I suspect that rep movsq is more efficient than
movntq's, so the advantages of skipping the cache might be wiped out.
It would be nice if Intel had a movntsq instruction!
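
To make the trade-off concrete, here is a minimal user-space sketch of a
non-temporal copy. It is not the kernel's copy routine: copy_nocache is
a hypothetical helper, and it uses the SSE2 intrinsic _mm_stream_si128
(movntdq) instead of movntq, falling back to memcpy when SSE2 is not
available. The loop issues one streaming store per 16 bytes, where a
cached copy can use a single rep movsq.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (illustration only): copy len bytes from src to
 * dst, writing the 16-byte-aligned bulk of dst with non-temporal stores
 * so the destination does not displace other data from the cache. */
static void copy_nocache(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: ordinary copy until dst is 16-byte aligned. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    /* Bulk: unaligned load, then streaming (cache-bypassing) store. */
    for (; len >= 16; d += 16, s += 16, len -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }

    /* Order the streaming stores before any later stores. */
    _mm_sfence();

    /* Tail: remaining bytes with an ordinary copy. */
    memcpy(d, s, len);
#else
    /* Assumption: without SSE2 there is no streaming store to use. */
    memcpy(dst, src, len);
#endif
}
```

Timing this against a plain memcpy on buffers larger than the last-level
cache is one way to see whether skipping the cache pays for the extra
instructions on a given CPU.
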

btw, I still believe it would be a win if we could use vmsplice to
avoid the copy altogether; unfortunately, no one has yet come up
with an interface to reliably reclaim buffers :-(.

> I've run latency tests similar to the ones you described in the commit
> log. I've also tested how the option affects single stream throughput
> tests. According to the results I obtained, it seems that
> tx-nocache-copy has either no impact (in the latency test) or a negative
> impact (in the throughput test).
>
> My test results follow. I tested using 3.12.6 on one Intel Xeon W3565
> and one i7 920 connected by ixgbe adapters. The results are from the
> Xeon, but they're similar on the i7. All numbers report the mean±stddev
> over 10 runs of 10s.
>
> 1) latency tests similar to what you described
> There is no statistically significant difference between tx-nocache-copy
> on/off.
> nic irqs spread out (one queue per cpu)
>
> 200x netperf -r 1400,1
>   tx-nocache-copy off
>     692000±1000 tps
>     50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
>   tx-nocache-copy on
>     693000±1000 tps
>     50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7
>
> 200x netperf -r 14000,14000
>   tx-nocache-copy off
>     86450±80 tps
>     50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
>   tx-nocache-copy on
>     86110±60 tps
>     50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
>
> 2) single stream throughput tests
> tx-nocache-copy leads to higher service demand
>
>                       throughput  cpu0        cpu1         demand
>                       (Gb/s)      (Gcycle)    (Gcycle)     (cycle/B)
>
> nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)
>
> tx-nocache-copy off   9402±5      9.4±0.2                  0.80±0.01
> tx-nocache-copy on    9403±3      9.85±0.04                0.838±0.004
>
> nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)
>
> tx-nocache-copy off   9401±5      5.83±0.03   5.0±0.1      0.923±0.007
> tx-nocache-copy on    9404±2      5.74±0.03   5.523±0.009  0.958±0.002
>
> -Benjamin
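
For reference, the demand column above can be reproduced from the other
columns. This is a sketch under assumptions that the quoted message does
not state outright: the throughput figures are taken to be in Mbit/s
(they match a 10GbE link), the cycle counts are taken to be totals over
one 10 s run, and cycles for cpu0 and cpu1 are summed. service_demand is
a hypothetical helper name.

```c
/* Hypothetical helper: CPU cycles spent per byte sent.
 *   mbit_per_s  throughput, assumed to be in Mbit/s
 *   gcycles     total cycles over the run, in units of 1e9 (all CPUs)
 *   seconds     run length (10 s in the quoted tests) */
static double service_demand(double mbit_per_s, double gcycles,
                             double seconds)
{
    /* Bytes sent during the run. */
    double bytes = mbit_per_s * 1e6 / 8.0 * seconds;

    return gcycles * 1e9 / bytes;
}
```

With the first row (9402, 9.4 Gcycle, 10 s) this gives about 0.80
cycle/B, and with the last row (9404, 5.74 + 5.523 Gcycle, 10 s) about
0.958 cycle/B, in line with the reported values.
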