netdev - Re: tx-nocache-copy performance

Message-ID: <CA+mtBx_RH3v92Pmd_TYM=i0Anx-iG63762NPRSeCc1k98-Q6UQ@mail.gmail.com>
Date:	Mon, 6 Jan 2014 12:59:59 -0800
From:	Tom Herbert <therbert@...gle.com>
To:	Benjamin Poirier <bpoirier@...e.de>
Cc:	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: tx-nocache-copy performance

On Mon, Jan 6, 2014 at 12:27 PM, Benjamin Poirier <bpoirier@...e.de> wrote:
> Hi Tom,
>
> In commit "c6e1a0d net: Allow no-cache copy from user on transmit
> (v3.0-rc1)" you introduced the tx-nocache-copy performance optimization
> and set it to on by default. I've tried to reproduce your testcase, as
> well as a few more, but I did not find any performance improvement from
> turning on tx-nocache-copy. Do you think tx-nocache-copy is still a
> worthwhile optimization and it should remain on by default? In which
> situations does it help?
>
Unfortunately, I think this is probably not a worthwhile optimization
at this point. The benefits should manifest themselves under high
networking load and high CPU load, where we are getting a lot of
pressure on the cache; the non-temporal copy should alleviate that
case. In reality, I suspect that rep movsq is more efficient than
movntq's, so the advantages of skipping the cache might be wiped out.
It would be nice if Intel had a movntsq instruction!

btw, I still believe it would be a win if we could use vmsplice to
eliminate the copy altogether; unfortunately, no one has yet come up
with an interface to reliably reclaim buffers :-(.

> I've run latency tests similar to the ones you described in the commit
> log. I've also tested how the option affects single stream throughput
> tests. According to the results I obtained, it seems that
> tx-nocache-copy has either no impact (in the latency test) or a negative
> impact (in the throughput test).
>
> My test results follow. I tested using 3.12.6 on one Intel Xeon W3565
> and one i7 920 connected by ixgbe adapters. The results are from the
> Xeon, but they're similar on the i7. All numbers report the mean±stddev
> over 10 runs of 10s.
>
> 1) latency tests similar to what you described
> There is no statistically significant difference between tx-nocache-copy
> on/off.
> nic irqs spread out (one queue per cpu)
>
> 200x netperf -r 1400,1
> tx-nocache-copy off
>         692000±1000 tps
>         50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
> tx-nocache-copy on
>         693000±1000 tps
>         50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7
>
> 200x netperf -r 14000,14000
> tx-nocache-copy off
>         86450±80 tps
>         50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
> tx-nocache-copy on
>         86110±60 tps
>         50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
>
> 2) single stream throughput tests
> tx-nocache-copy leads to higher service demand
>
>                         throughput  cpu0        cpu1        demand
>                         (Mb/s)      (Gcycle)    (Gcycle)    (cycle/B)
>
> nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)
>
> tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
> tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004
>
> nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)
>
> tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1     0.923±0.007
> tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009 0.958±0.002
>
> -Benjamin
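
For readers checking the throughput table, the demand column can be
reproduced from the other columns. The sketch below is not from the
original mail. It assumes that the throughput figures are in Mb/s (they
are consistent with a 10 Gb/s ixgbe link), that each run lasts 10 s as
stated, and that demand is total CPU cycles divided by bytes sent. The
function name service_demand is hypothetical.

```python
def service_demand(throughput_mbps, gcycles_per_cpu, duration_s=10.0):
    """Return CPU cycles spent per byte sent.

    Assumptions (inferred from the table, not stated in the mail):
    throughput is in Mb/s, cycle counts are in Gcycles per run, and
    demand = total cycles / bytes sent during the run.
    """
    bytes_sent = throughput_mbps * 1e6 * duration_s / 8
    total_cycles = sum(gcycles_per_cpu) * 1e9
    return total_cycles / bytes_sent


# Mean values taken from the table, as (throughput, [cpu0, cpu1]).
rows = {
    "cpu0 only, off": (9402, [9.4]),
    "cpu0 only, on": (9403, [9.85]),
    "cpu0+cpu1, off": (9401, [5.83, 5.0]),
    "cpu0+cpu1, on": (9404, [5.74, 5.523]),
}

for label, (mbps, gcycles) in rows.items():
    print(f"{label}: {service_demand(mbps, gcycles):.3f} cycle/B")
```

Under these assumptions the computed values land close to the reported
demand figures, which supports reading the throughput column as Mb/s.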