Message-ID: <20140827133426.7e734beb@redhat.com>
Date:	Wed, 27 Aug 2014 13:34:26 +0200
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	Jesper Dangaard Brouer <brouer@...hat.com>
Cc:	Daniel Borkmann <dborkman@...hat.com>, davem@...emloft.net,
	netdev@...r.kernel.org,
	Hannes Frederic Sowa <hannes@...hat.com>,
	Florian Westphal <fw@...len.de>
Subject: Re: [RFC PATCH net-next 1/3] ixgbe: support
 netdev_ops->ndo_xmit_flush()

On Mon, 25 Aug 2014 14:07:21 +0200
Jesper Dangaard Brouer <brouer@...hat.com> wrote:

> On Sun, 24 Aug 2014 15:42:16 +0200
> Daniel Borkmann <dborkman@...hat.com> wrote:
> 
> > This implements the deferred tail pointer flush API for the ixgbe
> > driver. A similar version was also proposed some time ago by
> > Alexander Duyck.
> 
> I've run some benchmarks with only this patch applied, which
> actually show a performance regression.
> 
[...]
>
> Still a small regression: -14187 pps
>  * In nanosec: (1/1562539*10^9)-(1/1548352*10^9) = -5.86 ns
>  
> I was not expecting this "slowdown" with this rather simple use of
> the new ndo_xmit_flush API.  Can anyone explain why this is happening?
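
For readers not following the original RFC: the driver-side change being
measured splits the tail-pointer MMIO write out of the hot xmit path and
into the new hook, so the core could in theory queue several packets
before one flush. A rough sketch from memory, not the exact patch; the
ixgbe_xmit_flush name and field use just follow normal ixgbe conventions:

static void ixgbe_xmit_flush(struct net_device *netdev, u16 queue)
{
	struct ixgbe_adapter *adapter = netdev_priv(netdev);
	struct ixgbe_ring *tx_ring = adapter->tx_ring[queue];

	/* The MMIO write that used to sit at the tail of the xmit
	 * path: advance the HW tail pointer past all descriptors
	 * queued since the last flush.
	 */
	writel(tx_ring->next_to_use, tx_ring->tail);
}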

I've re-run this experiment with more accuracy, i.e. with C-state
tuning, Hyper-Threading disabled, and using pktgen. See the description
in the thread with subject "Get rid of ndo_xmit_flush" [1].

DaveM was right to revert this API. According to my new, more accurate
measurements, the conclusion is the same: this API hurts performance.

Compared to baseline, with this patch (except not using mmiowb()):
 * (1/5609929*10^9)-(1/5388719*10^9) = -7.32 ns
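
For reference, the conversion used here and throughout: per-packet cost
in nanoseconds is 10^9/pps, so the delta between two runs is the
difference of the inverses:
 * baseline: 10^9/5609929 = 178.256 ns per packet
 * patched:  10^9/5388719 = 185.573 ns per packet
 * delta:    178.256 - 185.573 = -7.317 ns, i.e. the -7.32 ns above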

Details below signature.

[1] http://thread.gmane.org/gmane.linux.network/327502/focus=327803
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Base setup
==========

BIOS: Disabled HT (Hyper-Threading)

Setup commands:
 sudo killall irqbalance
 base_device_setup.sh eth4 # calls set_irq_affinity
 base_device_setup.sh eth5
 netfilter_unload_modules.sh
 sudo ethtool -C eth5 rx-usecs 30
 sudo tuned-adm profile latency-performance

pktgen cmdline:
 ./example03.sh -i eth5 -d 192.168.21.4 -m 00:12:c0:80:1d:54
 (SKB_CLONE="100000", i.e. pktgen clone_skb 100000, reusing each skb
  100000 times to avoid per-packet allocation, and no UDP port
  randomization)

Vanilla kernel for baselining, just **before**:
 * commit 4798248e4e02 ("net: Add ops->ndo_xmit_flush()").
Thus at:
 * commit 4c83acbc565d53 ("ipv6: White-space cleansing : gaps between function and symbol export").

With no HT:
 * ethtool -C eth5 rx-usecs 30
 * tuned-adm profile latency-performance
Results (pktgen):
 * instant rx:2 tx:5620736 pps n:120 average: rx:1 tx:5618140 pps
   (instant variation TX 0.082 ns (min:-0.088 max:0.147) RX 0.000 ns)
 * instant rx:1 tx:5622300 pps n:250 average: rx:1 tx:5619732 pps
   (instant variation TX 0.081 ns (min:-0.858 max:0.098) RX 0.000 ns)
 * accuracy: (1/5618140*10^9)-(1/5619732*10^9) = 0.05 ns
 * instant rx:1 tx:5618692 pps n:120 average: rx:1 tx:5617469 pps
   (instant variation TX 0.039 ns (min:-0.043 max:0.045) RX 0.000 ns)
 * accuracy: (1/5619732*10^9)-(1/5617469*10^9) = -0.072 ns
 * (reboot same kernel)
 * Some hiccups:
 * instant rx:1 tx:5610140 pps n:190 average: rx:1 tx:5587229 pps
   (instant variation TX 0.731 ns (min:-2.612 max:2.627) RX 0.000 ns)
 * accuracy: (1/5587229*10^9)-(1/5617469*10^9) = 0.963 ns
 * accuracy: (1/5587229*10^9)-(1/5619732*10^9) = 1.035 ns
 * instant rx:1 tx:5607568 pps n:120 average: rx:1 tx:5606006 pps
   (instant variation TX 0.050 ns (min:-0.855 max:0.066) RX 0.000 ns)
 * instant rx:1 tx:5608168 pps n:120 average: rx:1 tx:5611001 pps
   (instant variation TX -0.090 ns (min:-0.156 max:0.100) RX 0.000 ns)
 * Average: (5618140+5619732+5617469+5587229+5606006+5611001)/6 = 5609929 pps

Results: on branch 'ndo_xmit_flush'
-----------------------------------
Kernel at:
 * commit fe88e6dd8b9 ("Merge branch 'ndo_xmit_flush'")

Sending out via ixgbe, which in this kernel does not have the
ndo_xmit_flush function defined.
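
For this no-driver-support case the cost should be essentially a single
branch per packet. The core helper added by the ndo_xmit_flush commit
looks roughly like the sketch below (quoted from memory, so details may
differ from the actual commit):

static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb,
					    struct net_device *dev)
{
	const struct net_device_ops *ops = dev->netdev_ops;
	u16 queue = skb->queue_mapping; /* save before skb may be consumed */
	netdev_tx_t rc;

	rc = ops->ndo_start_xmit(skb, dev);
	/* Drivers that do not implement the hook (ixgbe in this
	 * kernel) only pay this NULL test per packet.
	 */
	if (rc == NETDEV_TX_OK && ops->ndo_xmit_flush)
		ops->ndo_xmit_flush(dev, queue);

	return rc;
}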

With no HT:
 * ethtool -C eth5 rx-usecs 30
 * tuned-adm profile latency-performance
Results (pktgen):
 * instant rx:1 tx:5600404 pps n:161 average: rx:1 tx:5600257 pps
  (instant variation TX 0.005 ns (min:-0.047 max:0.050) RX 0.000 ns)
 * instant rx:1 tx:5594840 pps n:120 average: rx:1 tx:5595316 pps
  (instant variation TX -0.015 ns (min:-0.028 max:0.025) RX 0.000 ns)
 * instant rx:1 tx:5599644 pps n:140 average: rx:1 tx:5599155 pps
  (instant variation TX 0.016 ns (min:-0.074 max:0.059) RX 0.000 ns)
 * instant rx:1 tx:5601296 pps n:75 average: rx:1 tx:5599074 pps
  (instant variation TX 0.071 ns (min:-0.051 max:0.087) RX 0.000 ns)
 * Averaged: (5600257+5595316+5599155+5599074)/4 = 5598450 pps

Compared to baseline: (averaged 5609929 pps)
 * (1/5609929*10^9)-(1/5598450*10^9) = -0.365 ns

Conclusion: When ndo_xmit_flush is not active in the driver, performance
is unchanged, as the 0.365 ns difference is below our accuracy level
(cf. the roughly 1 ns run-to-run variation seen in the baseline runs
above).

Results: on branch bulking01
----------------------------

Kernel at:
 * commit fe88e6dd8b9 ("Merge branch 'ndo_xmit_flush'")
 * Plus ixgbe support netdev_ops->ndo_xmit_flush()

With no HT:
 * ethtool -C eth5 rx-usecs 30
 * tuned-adm profile latency-performance
Results (pktgen):
 * instant rx:1 tx:5387528 pps n:170 average: rx:1 tx:5387842 pps
  (instant variation TX -0.011 ns (min:-0.193 max:0.125) RX 0.000 ns)
 * instant rx:1 tx:5387588 pps n:212 average: rx:1 tx:5387930 pps
  (instant variation TX -0.012 ns (min:-0.852 max:0.177) RX 0.000 ns)
 * instant rx:1 tx:5391172 pps n:70 average: rx:1 tx:5389684 pps
  (instant variation TX 0.051 ns (min:-0.097 max:0.087) RX 0.000 ns)
 * instant rx:1 tx:5388444 pps n:150 average: rx:1 tx:5389421 pps
  (instant variation TX -0.034 ns (min:-1.014 max:0.092) RX 0.000 ns)
 * Average: (5387842+5387930+5389684+5389421)/4 = 5388719 pps

Compared to baseline: (averaged 5609929 pps)
 * (1/5609929*10^9)-(1/5388719*10^9) = -7.32 ns

Conclusion: When ndo_xmit_flush is ACTIVE in the driver, this new API of
calling ndo_xmit_flush() hurts performance, most likely because pktgen
transmits a single packet per xmit call, so every packet pays the extra
indirect call and separate flush without any batching benefit.
