Message-ID: <20140826084455.28dd4058@redhat.com>
Date: Tue, 26 Aug 2014 08:44:55 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Alexander Duyck <alexander.h.duyck@...el.com>
Cc: Daniel Borkmann <dborkman@...hat.com>, davem@...emloft.net,
netdev@...r.kernel.org, brouer@...hat.com
Subject: Re: [RFC PATCH net-next 1/3] ixgbe: support
netdev_ops->ndo_xmit_flush()
On Mon, 25 Aug 2014 15:51:50 -0700
Alexander Duyck <alexander.h.duyck@...el.com> wrote:
> On 08/25/2014 05:07 AM, Jesper Dangaard Brouer wrote:
> > On Sun, 24 Aug 2014 15:42:16 +0200
> > Daniel Borkmann <dborkman@...hat.com> wrote:
> >
> >> This implements the deferred tail pointer flush API for the ixgbe
> >> driver. A similar version was also proposed some time ago by Alexander Duyck.
> >
> > I've run some benchmarks with this patch only, which actually shows a
> > performance regression.
> >
> > Using trafgen with QDISC_BYPASS and mmap mode, via cmdline:
> > trafgen --cpp --dev eth5 --conf udp_example01.trafgen -V --cpus 1
> >
> > BASELINE(no-patch): trafgen QDISC_BYPASS and mmap:
> > - tx:1562539 pps
> >
> > (This patch only): ixgbe use of .ndo_xmit_flush.
> > - tx:1532299 pps
> >
> > Regression: -30240 pps
> > * In nanosec: (1/1562539*10^9)-(1/1532299*10^9) = -12.63 ns
> >
> >
> > As DaveM points out, we might not need the mmiowb().
> > Result when not performing the mmiowb():
> > - tx:1548352 pps
> >
> > Still a small regression: -14187 pps
> > * In nanosec: (1/1562539*10^9)-(1/1548352*10^9) = -5.86 ns
> >
> >
> > I was not expecting this "slowdown" from such a simple use of the
> > new ndo_xmit_flush API. Can anyone explain why this is happening?
>
> One possibility is that we are now doing less work between the time we
> write the tail register and the time we grab the qdisc lock (locked
> transactions are stalled by pending MMIO writes), so we end up spending
> more time stuck waiting for the write to complete while doing nothing.
In this testcase we are bypassing the qdisc code path, but still taking
the HARD_TX_LOCK. I was only expecting a cost in the area of -2 ns, due
to the extra function call overhead.
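
For reference, my mental model of the per-packet cost is the core
wrapper from DaveM's RFC, roughly like this (sketch from memory; the
exact helper name and completion check may differ):

	static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb,
						    struct net_device *dev)
	{
		const struct net_device_ops *ops = dev->netdev_ops;
		u16 queue = skb_get_queue_mapping(skb); /* skb may be freed by xmit */
		netdev_tx_t rc;

		rc = ops->ndo_start_xmit(skb, dev);
		/* The deferred doorbell: an extra load, branch and indirect
		 * call on every packet, even when nothing is batched. */
		if (rc == NETDEV_TX_OK && ops->ndo_xmit_flush)
			ops->ndo_xmit_flush(dev, queue);

		return rc;
	}

That alone should account for the ~2 ns I expected, but not the ~6 ns
measured even without the mmiowb().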
But once we include the qdisc code path, the performance regression
gets even worse. I would like an explanation for that as well, see:
http://thread.gmane.org/gmane.linux.network/327254/focus=327431
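
On the driver side, the ixgbe hook under test should boil down to the
tail write plus the barrier, roughly (reconstructed sketch; the struct
and field names are assumed, not copied from the patch):

	static void ixgbe_xmit_flush(struct net_device *netdev, u16 queue)
	{
		struct ixgbe_adapter *adapter = netdev_priv(netdev);
		struct ixgbe_ring *tx_ring = adapter->tx_ring[queue];

		/* Publish the new tail; the NIC starts DMA from here. */
		writel(tx_ring->next_to_use, tx_ring->tail);
		/* Dropping this mmiowb() recovered ~6.8 ns of the
		 * ~12.6 ns regression in the numbers above. */
		mmiowb();
	}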
> Then of course there are always the funny oddball quirks such as the
> code changes might have changed the alignment of a loop resulting in Tx
> cleanup more expensive than it was before.
Yes, this is when it gets hairy!
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer