Message-ID: <57BF21C7.5070709@caviumnetworks.com>
Date: Thu, 25 Aug 2016 09:50:15 -0700
From: David Daney <ddaney@...iumnetworks.com>
To: Ed Swierk <eswierk@...portsystems.com>
CC: linux-mips <linux-mips@...ux-mips.org>,
driverdev-devel <devel@...verdev.osuosl.org>,
netdev <netdev@...r.kernel.org>,
Aaro Koskinen <aaro.koskinen@...ia.com>
Subject: Re: Improving OCTEON II 10G Ethernet performance
On 08/24/2016 06:29 PM, Ed Swierk wrote:
> I'm trying to migrate from the Octeon SDK to a vanilla Linux 4.4
> kernel for a Cavium OCTEON II (CN6880) board running in 64-bit
> little-endian mode. So far I've gotten most of the hardware features I
> need working, including XAUI/RXAUI, USB, boot bus and I2C, with a
> fairly small set of patches.
> https://github.com/skyportsystems/linux/compare/master...octeon2
>
It is unclear what your motivation for doing this is, but I can think
of several things you could do:
A) Get v4.4 based SDK from Cavium.
B) Major rewrite of octeon-ethernet driver.
C) Live with current staging driver.
> The biggest remaining hurdle is improving 10G Ethernet performance:
> iperf -P 10 on the SDK kernel gets close to 10 Gbit/sec throughput,
> while on my 4.4 kernel, it tops out around 1 Gbit/sec.
>
> Comparing the octeon-ethernet driver in the SDK
> (http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/tree/drivers/net/ethernet/octeon?h=apaliwal/octeon)
> against the one in 4.4, the latter appears to utilize only a single
> CPU core for the rx path. It's not clear to me if there is a similar
> issue on the tx side, or other bottlenecks.
The main limiting factor on performance is single-threaded RX
processing.  The out-of-tree vendor driver handles this by running
multiple NAPI instances against the same RX queue whenever a queue
backlog builds up.  The disadvantage of doing this is that packets may
be received out of order, since processing is not synchronized across
the CPUs.
On the TX side, the locks on the queuing discipline can become
contended, leading to cache line bouncing.  In the TX code of the
driver itself, there should be no impediments to parallel TX
operations.
Ideally we would configure the packet classifiers on the RX side to
create multiple RX queues based on a hash of the TCP 5-tuple, and handle
each queue with a single NAPI instance. That should result in better
performance while maintaining packet ordering.
>
> I started trying to port multi-CPU rx from the SDK octeon-ethernet
> driver, but had trouble teasing out just the necessary bits without
> following a maze of dependencies on unrelated functions. (Dragging
> major parts of the SDK wholesale into 4.4 defeats the purpose of
> switching to a vanilla kernel, and doesn't bring us closer to getting
> octeon-ethernet out of staging.)
Yes, you have identified the main problem with this code.
All the code managing the SerDes and other MAC functions needs a
complete rewrite.  One major problem is that all the SerDes/MACs in the
system are configured simultaneously instead of on a per-device basis.
There is also a plethora of different SerDes technologies in use
(RGMII, SGMII, QSGMII, XFI, XAUI, RXAUI, SPI-4.1, XLAUI, KR, ...), and
the code handling all of them is mixed together, with huge case
statements switching on the interface mode all over the place.
There is also code to handle target-mode PCI/PCIe packet engines mixed
in as well. This stuff should probably be removed.
>
> Has there been any work on the octeon-ethernet driver since this patch
> set? https://www.linux-mips.org/archives/linux-mips/2015-08/msg00338.html
>
> Any hints on what to pick out of the SDK code to improve 10G
> performance would be appreciated.
>
> --Ed
>