Message-ID: <20160901181211.GI14316@raspberrypi.musicnaut.iki.fi>
Date: Thu, 1 Sep 2016 21:12:11 +0300
From: Aaro Koskinen <aaro.koskinen@....fi>
To: Ed Swierk <eswierk@...portsystems.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
David Daney <ddaney@...iumnetworks.com>,
devel@...verdev.osuosl.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/11] staging: octeon: multi rx group (queue) support
Hi,
On Wed, Aug 31, 2016 at 07:09:13PM -0700, Ed Swierk wrote:
> On 8/31/16 13:57, Aaro Koskinen wrote:
> > This series implements multiple RX group support that should improve
> > the networking performance on multi-core OCTEONs. Basically we register
> > an IRQ and a NAPI instance for each group, and ask the HW to select the
> > group for incoming packets based on a hash.
> >
> > Tested on EdgeRouter Lite with a simple forwarding test using two flows
> > and 16 RX groups distributed between two cores - the routing throughput
> > is roughly doubled.
> >
> > Also tested with EBH5600 (8 cores) and EBB6800 (16 cores) by sending
> > and receiving traffic in both directions using SGMII interfaces.
>
> With this series on 4.4.19, RX works with receive_group_order > 0.
Good.
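For anyone skimming the thread, the per-group setup described in the
quoted summary is conceptually along these lines. This is only a rough
sketch of the idea, not the actual patch; the structure, the handler
names (oct_rx_poll, oct_rx_interrupt) and the fixed group count are
made up for illustration:

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* One receive group: its own IRQ and its own NAPI context. */
struct oct_rx_group {
	int irq;
	struct napi_struct napi;
};

static struct oct_rx_group rx_groups[16];

static int oct_rx_poll(struct napi_struct *napi, int budget)
{
	int work_done = 0;

	/* ... pull packets belonging to this group, up to budget ... */
	if (work_done < budget)
		napi_complete(napi);
	return work_done;
}

static irqreturn_t oct_rx_interrupt(int irq, void *data)
{
	struct oct_rx_group *grp = data;

	/* Just kick this group's NAPI; packets are drained in the poll loop. */
	napi_schedule(&grp->napi);
	return IRQ_HANDLED;
}

static int oct_register_rx_groups(struct net_device *dev, int ngroups)
{
	int i, ret;

	for (i = 0; i < ngroups; i++) {
		netif_napi_add(dev, &rx_groups[i].napi, oct_rx_poll, 64);
		napi_enable(&rx_groups[i].napi);

		ret = request_irq(rx_groups[i].irq, oct_rx_interrupt, 0,
				  "octeon-ethernet-rx", &rx_groups[i]);
		if (ret)
			return ret;
	}
	return 0;
}

With the hardware hashing flows to groups, each group's interrupt (and
hence its NAPI poll) can then be pinned to a different CPU via
smp_affinity, which is what spreads the RX load.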
> Setting receive_group_order=4, I do see 16 Ethernet interrupts. I tried
> fiddling with various smp_affinity values (e.g. setting them all to
> ffffffff, or assigning a different one to each interrupt, or giving a
> few to some and a few to others), as well as different values for
> rps_cpus. 10-thread parallel iperf performance varies between 0.5 and 1.5
> Gbit/sec total depending on the particular settings.
>
> With the SDK kernel I get over 8 Gbit/sec. It seems to be achieving that
> using just one interrupt (not even a separate one for tx, as far as I can
> tell) pegged to CPU 0 (the default smp_affinity). I must be missing some
> other major configuration tweak, perhaps specific to 10G.
>
> Can you run a test on the EBB6800 with the interfaces in 10G mode?
Yes, I attached two EBB6800s with XAUI and ran iperf -P 10.
With a single group it gives 2.9 Gbit/s, and with 16 groups (on 16 cores)
4.3 Gbit/s. In the 16-group case none of the CPUs is even close to 100%,
so the bottleneck is somewhere else. I guess implementing the proper
SSO init should increase the throughput.
A.