Message-ID: <20131121004430.GX8581@1wt.eu>
Date: Thu, 21 Nov 2013 01:44:30 +0100
From: Willy Tarreau <w@....eu>
To: Arnaud Ebalard <arno@...isbad.org>
Cc: Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
Florian Fainelli <f.fainelli@...il.com>,
simon.guinot@...uanux.org, Eric Dumazet <eric.dumazet@...il.com>,
netdev@...r.kernel.org, edumazet@...gle.com,
Cong Wang <xiyou.wangcong@...il.com>,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [BUG,REGRESSION?] 3.11.6+,3.12: GbE iface rate drops to few KB/s
Hi Arnaud,
On Wed, Nov 20, 2013 at 10:54:35PM +0100, Willy Tarreau wrote:
> I'm currently trying to implement TX IRQ handling. I found the registers
> description in the neta driver that is provided in Marvell's LSP kernel
> that is shipped with some devices using their CPUs. This code is utterly
> broken (eg: splice fails with -EBADF) but I think the register descriptions
> could be trusted.
>
> I'd rather have real IRQ handling than just relying on mvneta_poll(), so
> that we can use it for asymmetric traffic/routing/whatever.
OK it paid off. And very well :-)
I wrote it in one go and it worked immediately. I generally don't like that,
because I always fear that some bug is still hidden in the code. I have
only tested it on the Mirabox, so I'll have to try it on the OpenBlocks AX3-4 and
on the XP-GP board for some SMP stress tests.
I upgraded my Mirabox to the latest Linus git tree (commit 5527d151) and compared
results with and without the patch.
Without the patch:
- at least 12 streams are needed to reach 1 Gbps
- 60% of the CPU remains idle at 1 Gbps
- HTTP connection rate on empty objects is 9950 connections/s
- cumulative outgoing traffic on two ports reaches 1.3 Gbps
With the patch:
- a single stream easily saturates the gigabit link
- 87% of the CPU remains idle at 1 Gbps with 12 streams (90% with 1 stream)
- HTTP connection rate on empty objects is 10250 connections/s
- both gigabit ports are saturated at 99% CPU, so 2 Gbps sustained output
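
To put the figures above side by side, here is a small back-of-the-envelope
calculation. It only uses the numbers quoted in this mail; the variable names
are mine, and the "busy" values simply assume busy = 100 - idle.

```python
# Numbers quoted above (Mirabox, with and without the patch).
idle_before, idle_after = 60, 87          # % idle CPU at 1 Gbps, 12 streams
conn_before, conn_after = 9950, 10250     # HTTP connections/s, empty objects
gbps_before, gbps_after = 1.3, 2.0        # outgoing traffic on two ports

# Assumption: everything that is not idle counts as busy.
busy_before = 100 - idle_before
busy_after = 100 - idle_after

cpu_ratio = busy_before / busy_after
conn_gain_pct = (conn_after - conn_before) / conn_before * 100
gbps_gain_pct = (gbps_after - gbps_before) / gbps_before * 100

print(f"busy CPU at 1 Gbps: {busy_before}% -> {busy_after}% ({cpu_ratio:.1f}x less)")
print(f"connection rate:    +{conn_gain_pct:.1f}%")
print(f"two-port output:    +{gbps_gain_pct:.1f}%")
```

So the main gain is in CPU usage and in single-stream throughput, while the
connection rate only moves by a few percent.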
BTW I must say I was impressed to see such a big improvement in CPU
usage between 3.10 and 3.13. I suspect that some of the Tx queue improvements
Eric has made in between account for this.
I split the patch into 3 parts:
- one which reintroduces the hidden bits of the driver
- one which replaces the timer with the IRQ
- one which changes the default Tx coalesce from 16 to 4 packets
  (larger was preferred with the timer, but smaller is better now).
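
To illustrate the trade-off behind the Tx coalesce value, here is a toy model,
not the driver code. It assumes that a Tx-done interrupt fires exactly once
every `coalesce` completed packets and that nothing else (no timer, no poll)
cleans up in between; `tx_done_model` and its parameters are hypothetical
names used only for this sketch.

```python
def tx_done_model(packets_sent: int, coalesce: int) -> tuple[int, int]:
    """Toy model: one Tx-done interrupt per `coalesce` completed packets.

    Returns (interrupts_raised, packets_still_waiting_for_cleanup).
    This is an illustration only, not how mvneta is implemented.
    """
    if coalesce < 1:
        raise ValueError("coalesce threshold must be >= 1")
    interrupts = packets_sent // coalesce   # full groups that triggered an IRQ
    pending = packets_sent % coalesce       # sent, but no IRQ raised for them yet
    return interrupts, pending


for coalesce in (16, 4):
    irqs, pending = tx_done_model(packets_sent=1000, coalesce=coalesce)
    worst_case = coalesce - 1               # most packets that can sit unreclaimed
    print(f"coalesce={coalesce:2d}: {irqs} interrupts, "
          f"{pending} pending, worst case {worst_case} pending")
```

Under these assumptions a smaller value costs more interrupts, but it bounds
the number of transmitted packets still waiting for cleanup (at most 3
instead of 15).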
I'm attaching them, please test them on your device.
Note that this is *not* for inclusion at the moment as it has not been
tested on the SMP CPUs.
Cheers,
Willy
View attachment "0001-net-mvneta-add-missing-bit-descriptions-for-interrup.patch" of type "text/plain" (3902 bytes)
View attachment "0002-net-mvneta-replace-Tx-timer-with-a-real-interrupt.patch" of type "text/plain" (6447 bytes)
View attachment "0003-net-mvneta-reduce-Tx-coalesce-from-16-to-4-packets.patch" of type "text/plain" (1013 bytes)