[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c80a9e7846bf903728327a1ca2c3bdcc078057a2.camel@redhat.com>
Date: Wed, 10 Jul 2019 09:27:01 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Edward Cree <ecree@...arflare.com>,
David Miller <davem@...emloft.net>
Cc: netdev <netdev@...r.kernel.org>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [RFC PATCH net-next 0/3] net: batched receive in GRO path
Hi,
On Tue, 2019-07-09 at 20:27 +0100, Edward Cree wrote:
> Where not specified (as batch=), net.core.gro_normal_batch was set to 8.
> The net-next baseline used for these tests was commit 7d30a7f6424e.
> TCP 4 streams, GRO on: all results line rate (9.415Gbps)
> net-next: 210.3% cpu
> after #1: 181.5% cpu (-13.7%, p=0.031 vs net-next)
> after #3: 191.7% cpu (- 8.9%, p=0.102 vs net-next)
> TCP 4 streams, GRO off:
> after #1: 7.785 Gbps
> after #3: 8.387 Gbps (+ 7.7%, p=0.215 vs #1, but note *)
> TCP 1 stream, GRO on: all results line rate & ~200% cpu.
> TCP 1 stream, GRO off:
> after #1: 6.444 Gbps
> after #3: 7.363 Gbps (+14.3%, p=0.003 vs #1)
> batch=16: 7.199 Gbps
> batch= 4: 7.354 Gbps
> batch= 0: 5.899 Gbps
> TCP 100 RR, GRO off:
> net-next: 995.083 us
> after #1: 969.167 us (- 2.6%, p=0.204 vs net-next)
> after #3: 976.433 us (- 1.9%, p=0.254 vs net-next)
>
> (*) These tests produced a mixture of line-rate and below-line-rate results,
> meaning that statistically speaking the results were 'censored' by the
> upper bound, and were thus not normally distributed, making a Welch t-test
> mathematically invalid. I therefore also calculated estimators according
> to [2], which gave the following:
> after #1: 8.155 Gbps
> after #3: 8.716 Gbps (+ 6.9%, p=0.291 vs #1)
> (though my procedure for determining ν wasn't mathematically well-founded
> either, so take that p-value with a grain of salt).
I'm toying with a patch similar to your 3/3 (most relevant difference
being the lack of a limit to the batch size), on top of ixgbe (which
sends all the pkts to the GRO engine), and I'm observing more
controversial results (UDP only):
* when a single rx queue is running, I see a just-above-noise
peformance delta
* when multiple rx queues are running, I observe measurable regressions
(note: I use small pkts, still well under line rate even with multiple
rx queues)
I'll try to test your patch in the following days.
Side note: I think that in patch 3/3, it's necessary to add a call to
gro_normal_list() also inside napi_busy_loop().
Cheers,
Paolo
Powered by blists - more mailing lists