Message-ID: <1462895053.23934.86.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Tue, 10 May 2016 08:44:13 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Alexander Duyck <alexander.duyck@...il.com>,
Netdev <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Saeed Mahameed <saeedm@...lanox.com>,
Or Gerlitz <gerlitz.or@...il.com>,
Eugenia Emantayev <eugenia@...lanox.com>
Subject: Re: [net-next PATCH V1 1/3] net: bulk alloc and reuse of SKBs in
NAPI context
On Tue, 2016-05-10 at 16:48 +0200, Jesper Dangaard Brouer wrote:
> On Tue, 10 May 2016 06:48:54 -0700
> Eric Dumazet <eric.dumazet@...il.com> wrote:
>
> > On Tue, 2016-05-10 at 14:30 +0200, Jesper Dangaard Brouer wrote:
> >
> > > Busy poll disabled on both client and server, not patched:
> > >
> > > $ netperf -H 198.18.40.2 -t TCP_RR -l 60 -T 6,6 -Cc
> > > MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 port 0 AF_INET to 198.18.40.2
> > > () port 0 AF_INET : histogram : demo : first burst 0 : cpu bind
> > > Local /Remote
> > > Socket Size   Request Resp.  Elapsed Trans.    CPU    CPU    S.dem   S.dem
> > > Send   Recv   Size    Size   Time    Rate      local  remote local   remote
> > > bytes  bytes  bytes   bytes  secs.   per sec   % S    % S    us/Tr   us/Tr
> > >
> > > 16384  87380  1       1      60.00   78077.55  3.74   2.69   3.830   8.265
> > > 16384  87380
> >
> > Tell us more about the -T6,6
> >
> > For example how many TX/RX queues you have on the NIC, and which cpus
> > service interrupts.
>
> The -T6,6 option:
> -T lcpu,rcpu Request netperf/netserver be bound to local/remote cpu
>
Sure, I know the -T option in netperf.
> I use the option to get more stable results. If I don't pin/bind the
> CPU netperf/netserver is running on then the CPU scheduler will migrate
> the processes around. This gives unpredictable results, worst for the
> busy_poll tests. Especially if the RX softirq runs on the same CPU
> (also true if it runs on a HyperThread sibling).
>
> Netperf client (8 cores i7-4790K CPU @ 4.00GHz) RX:8 and TX:8 queues.
> Netserver server (2x 12 cores E5-2630 @ 2.30GHz) RX:8 and TX:24 queues.
> Driver mlx4.
> Disabled GRO to hit code path I changed in patch 2.
But are you using stuff like aRFS, RPS, RFS?

Each netperf run lands on different cpus, and we know that results can
have a 25% variability because of that, even more on 2-node systems.

By forcing -T6,6 you force the netperf/netserver cpu, not the RX queues.

A nice effort would be to be able to choose the source port in the
4-tuple at connect() time so that we know that the Toeplitz hash will
select the 'correct' RX queue.
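
To make the idea concrete, here is a rough userspace sketch of what "choosing the source port" could look like. It is not existing kernel or netperf code. It assumes the NIC hashes the IPv4/TCP 4-tuple (saddr, daddr, sport, dport) with a standard Toeplitz hash, and it uses the default key published in Microsoft's RSS documentation purely as a stand-in. The real key and indirection table would have to be read from the NIC (for example with ethtool -x). The helper names, the client address 198.18.40.1, the destination port 12867, and the round-robin 128-entry table are all hypothetical.

```python
import socket
import struct

# Assumption: the default RSS key published in Microsoft's RSS documentation.
# A real NIC may be programmed with a different (often random) key.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key, data):
    """Toeplitz hash: XOR together the 32-bit key window at each set input bit."""
    key_bits = len(key) * 8
    if len(data) * 8 > key_bits - 31:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window of 32 key bits starting at input bit position i*8+b.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rx_hash_input(saddr, daddr, sport, dport):
    """Hash input for IPv4/TCP as seen by the receiving NIC (network order)."""
    return (socket.inet_aton(saddr) + socket.inet_aton(daddr)
            + struct.pack("!HH", sport, dport))


def rx_queue_for(saddr, daddr, sport, dport, indir_table, key=RSS_KEY):
    """Map the hash to an RX queue through the (assumed) indirection table."""
    h = toeplitz_hash(key, rx_hash_input(saddr, daddr, sport, dport))
    return indir_table[h % len(indir_table)]


def pick_source_port(saddr, daddr, dport, want_queue, indir_table,
                     ports=range(32768, 61000), key=RSS_KEY):
    """Return the first source port whose flow lands on want_queue, else None."""
    for sport in ports:
        if rx_queue_for(saddr, daddr, sport, dport, indir_table, key) == want_queue:
            return sport
    return None


if __name__ == "__main__":
    # Hypothetical setup: 8 RX queues spread round-robin over 128 table entries.
    table = [i % 8 for i in range(128)]
    port = pick_source_port("198.18.40.1", "198.18.40.2", 12867, 6, table)
    print("bind() to source port", port, "before connect()")
```

An application would bind() its socket to the returned port before calling connect(). Note that this only models the RX queue on the receiving side for client-to-server packets. Replies arriving at the client are hashed over the reversed tuple, so steering both directions would need a port that satisfies both lookups.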