Message-ID: <AM4PR0501MB1940F3957F0AE02C355717AEDBD40@AM4PR0501MB1940.eurprd05.prod.outlook.com>
Date: Wed, 5 Jul 2017 09:01:11 +0000
From: Ilan Tayari <ilant@...lanox.com>
To: Ilan Tayari <ilant@...lanox.com>, Florian Westphal <fw@...len.de>
CC: Yossi Kuperman <yossiku@...lanox.com>,
Steffen Klassert <steffen.klassert@...unet.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [RFC net-next 9/9] xfrm: add a small xdst pcpu cache
> -----Original Message-----
> From: netdev-owner@...r.kernel.org [mailto:netdev-owner@...r.kernel.org]
> Subject: RE: [RFC net-next 9/9] xfrm: add a small xdst pcpu cache
>
> > -----Original Message-----
> > From: netdev-owner@...r.kernel.org [mailto:netdev-owner@...r.kernel.org]
> > Subject: [RFC net-next 9/9] xfrm: add a small xdst pcpu cache
> >
> > retain last used xfrm_dst in a pcpu cache.
> > On next request, reuse this dst if the policies are the same.
> >
> > The cache doesn't help at all with strictly-RR workloads as
> > we never have a hit.
> >
> > Also, the cache adds the cost of a this_cpu_xchg() in the packet path.
> > It would be better to use plain this_cpu_read/write; however, a netdev
> > notifier can run in parallel on another cpu and write the same pcpu
> > value, so the xchg is needed to avoid the race.
> >
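For anyone reading along, a rough sketch of the fast path described above;
the names (xdst_cache, xdst_cache_usable, ...) are ours for illustration
and are not taken from the patch:

#include <linux/percpu.h>
#include <net/dst.h>
#include <net/xfrm.h>

/* One cached bundle per CPU. */
static DEFINE_PER_CPU(struct xfrm_dst *, xdst_cache);

/* Illustrative check: can the cached bundle be reused for a lookup that
 * resolved to the same policies?  The real patch also verifies that the
 * bundle itself is still valid. */
static bool xdst_cache_usable(struct xfrm_dst *xdst,
                              struct xfrm_policy **pols, int num_pols)
{
        int i;

        if (xdst->num_pols != num_pols)
                return false;
        for (i = 0; i < num_pols; i++)
                if (xdst->pols[i] != pols[i])
                        return false;
        return true;
}

static struct xfrm_dst *xdst_cache_lookup(struct xfrm_policy **pols,
                                          int num_pols)
{
        /* this_cpu_xchg() rather than this_cpu_read(): the netdev
         * notifier may write this slot from another cpu, so the slot is
         * always emptied atomically before it is inspected. */
        struct xfrm_dst *xdst = this_cpu_xchg(xdst_cache, NULL);

        if (!xdst)
                return NULL;                    /* cache miss */

        if (!xdst_cache_usable(xdst, pols, num_pols)) {
                dst_release(&xdst->u.dst);      /* stale entry dropped */
                return NULL;
        }
        return xdst;                            /* hit: reuse the bundle */
}

static void xdst_cache_store(struct xfrm_dst *xdst)
{
        struct xfrm_dst *old;

        dst_hold(&xdst->u.dst);
        old = this_cpu_xchg(xdst_cache, xdst);
        if (old)
                dst_release(&old->u.dst);
}
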
> > The notifier is needed so we do not add long hangs when a device
> > is dismantled but some pcpu xdst still holds a reference.
> >
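And a similarly rough sketch of the notifier side referred to in the
previous paragraph; again the names are ours and the actual patch may
flush the per-cpu entries differently:

#include <linux/atomic.h>
#include <linux/netdevice.h>

static int xdst_cache_dev_event(struct notifier_block *nb,
                                unsigned long event, void *ptr)
{
        struct net_device *dev = netdev_notifier_info_to_dev(ptr);
        int cpu;

        if (event != NETDEV_DOWN && event != NETDEV_UNREGISTER)
                return NOTIFY_DONE;

        for_each_possible_cpu(cpu) {
                struct xfrm_dst **slot = per_cpu_ptr(&xdst_cache, cpu);
                struct xfrm_dst *xdst = READ_ONCE(*slot);

                /* Drop a cached bundle that still holds a reference on
                 * the device going away; cmpxchg() so we do not clobber
                 * a slot the owning cpu has just refilled. */
                if (xdst && xdst->u.dst.dev == dev &&
                    cmpxchg(slot, xdst, NULL) == xdst)
                        dst_release(&xdst->u.dst);
        }
        return NOTIFY_DONE;
}

static struct notifier_block xdst_cache_notifier = {
        .notifier_call = xdst_cache_dev_event,
};

/* Registered once at init time:
 *      register_netdevice_notifier(&xdst_cache_notifier);
 */
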
> > Test results using 4 network namespaces and null encryption:
> >
> > ns1 -> ns2 -> ns3 -> ns4
> > netperf -> xfrm/null enc -> xfrm/null dec -> netserver
> >
> > what            TCP_STREAM   UDP_STREAM     UDP_RR
> > Flow cache:        14804.4      279.738   326213.0
> > No flow cache:     14158.3      257.458   228486.8
> > Pcpu cache:        14766.4      286.958   239433.5
> >
> > UDP tests used 64-byte packets; tests ran for one minute each, and the
> > value is the average over ten iterations.
>
> Hi Florian,
>
> I want to give this a go with hw-offload and see the impact on
> performance.
> It may take us a few days to do that.
Hi Florian,
We tested with and without your patchset, using a single SA with hw-crypto
offload (RFC4106), IPv4 ESP tunnel mode, and a single netperf TCP_STREAM
with a few different message sizes.
We didn't separate the pcpu cache patch from the rest of the patchset.
Here are the findings:
What           64-byte   512-byte   1024-byte   1500-byte
Flow cache     1602.89   11004.97    14634.46    14577.60
Pcpu cache     1513.38   10862.55    14246.94    14231.07
The overall degradation seems a bit more than what you measured with
null-crypto.
We used two machines and no namespaces.
Ilan.