Message-ID: <YwRmzozIY4iqKTs2@unreal>
Date: Tue, 23 Aug 2022 08:34:06 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Steffen Klassert <steffen.klassert@...unet.com>,
Jason Gunthorpe <jgg@...dia.com>,
"David S . Miller" <davem@...emloft.net>,
Herbert Xu <herbert@...dor.apana.org.au>,
netdev@...r.kernel.org, Raed Salem <raeds@...dia.com>,
ipsec-devel <devel@...ux-ipsec.org>
Subject: Re: [PATCH xfrm-next v2 0/6] Extend XFRM core to allow full offload
configuration
On Mon, Aug 22, 2022 at 09:33:04AM -0700, Jakub Kicinski wrote:
> On Mon, 22 Aug 2022 11:54:42 +0300 Leon Romanovsky wrote:
> > On Mon, Aug 22, 2022 at 10:41:05AM +0200, Steffen Klassert wrote:
> > > On Fri, Aug 19, 2022 at 10:53:56AM -0700, Jakub Kicinski wrote:
> > > > Yup, that's what I thought you'd say. Can't argue with that use case
> > > > if Steffen is satisfied with the technical aspects.
> > >
> > > Yes, everything that can help to overcome the performance problems
> > > is welcome, and I'm interested in this type of offload. But we need to
> > > make sure the API is usable by the whole community, so I don't
> > > want an API for some special case that one of the NIC vendors is
> > > interested in.
> >
> > BTW, we have performance data. I planned to send it as part of the cover
> > letter for v3, but it is worth sharing now.
> >
> > ================================================================================
> > Performance results:
> >
> > TCP multi-stream, using an iperf3 instance per CPU.
> > +----------------------+--------+--------+--------+--------+---------+---------+
> > |                      | 1 CPU  | 2 CPUs | 4 CPUs | 8 CPUs | 16 CPUs | 32 CPUs |
> > |                      +--------+--------+--------+--------+---------+---------+
> > |                      |                       BW (Gbps)                        |
> > +----------------------+--------+--------+--------+--------+---------+---------+
> > | Baseline             | 27.9   | 59     | 93.1   | 92.8   | 93.7    | 94.4    |
> > +----------------------+--------+--------+--------+--------+---------+---------+
> > | Software IPsec       | 6      | 11.9   | 23.3   | 45.9   | 83.8    | 91.8    |
> > +----------------------+--------+--------+--------+--------+---------+---------+
> > | IPsec crypto offload | 15     | 29.7   | 58.5   | 89.6   | 90.4    | 90.8    |
> > +----------------------+--------+--------+--------+--------+---------+---------+
> > | IPsec full offload   | 28     | 57     | 90.7   | 91     | 91.3    | 91.9    |
> > +----------------------+--------+--------+--------+--------+---------+---------+
> >
> > IPsec full offload mode behaves like the baseline and reaches line rate
> > with the same number of CPUs.
> >
> > Setup details (similar on both sides):
> > * NIC: ConnectX6-DX dual port, 100 Gbps each.
> > Single port used in the tests.
> > * CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
>
> My questions about performance were more about where the performance
> loss originates. Is it because of loss of GRO?
> Maybe sharing perf traces could answer some of those questions?

Crypto mode doesn't scale well in terms of CPU usage.

CPU load data:
* Note that this is a 160-CPU machine with 2 threads per core.

Baseline:
PROCESSES   TOTAL_BW   HOST_LOCAL_CPU   HOST_REMOTE_CPU
1           27.95      0.6              1.1
2           58.99      1                2
4           93.05      1.3              3.2
8           92.75      2                3.4
16          93.74      2.2              4
32          94.37      2.6              4.5

IPsec crypto:
PROCESSES   TOTAL_BW   HOST_LOCAL_CPU   HOST_REMOTE_CPU
1           15.04      0.7              1.2
2           29.68      1.2              2.1
4           58.52      2                3.9
8           89.58      2.8              5.1
16          90.42      3.1              7.1
32          90.81      3.16             6.9
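
In case it helps to reproduce the numbers, below is a rough sketch of how a
per-CPU iperf3 multi-stream run like the one above could be driven. It is
not the exact setup used for the table; the server address, port layout,
CPU count and duration are placeholders to adjust for your environment,
and the peer is expected to run one "iperf3 -s -p <port>" per client.

#!/usr/bin/env python3
# Rough sketch: launch one iperf3 client per CPU, each pinned with -A and
# talking to its own iperf3 server port, then sum the reported bandwidth.
# SERVER, BASE_PORT, NUM_CPUS and DURATION are placeholders, not the values
# behind the table above.
import json
import subprocess

SERVER = "192.0.2.1"   # peer running one iperf3 server per port
BASE_PORT = 5201       # iperf3 default port; uses BASE_PORT..BASE_PORT+N-1
NUM_CPUS = 8           # number of parallel clients / CPUs to exercise
DURATION = 60          # seconds per run

procs = []
for cpu in range(NUM_CPUS):
    cmd = [
        "iperf3", "-c", SERVER,
        "-p", str(BASE_PORT + cpu),   # dedicated server instance per client
        "-A", str(cpu),               # pin this client to CPU <cpu>
        "-t", str(DURATION),
        "-J",                         # JSON output for easy aggregation
    ]
    procs.append(subprocess.Popen(cmd, stdout=subprocess.PIPE))

total_gbps = 0.0
for p in procs:
    out, _ = p.communicate()
    result = json.loads(out)
    total_gbps += result["end"]["sum_received"]["bits_per_second"] / 1e9

print(f"Total BW: {total_gbps:.2f} Gbps across {NUM_CPUS} CPUs")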