netdev - Re: [PATCH xfrm-next v2 0/6] Extend XFRM core to allow full offload configuration

Message-ID: <YwTfAg/Fq4m85+u/@unreal>
Date:   Tue, 23 Aug 2022 17:06:58 +0300
From:   Leon Romanovsky <leon@...nel.org>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Saeed Mahameed <saeed@...nel.org>,
        Jason Gunthorpe <jgg@...dia.com>,
        "David S . Miller" <davem@...emloft.net>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        netdev@...r.kernel.org, Raed Salem <raeds@...dia.com>,
        ipsec-devel <devel@...ux-ipsec.org>
Subject: Re: [PATCH xfrm-next v2 0/6] Extend XFRM core to allow full offload
 configuration

On Tue, Aug 23, 2022 at 07:22:03AM +0200, Steffen Klassert wrote:
> On Mon, Aug 22, 2022 at 05:17:06PM -0700, Jakub Kicinski wrote:
> > On Mon, 22 Aug 2022 14:27:16 -0700 Saeed Mahameed wrote:
> > > >My questions about performance were more about where does
> > > >the performance loss originate. Is it because of loss of GRO?  
> > > 
> > > Performance loss between full and baseline? It's hardly measurable,
> > > less than 3% in the worst case.
> > 
> > The loss for crypto only vs baseline is what I meant. Obviously if we
> > offload everything the CPU perf may look great but we're giving up
> > flexibility and ability to fix problems in SW. 
> 
> The main difference between baseline TCP and crypto offload
> is the policy/state lookup and ESP encapsulation/decapsulation.
> 
> We don't lose GRO on crypto offload. The ESP header/trailer
> is stripped away in the GRO layer so that the inner TCP
> packet gets GRO. But we lose hardware LRO, as the ESP
> headers are not stripped in HW.
> 
> It would be really good to see where the bottlenecks are
> with crypto offload (RX or TX).
> 
> Also, it would be good to see why the full offload performs
> better. Some perf traces would be helpful.
> 
> When I thought about possible offloads, I came to three
> different types:
> 
> 1. encapsulation offload:
>    - Kernel does policy enforcement
>    - NIC does encapsulation
>    - NIC manages anti replay window and lifetime of SAs
>    - NIC sends lifetime and anti replay notifications to the kernel
>    - The Kernel talks to the keymanager
> 
> 2. stateful offload with fallback:
>    - NIC handles the full datapath, but kernel can take over (fragments
>      etc.)
>    - Kernel and NIC hold the full SA and policy database
>    - Kernel and NIC must sync the state (lifetime, replay window etc.)
>      of SAs and policies
>    - The Kernel talks to the keymanager
> 
> 3. stateful offload:
>    - NIC handles the full datapath
>    - NIC talks directly with the keymanager
>    - Kernel xfrm acts as a stub layer to pass messages from
>      userspace to the NIC.
> 
> The offload that is implemented here comes closest to option 2.
> The stateful handling is shared between host and NIC.
> Unfortunately, this is the most complicated one.

Our implementation is something like option 2, but without fallback.

1. I didn't implement fallback for a number of reasons:
 * It is almost impossible to keep HW and SW states in sync at line rate.
 * IPv6 sets (requires???) the do-not-fragment bit.
2. The NIC and the kernel keep their SA and policy databases in sync, so
that both see a coherent picture and users can keep using the existing
iproute2 tools.
3. Like with any other offload, I want to make sure that the kernel stays
in the middle between our HW and the user. This includes the keymanager.
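
To make the comparison concrete, the three options above and this series
can be written down as a small table of who owns what. This is only an
illustrative sketch: none of these names exist in the kernel or in the
patches, and the sw_fallback value for option 1 is my assumption, since
the list above doesn't say anything about it.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not kernel API. */
enum offload_model {
	OFFLOAD_ENCAP,             /* option 1: encapsulation offload */
	OFFLOAD_STATEFUL_FALLBACK, /* option 2: stateful with fallback */
	OFFLOAD_STATEFUL_FULL,     /* option 3: kernel is a stub layer */
	OFFLOAD_THIS_SERIES,       /* option 2, but without fallback */
	OFFLOAD_MODEL_COUNT
};

struct offload_traits {
	bool nic_full_datapath;          /* NIC handles policy + ESP */
	bool sw_fallback;                /* kernel can take over traffic */
	bool kernel_talks_to_keymanager; /* kernel sits between HW and user */
};

static const struct offload_traits traits[OFFLOAD_MODEL_COUNT] = {
	/* Kernel enforces policy, NIC only encapsulates.
	 * sw_fallback is not described for this option; assumed false. */
	[OFFLOAD_ENCAP]             = { false, false, true },
	[OFFLOAD_STATEFUL_FALLBACK] = { true,  true,  true },
	/* NIC talks to the keymanager directly. */
	[OFFLOAD_STATEFUL_FULL]     = { true,  false, false },
	[OFFLOAD_THIS_SERIES]       = { true,  false, true },
};

/* True if the kernel stays in the middle between HW and the keymanager. */
static bool kernel_in_the_middle(enum offload_model m)
{
	return traits[m].kernel_talks_to_keymanager;
}
```

Read this way, the series differs from option 2 only in sw_fallback, and
from option 3 only in who talks to the keymanager.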

> 
> If just encapsulation/decapsulation brings the performance
> we could also go with option 1. That would still be a
> stateless offload.

Option 1 isn't sufficient for us, as it doesn't support the combination of TC
and eswitch manager. In that mode, packets that arrive at the uplink are
forwarded directly to the representor without kernel involvement.

> 
> Option 3 is what I would consider as a full offload.
> Kernel acts as a stateless stub layer, NIC does
> stateful IPsec.