Date:   Thu, 18 Aug 2022 15:51:12 +0300
From:   Leon Romanovsky <leon@...nel.org>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        "David S . Miller" <davem@...emloft.net>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        netdev@...r.kernel.org, Raed Salem <raeds@...dia.com>,
        ipsec-devel <devel@...ux-ipsec.org>,
        Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH xfrm-next v2 0/6] Extend XFRM core to allow full offload
 configuration

On Thu, Aug 18, 2022 at 12:10:31PM +0200, Steffen Klassert wrote:
> On Thu, Aug 18, 2022 at 08:24:13AM +0300, Leon Romanovsky wrote:
> > On Wed, Aug 17, 2022 at 11:10:52AM -0700, Jakub Kicinski wrote:
> > > On Wed, 17 Aug 2022 08:22:02 +0300 Leon Romanovsky wrote:
> > > > On Tue, Aug 16, 2022 at 07:54:08PM -0700, Jakub Kicinski wrote:
> > > > > This is making a precedent for full tunnel offload in netdev, right?  
> > > > 
> > > > Not really. SW IPsec supports two modes: tunnel and transport.
> > > > 
> > > > However, the HW and SW stacks support offload of transport mode
> > > > only. This is the case for the already merged IPsec crypto offload
> > > > mode and for this full offload.
> > > 
> > > My point is about what you called "full offload" vs "crypto
> > > offload". The policy so far has always been that the Linux
> > > networking stack should populate all the headers and instruct the
> > > device to do crypto, with no header insertion. Obviously we do
> > > header insertion in switch/router offloads, but that's different
> > > and stateless.
> > > 
> > > I believe the reasoning was to provide as much flexibility and control
> > > to the software as possible while retaining most of the performance
> > > gains.
> > 
> > I honestly don't know the reasoning, but the "performance gains" are
> > very limited as long as the IPsec stack is involved with various
> > policy/state lookups. These lookups are expensive in terms of CPU,
> > and they can't sustain a 400 Gb/s line rate.
> 
> Can you provide some performance results that show the difference
> between crypto and full offload? In particular because on the TX
> path, the full policy and state lookup is done twice (in software
> to find the offloading device and then in hardware to match policy
> to state).

I will prepare the numbers.
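
For completeness, a sketch of the kind of comparison I have in mind
(assuming a back-to-back ConnectX-7 pair; the tool choice and
parameters below are illustrative, not the final methodology):

  # same SA/policy pair configured twice, once with the crypto
  # offload and once with the full offload, then for example:
  iperf3 -c 192.168.1.2 -P 8 -t 60    # multi-stream throughput
  ethtool -S eth4 | grep -i ipsec     # device IPsec counters, if exposed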

> 
> > 
> > https://docs.nvidia.com/networking/display/connectx7en/Introduction#Introduction-ConnectX-7400GbEAdapterCards
> > 
> > > 
> > > You must provide a clear analysis (as in an examination of the
> > > data) and discussion (as in an examination in writing) if you're
> > > intending to change the "let's keep packet formation in the SW"
> > > policy. What you have below is a good start but not sufficient.
> 
> I'm still a bit uneasy about this approach. I fear that doing parts
> of stateful IPsec processing in software and parts in hardware will
> lead to all sorts of problems. E.g. with this implementation the
> software has no stats, lifetime, lifebyte or packet count
> information, but is responsible for the IKE communication.
> 
> We might be able to sort out all the problems during the upstreaming
> process, but I still have no clear picture of how this should work
> in the end with all the corner cases this creates.

As we discussed in the IPsec coffee hour, there is no reliable way
to synchronize SW and HW. This is why we offload both the policy and
the state and skip the stack completely.
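
Concretely, in this series both objects are pushed down to the device,
roughly like this (addresses, device name and the dummy key are
illustrative; the iproute2 keyword for the new mode is not final yet,
"packet" below is a placeholder):

  ip xfrm state add src 192.168.1.1 dst 192.168.1.2 \
      proto esp spi 0x1 reqid 0x07 mode transport \
      aead 'rfc4106(gcm(aes))' \
      0x6162636465666768696a6b6c6d6e6f7071727374 128 \
      offload packet dev eth4 dir out

  ip xfrm policy add src 192.168.1.1/32 dst 192.168.1.2/32 \
      dir out tmpl src 192.168.1.1 dst 192.168.1.2 \
      proto esp reqid 0x07 mode transport \
      offload packet dev eth4

With both entries in hardware, the lookups happen on the device and
the stack only keeps the objects for IKE/netlink bookkeeping.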

> 
> Also, the name "full offload" is a bit misleading, because the
> software still has to hold all offloaded states and policies.
> In a full offload, the stack would IMO just act as a stub
> layer between IKE and hardware.

It is just a name; I'm open to changing it to anything else.

> 
> > > > Some of them:
> > > > 1. A request to have a reqid for both policy and state. I use the
> > > > reqid for HW matching between policy and state.
> > > 
> > > reqid?
> > 
> > Policy and state are matched based on their selectors (src/dst IP,
> > direction, ...), but they are independent objects. The reqid is the
> > XFRM identifier that ties this specific policy to this specific state.
> > https://www.man7.org/linux/man-pages/man8/ip-xfrm.8.html
> > https://docs.kernel.org/networking/xfrm_device.html
> > ip x s add ....
> >    reqid 0x07 ...
> >    offload dev eth4 dir in
> 
> Can you elaborate on this a bit more? Does that matching happen in
> hardware? The reqid is not a unique identifier to match between
> policy and state. You MUST match the selectors as defined in
> https://www.rfc-editor.org/rfc/rfc4301

The reqid is needed for the TX path and is part of the mlx5 flow
steering logic:
https://lore.kernel.org/netdev/51ee028577396c051604703c46bd31d706b4b387.1660641154.git.leonro@nvidia.com/
I rely on it to make sure that both the policy and the state exist.

For everything else, I rely on selectors.
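
To make the distinction concrete (same abbreviated style as the
snippet above; values illustrative):

  # TX: pairing is keyed on reqid, both objects carry the same value
  ip x s add ... reqid 0x07 mode transport offload dev eth4 dir out
  ip x p add ... dir out tmpl ... proto esp reqid 0x07

  # RX: no reqid pairing; the SA is found by SPI/daddr/proto and the
  # policy by its selectors, as per RFC 4301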

Thanks

> 
