Message-ID: <72746760-f045-d7bc-1557-255720d7638d@grimberg.me>
Date: Thu, 23 Feb 2023 17:33:40 +0200
From: Sagi Grimberg <sagi@...mberg.me>
To: Aurelien Aptel <aaptel@...dia.com>, linux-nvme@...ts.infradead.org,
netdev@...r.kernel.org, hch@....de, kbusch@...nel.org,
axboe@...com, chaitanyak@...dia.com, davem@...emloft.net,
kuba@...nel.org
Cc: aurelien.aptel@...il.com, smalin@...dia.com, malin1024@...il.com,
ogerlitz@...dia.com, yorayz@...dia.com, borisp@...dia.com
Subject: Re: [PATCH v11 00/25] nvme-tcp receive offloads
> Hi,
>
> Here is the next iteration of our nvme-tcp receive offload series.
>
> The main changes are in patch 3 (netlink).
>
> Rebased on top of today's net-next
> 8065c0e13f98 ("Merge branch 'yt8531-support'")
>
> The changes are also available through git:
>
> Repo: https://github.com/aaptel/linux.git branch nvme-rx-offload-v11
> Web: https://github.com/aaptel/linux/tree/nvme-rx-offload-v11
>
> The NVMeTCP offload was presented in netdev 0x16 (video now available):
> - https://netdevconf.info/0x16/session.html?NVMeTCP-Offload-%E2%80%93-Implementation-and-Performance-Gains
> - https://youtu.be/W74TR-SNgi4
>
> From: Aurelien Aptel <aaptel@...dia.com>
> From: Shai Malin <smalin@...dia.com>
> From: Ben Ben-Ishay <benishay@...dia.com>
> From: Boris Pismenny <borisp@...dia.com>
> From: Or Gerlitz <ogerlitz@...dia.com>
> From: Yoray Zack <yorayz@...dia.com>
Hey Aurelien and Co,

I've spent some time today looking at the last iteration of this, and
what I cannot understand is how this will ever be used outside of the
kernel nvme-tcp host driver.

It seems that the interface is designed to fit only a kernel consumer,
and a very specific one.

Have you considered using more standard interfaces for this, such that
spdk or an io_uring based initiator could use it?

To me it appears that (see the rough sketch after this list):
- ddp limits can be obtained via getsockopt
- sk_add/sk_del can be done via setsockopt
- offloaded DDGST crc can be obtained via something like
msghdr.msg_control
- Perhaps for setting up the offload per IO, recvmsg would be the
vehicle with a new msg flag MSG_RCV_DDP or something, that would hide
all the details of what the HW needs (the command_id would be set
somewhere in the msghdr).
- And all of the resync flow would be something that a separate
ulp socket provider would take care of, similar to how TLS presents
itself to a tcp application, so the application does not need to be
aware of it.
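
To make that concrete, here is a rough userspace sketch of what such an
interface could look like. Everything in the SOL_NVME_TCP_DDP /
NVME_TCP_DDP_* / MSG_RCV_DDP namespace, the structs, and the
"nvme-tcp-ddp" ULP name are made up for illustration and exist in no
UAPI today; only TCP_ULP and the kTLS-style ULP registration it
imitates are real.

/*
 * Hypothetical sketch only -- none of these options or flags exist.
 * It shows how per-queue offload setup and the per-IO bits could be
 * expressed through standard socket calls instead of a kernel-only
 * interface.
 */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_ULP
#define TCP_ULP 31                      /* real: used by kTLS ("tls") */
#endif

/* hypothetical socket level / option / flag values */
#define SOL_NVME_TCP_DDP        290
#define NVME_TCP_DDP_LIMITS     1
#define NVME_TCP_DDP_SK_ADD     2
#define NVME_TCP_DDP_SK_DEL     3       /* teardown, mirrors SK_ADD */
#define NVME_TCP_DDP_DDGST      4       /* cmsg type: crc offload result */
#define MSG_RCV_DDP             0x10000000

struct nvme_tcp_ddp_limits {            /* returned by getsockopt() */
        unsigned int max_ddp_sgl_len;
        unsigned int max_io_size;
};

struct nvme_tcp_ddp_config {            /* passed to setsockopt(SK_ADD) */
        unsigned short pfv;             /* PDU format version */
        unsigned char cpda;             /* controller PDU data alignment */
        unsigned char dgst;             /* header/data digest enabled */
        unsigned int queue_size;
};

static int nvme_tcp_ddp_setup(int fd)
{
        struct nvme_tcp_ddp_limits limits;
        socklen_t optlen = sizeof(limits);
        struct nvme_tcp_ddp_config cfg = {
                .pfv = 1, .cpda = 0, .dgst = 1, .queue_size = 128,
        };

        /* resync etc. hidden behind a ulp provider, like kTLS does it;
         * "nvme-tcp-ddp" is a made-up ULP name */
        if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "nvme-tcp-ddp",
                       sizeof("nvme-tcp-ddp")) < 0)
                return -1;

        /* ddp limits via getsockopt */
        if (getsockopt(fd, SOL_NVME_TCP_DDP, NVME_TCP_DDP_LIMITS,
                       &limits, &optlen) < 0)
                return -1;

        /* sk_add via setsockopt; sk_del would mirror this on teardown */
        return setsockopt(fd, SOL_NVME_TCP_DDP, NVME_TCP_DDP_SK_ADD,
                          &cfg, sizeof(cfg));
}

/*
 * Per-IO receive: MSG_RCV_DDP tells the stack this read maps to an
 * offloaded command; the offloaded DDGST result comes back as ancillary
 * data.  Where the command_id for this IO would live (another cmsg? a
 * field in the msghdr?) is left open here, as in the suggestion above.
 */
static ssize_t nvme_tcp_ddp_recv(int fd, void *buf, size_t len,
                                 int *ddgst_ok)
{
        char cbuf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = cbuf,
                .msg_controllen = sizeof(cbuf),
        };
        struct cmsghdr *cm;
        ssize_t ret;

        ret = recvmsg(fd, &msg, MSG_RCV_DDP);
        if (ret < 0)
                return ret;

        /* offloaded DDGST crc status via msg_control */
        for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm))
                if (cm->cmsg_level == SOL_NVME_TCP_DDP &&
                    cm->cmsg_type == NVME_TCP_DDP_DDGST)
                        memcpy(ddgst_ok, CMSG_DATA(cm), sizeof(*ddgst_ok));

        return ret;
}

Details aside, something along those lines would not be tied to the
kernel nvme-tcp host driver, so an spdk or io_uring based initiator
could consume it as well.
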
I'm not sure that such an interface could cover everything that is
needed, but what I'm trying to convey is that the current interface
limits its usability for almost anything else. Please correct me if
I'm wrong.

Is this designed to also cater to anything else outside of the kernel
nvme-tcp host driver?

> Compatibility
> =============
> * The offload works with bare-metal or SRIOV.
> * The HW can support up to 64K connections per device (assuming no
> other HW accelerations are used). In this series, we will introduce
> support for up to 4K connections, and we have plans to increase it.
> * SW TLS could not work together with the NVMeTCP offload as the HW
> will need to track the NVMeTCP headers in the TCP stream.
Can't say I like that.
> * The ConnectX HW supports HW TLS, but in ConnectX-7 those features
> cannot co-exist (and it is not part of this series).
> * The ConnectX-7 NVMeTCP offload HW can support tunneling, but we
> don't see the need for this feature yet.
> * NVMe poll queues are not in the scope of this series.
bonding/teaming?
>
> Future Work
> ===========
> * NVMeTCP transmit offload.
> * NVMeTCP host offloads incremental features.
> * NVMeTCP target offload.
Which target? Which host?