Message-ID: <8a4ccb05-b9c5-fd45-69cb-c531fd017941@nvidia.com>
Date: Tue, 1 Aug 2023 02:25:34 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Aurelien Aptel <aaptel@...dia.com>, "sagi@...mberg.me" <sagi@...mberg.me>,
Shai Malin <smalin@...dia.com>
CC: "davem@...emloft.net" <davem@...emloft.net>, Boris Pismenny
<borisp@...dia.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>, "aurelien.aptel@...il.com"
<aurelien.aptel@...il.com>, "hch@....de" <hch@....de>, "axboe@...com"
<axboe@...com>, "malin1024@...il.com" <malin1024@...il.com>, Or Gerlitz
<ogerlitz@...dia.com>, Yoray Zack <yorayz@...dia.com>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>, Gal Shalom
<galshalom@...dia.com>, Max Gurtovoy <mgurtovoy@...dia.com>,
"kbusch@...nel.org" <kbusch@...nel.org>
Subject: Re: [PATCH v12 07/26] nvme-tcp: Add DDP offload control path
On 7/12/23 09:14, Aurelien Aptel wrote:
> From: Boris Pismenny <borisp@...dia.com>
>
> This commit introduces direct data placement offload for NVMe/TCP.
> There is a context per queue, which is established after the
> handshake using the sk_add/del NDOs.
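
Not part of the patch, but for readers following along, a rough sketch
of what establishing the per-queue context after the handshake could
look like. ulp_ddp_sk_add(), struct ulp_ddp_config, ULP_DDP_NVME and
queue->offload_netdev are placeholder names here, not necessarily the
exact interface this series adds:

/*
 * Illustrative sketch only: the ulp_ddp_* names below are placeholders
 * for the interface introduced by this series.
 */
static int nvme_tcp_offload_socket(struct nvme_tcp_queue *queue,
				   struct net_device *netdev)
{
	struct ulp_ddp_config config = {
		.type = ULP_DDP_NVME,	/* placeholder protocol id */
	};
	int ret;

	/* Called once the ICReq/ICResp handshake on this queue is done. */
	ret = ulp_ddp_sk_add(netdev, queue->sock->sk, &config);
	if (ret)
		return ret;

	/* Remembered so the context can be torn down via sk_del later. */
	queue->offload_netdev = netdev;
	return 0;
}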
>
> Additionally, a resynchronization routine is used to assist
> hardware recovery from TCP OOO, and continue the offload.
> Resynchronization operates as follows:
>
> 1. TCP OOO causes the NIC HW to stop the offload
>
> 2. NIC HW identifies a PDU header at some TCP sequence number,
> and asks NVMe-TCP to confirm it.
> This request is delivered from the NIC driver to NVMe-TCP by first
> finding the socket for the packet that triggered the request, and
> then finding the nvme_tcp_queue that is used by this routine.
> Finally, the request is recorded in the nvme_tcp_queue.
>
> 3. When NVMe-TCP observes the requested TCP sequence, it will compare
> it with the PDU header TCP sequence, and report the result to the
> NIC driver (resync), which will update the HW, and resume offload
> when all is successful.
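
To make step 3 concrete, a rough sketch of how the compare-and-report
could look on the nvme-tcp side. queue->resync_tcp_seq and the
ulp_ddp_resync() helper are illustrative names, not necessarily what
this series defines:

/*
 * Illustrative sketch: resync_tcp_seq is the sequence number recorded
 * in step 2; ulp_ddp_resync() stands in for the driver resync hook.
 */
static void nvme_tcp_resync_response(struct nvme_tcp_queue *queue,
				     struct net_device *netdev, u32 pdu_seq)
{
	u32 resync_seq = READ_ONCE(queue->resync_tcp_seq);

	if (!resync_seq)
		return;		/* nothing pending from the NIC */

	if (pdu_seq != resync_seq)
		return;		/* not the PDU the HW asked about */

	/* Confirm the PDU header position so the HW can resume the offload. */
	ulp_ddp_resync(netdev, queue->sock->sk, pdu_seq);
	WRITE_ONCE(queue->resync_tcp_seq, 0);
}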
>
> Some HW implementations, such as ConnectX-7, assume linear CCIDs
> (0...N-1 for a queue of size N), whereas the Linux nvme driver uses
> part of the 16-bit CCID as a generation counter. To address that, we
> use the existing quirk in the nvme layer when the HW driver advertises
> that the device does not support the full 16-bit CCID range.
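
If I read this right, the check boils down to something like the sketch
below. NVME_QUIRK_SKIP_CID_GEN already exists in the nvme layer, but
the full_ccid_range field name is only my guess at what the limits
structure carries:

	/* Sketch: fall back to linear CCIDs when the offload HW cannot
	 * cope with the generation counter in the upper command-id bits.
	 * limits->full_ccid_range is an assumed field name. */
	if (!limits->full_ccid_range) {
		dev_warn(ctrl->device,
			 "HW does not support the full 16-bit CCID range, disabling the CID generation counter\n");
		ctrl->quirks |= NVME_QUIRK_SKIP_CID_GEN;
	}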
>
> Furthermore, we let the offloading driver advertise the maximum HW
> sectors/segments via ulp_ddp_limits.
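
Presumably the limits end up clamping the controller values roughly
along these lines (the ulp_ddp_limits field names below are my
assumptions):

	/* Sketch: clamp the controller limits to what the offload device
	 * advertises; max_ddp_sectors/max_ddp_segments are assumed names. */
	ctrl->max_hw_sectors = min_t(u32, ctrl->max_hw_sectors,
				     limits->max_ddp_sectors);
	ctrl->max_segments = min_t(u32, ctrl->max_segments,
				   limits->max_ddp_segments);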
>
> A follow-up patch introduces the data-path changes required for this
> offload.
>
> Socket operations need a netdev reference. This reference is
> dropped on NETDEV_GOING_DOWN events to allow the device to go down in
> a follow-up patch.
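
For completeness, the NETDEV_GOING_DOWN handling presumably follows the
usual netdevice notifier pattern sketched below; nvme_tcp_stop_offload()
is a placeholder for whatever teardown this series does before dropping
the reference:

/* Sketch: release the netdev reference when the device is going down. */
static int nvme_tcp_netdev_event(struct notifier_block *nb,
				 unsigned long event, void *ptr)
{
	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);

	switch (event) {
	case NETDEV_GOING_DOWN:
		/* Placeholder: tear down offloaded queues on this netdev
		 * and dev_put() the reference taken at offload setup. */
		nvme_tcp_stop_offload(netdev);
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block nvme_tcp_netdevice_nb = {
	.notifier_call	= nvme_tcp_netdev_event,
};
/* registered at module load with
 * register_netdevice_notifier(&nvme_tcp_netdevice_nb) */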
>
> Signed-off-by: Boris Pismenny <borisp@...dia.com>
> Signed-off-by: Ben Ben-Ishay <benishay@...dia.com>
> Signed-off-by: Or Gerlitz <ogerlitz@...dia.com>
> Signed-off-by: Yoray Zack <yorayz@...dia.com>
> Signed-off-by: Shai Malin <smalin@...dia.com>
> Signed-off-by: Aurelien Aptel <aaptel@...dia.com>
> Reviewed-by: Chaitanya Kulkarni <kch@...dia.com>
> ---
For the NVMe-related code :-
The offload feature is configurable and may not be turned on in the
absence of the H/W. In order to keep the nvme/host/tcp.c file small and
limited to core functionality, I wonder if we should move the
tcp-offload code into its own file, say nvme/host/tcp-offload.c?
-ck